GUADEC followup: evolution mallocs by callstack

Evolution memory usage has been a sore spot for many GNOME desktop users, including those of us who run GNOME on a thin client. In his GUADEC talk on GNOME performance, Federico Mena Quintero mentioned that it would be useful to profile evolution memory usage. This entry shows the dtrace oneliner which profiles arg0 (the malloc size) for the calls to malloc during the startup of the evolution process. During our talk I also tried to profile mallocs by callstack, but Federico's blog on this topic indicated that profiling evolution-data-server mallocs while visiting all email folders might be useful.
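A oneliner along these lines produces that size-only profile (a sketch of the command, assuming the evolution binary is on your PATH):

dtrace -n 'pid$target:libc:malloc:entry { @ = quantize(arg0); }' -c evolution

To see where the allocations come from, the same aggregation can be keyed by the user callstack. Here is the dtrace script: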

#!/usr/sbin/dtrace -s

pid$target:libc:malloc:entry
{
   @memwhere[ustack()] = quantize(arg0);
}
I call it with:
./g_evolutionmalloc.d -x aggsize=10m -c /usr/lib/evolution-data-server-1.2/evolution-data-server-1.6  > edatamalloc.out

(aggsize is the size of the aggregation buffer). Then I launch evolution and visit all of the folders in my mailbox. The output is quite large; remember, for each unique callstack which calls malloc, I quantize the size of the malloc. Using the oneliner shows the overall distribution of malloc sizes requested by evolution-data-server when I run evolution and browse all of my folders:

       
    value  ------------- Distribution ------------- count
               0 |                                         0
               1 |                                         27
               2 |@                                        311
               4 |@@@                                      663
               8 |@@@@@@@@@@                               2046
              16 |@@@@@@@@@@@@                             2553
              32 |@@@@@@@                                  1471
              64 |@@@                                      721
             128 |@                                        177
             256 |@                                        142
             512 |@                                        132
            1024 |                                         91
            2048 |                                         21
            4096 |                                         5
            8192 |                                         48
           16384 |                                         2
           32768 |                                         0
           65536 |                                         4
          131072 |                                         0
          262144 |                                         2
          524288 |                                         0

Lots of 8- and 16-byte mallocs; is this death by a thousand cuts? GSlice (and/or Solaris's slab allocator) should at least help reduce the time spent in these small mallocs.
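To put a number on it, a small variation on the script could count just the small requests and total the bytes they ask for; a sketch, using 64 bytes as an arbitrary cutoff for what GSlice might reasonably absorb:

#!/usr/sbin/dtrace -s

pid$target:libc:malloc:entry
/arg0 <= 64/
{
   /* how many small mallocs, and how many bytes they request in total */
   @smallcount = count();
   @smallbytes = sum(arg0);
}

Comparing that byte total against the overall allocation total would show whether these small requests matter for footprint as well as for allocator overhead.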
But where are we when these allocs take place? You are here. (bzip2 compressed because our roller admin doesn't like files > 3M.) As you might expect, most of the unique callstacks only have one value for arg0 (within a power of two). Here are a few excerpts that caught my eye. Calendar timezone stuff calls malloc with a wide range of sizes:

              libc.so.1`malloc
              libecal-1.2.so.3.2.8`icalarray_append+0x3a
              libecal-1.2.so.3.2.8`icaltimezone_parse_zone_tab+0x21f
              libecal-1.2.so.3.2.8`icaltimezone_init_builtin_timezones+0x24
              libecal-1.2.so.3.2.8`icaltimezone_get_utc_timezone+0x21
              libecalbackendfile.so`e_cal_backend_file_init+0x63
              libgobject-2.0.so.0.1000.2`g_type_create_instance+0x2e4
              libgobject-2.0.so.0.1000.2`g_object_constructor+0x23
              libgobject-2.0.so.0.1000.2`g_object_newv+0x350
              libgobject-2.0.so.0.1000.2`g_object_new_valist+0x331
              libgobject-2.0.so.0.1000.2`g_object_new+0x62
              libecalbackendfile.so`_events_new_backend+0x35
              libedata-cal-1.2.so.1.3.10`e_cal_backend_factory_new_backend+0x35
              libedata-cal-1.2.so.1.3.10`impl_CalFactory_getCal+0x336
              libedata-cal-1.2.so.1.3.10`_ORBIT_skel_small_GNOME_Evolution_Calendar_CalFactory_getCal+0x35
              libORBit-2.so.0.1.0`ORBit_POAObject_invoke+0x3d
              libORBit-2.so.0.1.0`ORBit_OAObject_invoke+0x34
              libORBit-2.so.0.1.0`ORBit_small_invoke_adaptor+0x2b5
              libORBit-2.so.0.1.0`ORBit_POAObject_handle_request+0x334
              libORBit-2.so.0.1.0`ORBit_POAObject_invoke_incoming_request+0x54

           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@                                      1
            2048 |@@@                                      1
            4096 |@@@@@@@@@@                               3
            8192 |@@@@@@@@@@@@@@@@@@@@                     6
           16384 |@@@                                      1
           32768 |                                         0

And there are some individual calls to malloc, from the calendar timezone code and from the addressbook backend, which are relatively large.

              libc.so.1`malloc
              libecal-1.2.so.3.2.8`icalarray_append+0x3a
              libecal-1.2.so.3.2.8`icaltimezone_parse_zone_tab+0x21f
              libecal-1.2.so.3.2.8`icaltimezone_init_builtin_timezones+0x24
              libecal-1.2.so.3.2.8`icaltimezone_get_utc_timezone+0x21
              libecalbackendfile.so`e_cal_backend_file_init+0x63
              libgobject-2.0.so.0.1000.2`g_type_create_instance+0x2e4
              libgobject-2.0.so.0.1000.2`g_object_constructor+0x23
              libgobject-2.0.so.0.1000.2`g_object_newv+0x350
              libgobject-2.0.so.0.1000.2`g_object_new_valist+0x331
              libgobject-2.0.so.0.1000.2`g_object_new+0x62
              libecalbackendfile.so`_events_new_backend+0x35
              libedata-cal-1.2.so.1.3.10`e_cal_backend_factory_new_backend+0x35
              libedata-cal-1.2.so.1.3.10`impl_CalFactory_getCal+0x336
              libedata-cal-1.2.so.1.3.10`_ORBIT_skel_small_GNOME_Evolution_Calendar_CalFactory_getCal+0x35

              libedata-book-1.2.so.2.2.1`e_book_backend_load_source+0x89
              libedata-book-1.2.so.2.2.1`e_book_backend_open+0x89
              libedata-book-1.2.so.2.2.1`impl_GNOME_Evolution_Addressbook_Book_open+0x5a
              libedata-book-1.2.so.2.2.1`_ORBIT_skel_small_GNOME_Evolution_Addressbook_Book_open+0x2e
              libORBit-2.so.0.1.0`ORBit_POAObject_invoke+0x3d
              libORBit-2.so.0.1.0`ORBit_OAObject_invoke+0x34
              libORBit-2.so.0.1.0`ORBit_small_invoke_adaptor+0x2b5
              libORBit-2.so.0.1.0`ORBit_POAObject_handle_request+0x334
              libORBit-2.so.0.1.0`ORBit_POAObject_invoke_incoming_request+0x54
              libORBit-2.so.0.1.0`giop_thread_queue_process+0x105
              libORBit-2.so.0.1.0`giop_request_handler_thread+0x5b
              libglib-2.0.so.0.1000.2`g_thread_pool_thread_proxy+0xb4
              libglib-2.0.so.0.1000.2`g_thread_create_proxy+0x11b

           value  ------------- Distribution ------------- count
          131072 |                                         0
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
          524288 |                                         0

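Going the other way, a predicate picks out just the large one-off requests; a sketch, using 64K as an arbitrary threshold:

#!/usr/sbin/dtrace -s

pid$target:libc:malloc:entry
/arg0 >= 65536/
{
   @bigwhere[ustack()] = quantize(arg0);
}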


Ugh, too much data! So I can add a predicate to the original script so that I only look at malloc calls between 8 and 32 bytes:

#!/usr/sbin/dtrace -s

pid$target:libc:malloc:entry
/arg0 >= 8 && arg0 < 32/
{
   @memwhere[ustack()] = quantize(arg0);
}

That does thin it out a bit, but there are still 578 unique callstacks. Maybe I could sum the total amount of memory allocated for each callstack and only print the ones which exceed 1M? I'm open to suggestions.
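As far as I know a dtrace aggregation can't be filtered by value directly, but summing the bytes per callstack and truncating to the biggest offenders gets close; a sketch:

#!/usr/sbin/dtrace -s

pid$target:libc:malloc:entry
{
   /* total bytes requested by each unique callstack */
   @bytes[ustack()] = sum(arg0);
}

END
{
   /* keep only the 20 callstacks which requested the most bytes */
   trunc(@bytes, 20);
}

That loses the exact 1M cutoff, but the callstacks which survive the trunc() are the ones worth looking at.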

Similar memory profiles of the evolution process itself are also interesting. During evolution startup and folder browse, there are 30 malloc requests for 64K from this evolution process callstack:

              libc.so.1`malloc+0x60
              libresolv.so.2`__memget+0x1f
              libresolv.so.2`ho_byname2+0x42
              libresolv.so.2`ho_byname2+0x7c
              libresolv.so.2`gethostbyname2_p+0x111
              libresolv.so.2`gethostbyname_p+0x45
              libresolv.so.2`res_gethostbyname+0x21
              nss_dns.so.1`_gethostbyname+0x1c
              nss_dns.so.1`getbyname+0x212
              libc.so.1`nss_search+0x1a9
              libnsl.so.1`_switch_getipnodebyname_r+0x73
              libnsl.so.1`_door_getipnodebyname_r+0xf6
              libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
              libnsl.so.1`getipnodebyname+0xdf
              libsocket.so.1`get_addr+0x126
              libsocket.so.1`_getaddrinfo+0x414
              libsocket.so.1`getaddrinfo+0x19
              libcamel-1.2.so.0.0.0`cs_getaddrinfo+0x2b
              libc.so.1`_thr_setup+0x51
              libc.so.1`_lwp_start

           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30

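Since the evolution binary is launched by the desktop rather than by dtrace -c, one way to collect this profile is to attach to the already running process; a sketch, assuming pgrep finds exactly one evolution process:

./g_evolutionmalloc.d -x aggsize=10m -p `pgrep evolution` > evolutionmalloc.out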
If anyone finds this useful, I can upload a dump of this or any similar information.

Comments:

Excellent observability. This is exactly what I need to reduce the time in our EU-sponsored Computer Vision project. It is amazing. Now, imagine my brain-dead managers who prefer to use Windows because it is mainstream, in a Computer Vision project. I cannot disclose my name!

Posted by FITHIS on July 06, 2006 at 11:13 AM GMT+00:00 #

Thanks for the comment. I was thinking of how dtrace could help a Windows project. I don't think running Windows within qemu or vmware on top of Solaris would gain you much. But if your software is written in Java, you could run it in the Java 6 beta on Solaris and you would get dtrace observability.

Posted by bnitz on July 07, 2006 at 02:17 PM GMT+00:00 #
