WebSphere Tuning Tip: Scalability on Solaris

If you see scalability issues with IBM WebSphere Application Server (WAS) on Solaris systems, especially on CMT-based servers, consider using an alternate memory allocator. In fact, you should do this as a best practice. The white paper "Improving Application Efficiency Through Chip Multi-Threading" gives an excellent overview of CMT and the SPARC architecture, and explains how to improve application performance and CMT system efficiency. In one section, it describes memory allocators as follows: "malloc and free are single-threaded operations and are among the bottlenecks for multi-threaded applications. A multi-threaded malloc scales with multi-threaded requests and can improve multi-threaded application performance. The Solaris OS has two types of multi-threaded malloc libraries, mt-malloc and umem".

The single-threaded malloc and free functions are in the default libc.so library. On Solaris 10, you can find the libmtmalloc.so and libumem.so libraries in the /usr/lib directory. One of the main reasons Sun kept the standard malloc in libc is that many ISVs and applications depend on it. The alternate memory allocators are therefore provided as options. Below is an example of how you can identify the hot locks in libc.so in your WAS java process.

You can use the plockstat command on Solaris 10. It is a DTrace client, so you have to run it in the global zone as root. To run it as non-root, you must be granted the proper DTrace privileges. See the plockstat man page for details about the options. Here, I pressed <Control-c> after several seconds. Alternatively, you can use the "-e <secs>" option to have plockstat exit after the given number of seconds, without pressing <Control-c>.

# plockstat -H -p <WAS_PID>
Mutex hold

Count     nsec Lock                         Caller
   30    89886 0xfe3e1940                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
   30    78896 0xfe3e17c0                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
   14    72921 libc.so.1`_uberdata          libc.so.1`_lwp_start
   30    72633 0xfe3e1800                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
   30    71996 0xfe3e1840                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
    1    34800 0xfe3e1b40                   libjvm.so`_start+0x4c
    2    33500 libc.so.1`libc_malloc_lock   libjvm.so`__1cUGenericGrowableArrayUcle
    4    32750 libc.so.1`libc_malloc_lock   0xfea32e40
    1    28300 libc.so.1`libc_malloc_lock   0xfeaa7a88
    1    26600 0xfe3e1940                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    26300 0xfe3e1940                   libjvm.so`_start+0x4c
    1    25600 libc.so.1`libc_malloc_lock   libjvm.so`__1cQChunkPoolCleanerEtask6M_
   12    25541 libc.so.1`_uberdata          libc.so.1`thr_create+0x2c
    3    25433 libc.so.1`libc_malloc_lock   libjava.so`Java_java_lang_ClassLoader_d
    6    23566 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x20
    1    23500 libc.so.1`__sbrk_lock        libc.so.1`_malloc_unlocked+0x1fc
    1    21900 libc.so.1`libc_malloc_lock   libjava.so`JNU_GetStringPlatformChars+0
    2    20200 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x20
    1    20000 libc.so.1`libc_malloc_lock   libverify.so`VerifyFormat+0xdf8
   12    19200 libc.so.1`_uberdata          libjvm.so`__1cCosNcreate_thread6FpnGThr
    1    19200 libc.so.1`libc_malloc_lock   libWs60ProcessManagement.so`process_str
    1    18900 0xfe3e1b00                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    18000 0xfe3e1980                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    17800 0xfe3e1ac0                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    17500 0xfe3e1b40                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    17400 0xfe3e1bc0                   libjvm.so`__1cCosMstart_thread6FpnGThre
    1    17400 0xfe3e1a40                   libjvm.so`__1cCosMstart_thread6FpnGThre
   30    17300 0xfe3e1940                   libjvm.so`__1cGThreadMdo_vm_resume6Mi_i
    4    17250 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x20
    2    16700 libc.so.1`libc_malloc_lock   libjava.so`JNU_ReleaseStringPlatformCha
    4    16475 libc.so.1`libc_malloc_lock   libjvm.so`__1cCosGmalloc6FI_pv_+0x20
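A quick way to quantify how often the malloc lock shows up in a report like the one above is to total its Count column. The helper below is only a sketch: the function name count_malloc_lock_holds is hypothetical, and it assumes the four-column layout (Count, nsec, Lock, Caller) shown in the sample output.

```shell
# Sum the Count column for every plockstat row whose Lock column
# names libc_malloc_lock. Reads a plockstat report on standard input.
# Assumes the layout: Count, nsec, Lock, Caller.
count_malloc_lock_holds() {
    awk '$3 ~ /libc_malloc_lock/ { total += $1 } END { print total + 0 }'
}
```

You could save the plockstat report to a file and run count_malloc_lock_holds < report.txt, once before and once after changing the allocator, to compare the two numbers.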
From the plockstat output, you can see lock contention on malloc (the libc_malloc_lock entries). To improve the situation, switch to the multi-threaded libumem.so library. Assume you have a 32-bit WAS JVM and the default server profile. Stop the WAS java process, set the LD_PRELOAD_32 environment variable, and restart WAS. The plain LD_PRELOAD variable applies to both 32-bit and 64-bit processes, while LD_PRELOAD_32 affects only 32-bit ones. For a 64-bit JVM, use LD_PRELOAD_64.
bash-3.00# cd ${WAS_PROFILE_BIN}
bash-3.00# ./stopServer.sh server1
bash-3.00# LD_PRELOAD_32=/usr/lib/libumem.so ./startServer.sh server1
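If you manage both 32-bit and 64-bit JVMs, a small helper can pick the matching variable for you. This is a minimal sketch: the function name umem_preload is hypothetical, and the 64-bit library path /usr/lib/64/libumem.so is an assumption that you should confirm on your own system.

```shell
# Print the preload assignment that matches the JVM's data model.
# Argument: 32 or 64.
# The 64-bit path below is an assumption; check that it exists first.
umem_preload() {
    case "$1" in
        32) echo "LD_PRELOAD_32=/usr/lib/libumem.so" ;;
        64) echo "LD_PRELOAD_64=/usr/lib/64/libumem.so" ;;
        *)  echo "usage: umem_preload 32|64" >&2; return 1 ;;
    esac
}
```

For example, env "$(umem_preload 64)" ./startServer.sh server1 would start the server with the 64-bit setting.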
You can use the pldd command to verify that the WAS process was indeed started with libumem. The pldd output lists libumem on one of its lines.
bash-3.00# pldd <WAS_PID>
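Rather than scanning the pldd listing by eye, you can filter it. The sketch below defines a hypothetical helper, uses_libumem, that reads a pldd listing on standard input and succeeds only if a libumem library appears in it.

```shell
# Succeed (exit status 0) if the pldd listing on standard input
# contains a libumem shared object; fail otherwise.
uses_libumem() {
    grep -q 'libumem\.so'
}
```

A typical use would be pldd <WAS_PID> | uses_libumem && echo "libumem is loaded".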
Now, while running some user load, examine the new WAS process, which is running with libumem, with the plockstat command again.
bash-3.00# plockstat -H -p <WAS_PID>
Mutex hold

Count     nsec Lock                         Caller
    1  1317400 libumem.so.1`umem_cache_lock libumem.so.1`umem_update_thread+0x298
    1   207200 libumem.so.1`vmem_nosleep_lock libumem.so.1`vmem_populate+0x204
    2   126950 0xfe2e1980                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
    2   106550 0xfe2e1840                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
    2   105800 0xfe2e18c0                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
    2    87500 0xfe2e1880                   libjvm.so`__1cCosRpd_suspend_thread6Fpn
    2    35250 0x41840                      libumem.so.1`umem_cache_alloc+0x1f4
    1    31200 0x459c0                      libumem.so.1`umem_cache_alloc+0x1f4
    2    30850 0xfe2e18c0                   libjvm.so`__1cGThreadMdo_vm_resume6Mi_i
    2    28050 0x364a8                      libumem.so.1`vmem_alloc+0x188
    1    26300 0x4e340                      libumem.so.1`umem_cache_alloc+0x1f4
    2    25950 0x44380                      libumem.so.1`umem_cache_alloc+0x1f4
    1    25100 0x459c0                      libumem.so.1`umem_cache_alloc+0xdc
  169    24936 libumem.so.1`vmem0+0x30      libumem.so.1`vmem_alloc+0x1f4
    1    24700 0x4e8c0                      libumem.so.1`umem_cache_alloc+0x1f4
    1    24700 0x44340                      libumem.so.1`umem_cache_alloc+0x1f4
    2    23300 0x4eec0                      libumem.so.1`umem_cache_alloc+0x1f4
    1    23200 0x453c0                      libumem.so.1`umem_cache_alloc+0xdc
   13    22876 0x41940                      libumem.so.1`umem_cache_alloc+0xdc
    2    21850 0x45980                      libumem.so.1`umem_cache_alloc+0x1f4
    2    20200 0xfe2e1880                   libjvm.so`__1cGThreadMdo_vm_resume6Mi_i
    1    19600 0x46780                      libumem.so.1`umem_cache_free+0xfc
    2    19100 0x4a8c0                      libumem.so.1`umem_cache_alloc+0xdc
    1    17700 0x46340                      libumem.so.1`umem_cache_alloc+0xdc
    2    17700 0x448c0                      libumem.so.1`umem_cache_alloc+0xdc
    4    17625 0x467c0                      libumem.so.1`umem_cache_alloc+0x1f4
    1    17600 0x47940                      libumem.so.1`umem_cache_alloc+0xdc
    1    17600 0x453c0                      libumem.so.1`umem_cache_alloc+0xdc
As you can see, the malloc lock contention is gone. Do the same for your other WAS instances and their java processes. This should improve application performance and overall system efficiency. With libumem, you can also gain performance for applications that depend heavily on socket communication.

