Monday Mar 28, 2011

Configuring your Linux server to use large pages for JRockit

JRockit can use large pages on the Linux platform, which can yield better performance for memory- and GC-intensive applications. On platforms like Solaris, and with HotSpot, large pages are already enabled and you just have to use them, while on Linux the following is required to get to use large pages:
Enable large pages in the kernel (using CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS).
In most cases this is already done, and we can check whether the feature is available for use:

# cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
This tells us that this system supports large pages of 2 MB in size.

Now we need to tell the kernel how much memory to reserve, which can be done by:

# echo 4096 > /proc/sys/vm/nr_hugepages
This reserves 8 GB of memory (4096 pages of 2 MB each) for large pages.
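The arithmetic behind that number can be sketched in shell (assuming the 2048 kB Hugepagesize reported above):

```shell
# Compute how many huge pages are needed to reserve a given amount
# of memory, assuming the 2048 kB page size reported by /proc/meminfo.
desired_mb=8192                           # we want an 8 GB reservation
page_kb=2048                              # Hugepagesize from /proc/meminfo
pages=$(( desired_mb * 1024 / page_kb ))
echo "$pages"                             # prints 4096
# then: echo "$pages" > /proc/sys/vm/nr_hugepages
```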

This change can be made permanent in /etc/sysctl.conf by adding the following line:
vm.nr_hugepages = 4096  

nodev            /mnt/hugepages        hugetlbfs    rw,auto,uid=9500,user,sync    0 0

This line in /etc/fstab gives the user with id 9500 read/write permission on /mnt/hugepages (create the mount point first with mkdir -p /mnt/hugepages).
This can be activated without reboot:
# mount -a
It is all set up now, and JRockit can be started with -XlargePages:exitOnFailure=true. If the VM starts successfully with this flag, then running
# cat /proc/meminfo | grep Huge
again will tell you how many pages have been taken by your JRockit. If the VM doesn't start, that means you have probably missed one of the steps above.
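One quick way to measure the pages actually consumed is to snapshot HugePages_Free before and after starting the VM. A minimal sketch (the commented-out java line is a placeholder for your actual JRockit launch command):

```shell
# free_pages reads the current HugePages_Free count from /proc/meminfo.
free_pages() {
    awk '/^HugePages_Free/ {print $2}' /proc/meminfo
}

before=$(free_pages)
# Start JRockit here, e.g.:
#   java -XlargePages:exitOnFailure=true -jar yourapp.jar &
#   sleep 5    # give the heap time to be mapped
after=$(free_pages)
echo "Large pages taken by the JVM: $(( before - after ))"
```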

Tuesday Apr 14, 2009

Scaling up WebSphere on Sun Blade X6270 Server

As always, I got a chance to try my hands on one of the new blade servers that Sun has released based on the Intel Xeon processor 5500 series, namely the Sun Blade X6270 server module. The new Xeon processor has a lot of features to deliver better performance based on the workload driving it, e.g. automatically increasing the processor frequency when needed and possible, as well as extra hardware threads and the ability to take advantage of the processor's power and thermal headroom. I would suggest you visit the domain experts for more details, covering everything from power consumption to virtualization, supplemented by the best OS on the planet, "Solaris 10/OpenSolaris". I will leave the rest of that discussion to my distinguished colleagues at Sun and will talk about what I observed when running WebSphere Application Server on this system.

As the most current version of WebSphere Application Server available was V7.0, I picked it up, installed it, made sure I was running with the latest fix packs, and ran it. Everything was a breeze, from installation and fix-pack upgrade to start/stop and then benchmarking. The Sun Blade X6270 I had was built on 2 sockets of the Intel Xeon L5520 processor and ran the Solaris 10 10/08 Operating System.

As always, I used the same workload that I use for my WebSphere tests; the other components included the database server, which was running DB2, and the test drivers. For benchmarking I used Faban, which is very easy to set up and run, and it does the bookkeeping of a benchmark so nicely that I don't have to keep records myself. I don't constrain my DB in terms of performance, so I let it run in "/tmp" just to keep everything simple and avoid spending too much time tuning the DB while I am working on the WebSphere App Server.

The Intel Xeon processor 5500 series has two optional modes of operation, namely Turbo mode and Hyper-Threading mode, which can be changed during boot by going into the BIOS menu and making the right selection. As Intel puts it: "Nehalem unleashes parallel processing performance enabled by an integrated memory controller and Intel® QuickPath Technology providing high-speed interconnects per independent processing core. Intel® Turbo Boost Technology takes advantage of the processor's power and thermal headroom. This enables increased performance of both multi-threaded and single-threaded workloads, and with Intel® HT Technology brings high-performance applications into mainstream computing with 1-16+ threads optimized for the new generation multi-core processor architecture." I did try running in both modes but did not notice any significant difference for my tests.

A single instance of the server was able to use much of the system capacity and then stopped scaling. I was running all the usual tools, and nicstat was the one able to point me to the moment when the network bandwidth was exhausted and the server stopped taking any more load. To work around the network capacity, I put in a direct connection to the DB server, which resulted in slightly better throughput and slightly higher system utilization; the network communication between the DB and the server now happened over this interface. Still it was not enough, and the server had spare capacity: neither the thread dumps nor the system stats pointed out any problem other than the network again being saturated, with the public interface fully loaded and only a very light load on the point-to-point interface. At this point I had no choice other than adding another instance of the server. To do this I created a Solaris Zone/Container and ran one instance inside this zone. For this zone I used the point-to-point connection between the DB and the blade server, which eliminated the bottleneck, and I was able to use the server to full capacity.

The network generates interrupts, and their distribution can be checked by using the following command:
-bash-3.00# echo ::interrupts | mdb -k | grep igb
49 0x60 6 MSI-X 0 1 - igb_intr_tx_other
50 0x61 6 MSI-X 0 1 - igb_intr_rx
51 0x62 6 MSI-X 4 1 - igb_intr_tx_other
52 0x63 6 MSI-X 7 1 - igb_intr_rx

I looked for igb only because my network interfaces are:
igb0 - public (i.e. going over the GB switch)
igb1 - private (point-to-point)
From the above it is clear that the tx and rx interrupts are handled on virtual CPUs, or more technically hardware threads, 0, 4 and 7, which is well distributed.

So what did I do to tune WebSphere?
My tunings remain the same, as I was not changing the Operating System or the software stack, which includes Java. I will list them here for reference:

Disabled the Performance Monitoring Infrastructure
Disabled the Application Profiling
I used the DynaCache for the App just as to save some DB overhead
Set the WebContainer thread pool to the same number of threads as the load driver
For the connection pool, I made sure that the number of DB connections was greater than my WebContainer thread pool, so that no thread ends up waiting for a DB connection.
initialHeapSize="2048" maximumHeapSize="2048" -server -Xmn1024m -XX:+AggressiveOpts -XX:+UseParallelGC -XX:ParallelGCThreads=8 -XX:+UseParallelOldGC
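In WebSphere these settings live in the server's server.xml under the jvmEntries element; a rough sketch of how the values above map there (the xmi:id and exact attribute set vary by version and profile, so treat this as illustrative):

```xml
<jvmEntries xmi:id="JavaVirtualMachine_1"
            initialHeapSize="2048" maximumHeapSize="2048"
            genericJvmArguments="-server -Xmn1024m -XX:+AggressiveOpts -XX:+UseParallelGC -XX:ParallelGCThreads=8 -XX:+UseParallelOldGC"/>
```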

Before I conclude, I would like you to know that only the 64-bit version of WebSphere is available on this platform, so if your application happens to be memory hungry you are not limited to a 4 GB address space, and as you will notice from David Dagastine's blog, there is a lot happening with the 64-bit JVM and how it is being optimized compared to the 32-bit variant. But you may have to wait for the appropriate fix packs to get to that. I will update as soon as they become available.
There are a lot of other advantages to this server, and I would encourage you to read more about it @



