Scaling up WebSphere on Sun Blade X6270 Server
By dkumar on Apr 14, 2009
As always I got chance to try my hands on one of the new blade server that Sun has released based on Intel Xeon processor 5500 series namely, Sun Blade X6270 server module. The new Xeon processor has lot of features to deliver better performance based on the workload by which it is driven eg. automatically increasing processor frequency when needed and possible, as well as the capability to have extra hardware threads or taking advantage of the processor's power and thermal headroom. I would suggest you to visit http://blogs.sun.com for more details from domain expert including power consumption to virtualization and everything else supplemented by best OS on the planet "Solaris 10/OpenSolaris". I will leave rest of the discussion to my distinguished colleagues at Sun and will talk about what I observed when running WebSphere Application Server on this system.
As most current version of WebSphere Application Server available was V7.0 so I picked up this installed then made sure I am running with the latest fixpacks and ran it. Everything was breeze from installation, fix-pack upgrade to start/stop and then benchmarking. The Sun Blade X6270 I had was built on 2 sockets of the Intel Xeon L5520 processor and the Solaris 10 10/08 Operating System.
As always I used the same workload that I use for my WebSphere tests and there are other components which included the Database Server which was running DB2, and the test Drivers. For benchmark I used Faban which is very easy to use and run and it does the bookkeeping of benchmark so nicely that I don't have to keep records myself. I don't constrain my DB in terms of performance so I let it run in "/tmp" to just make everything simple so I don't spend too much time on tuning DB as I am working on WebSphere App Server.
The Intel Xeon processor 5500 series has two modes for running namely, Turbo mode and Hyper-Threading mode options in the BIOS setup. Which can be changed during boot by going to BIOS menu and doing the right selection. As Intel has put up "Nehalem, unleashes parallel processing performance enabled by an integrated memory controller and Intel® QuickPath Technology providing high-speed interconnects per independent processing core. The Intel® Turbo Boost Technology taking advantage of the processor's power and thermal headroom. This enables increased performance of both multi-threaded and single-threaded workloads and with Intel HT Technology, bringing high-performance applications into mainstream computing with 1-16+ threads optimized for new generation multi-core processor architecture." I did tried to run in both modes did not notice any significant difference for my tests while running in these modes.
Single instance of server was able to use much of the system capacity and stopped scaling. I was running all the tools and nicstat was one able to point me the moment when the n/w bandwidth was exhausted it stopped taking any more load. Then to work around the network capacity I put a direct connection with the DB Server which resulted in little better throughput and little more system utilization. On the network communication between DB and Server was happening over this interface. Still not enough and server had capacity neither the thread-dumps or system stats pointed out any problem other than network again being saturated. In which the public interface was full loaded and there was a very light load on the point to point connection interface. At this point I don't had any choice other than adding another instance of the server. To do this I created a Solaris Zone/Container and had one instance running into this zone. For this zone I used the point to point connection between the DB and the blade server which eliminated this and I was able to use the server to full capacity.
The network generated interrupts and there distribution can be checked by using the following command:
-bash-3.00# echo ::interrupts | mdb -k | grep igb
49 0x60 6 MSI-X 0 1 - igb_intr_tx_other
50 0x61 6 MSI-X 0 1 - igb_intr_rx
51 0x62 6 MSI-X 4 1 - igb_intr_tx_other
52 0x63 6 MSI-X 7 1 - igb_intr_rx
I looked for igb only becuase my network drivers are as:
igb0 - public (ie. going over GB switch)
igb1 - private (point-to-point)
From the above it is clear that the tx and rx interrupts are going over virtual cpu or more technically threads 0, 4 and 7 which is very well distributed.
So what did I do for tuning the WebSphere ?
My tunings remains same as I was not changing the Operating System and Software which includes Java. I will list them here for reference:
Disabled the Performance Monitoring InfrastructureBefore I conclude I would like you to know that the only 64-bit of WebSphere is available on this platform so if your application happen to be memory hungry you are not limited by a 4GB address space and as you notice from David Dagastine blog that there are lot of things happening with 64-bit JVM and how it is getting optimized compared to 32-bit variant. But you may have to wait for the appropriate fixpacks to get to that. I will update as soon as it becomes available.
Disabled the Application Profiling
I used the DynaCache for the App just as to save some DB overhead
Set the WebContainer thread to the same number of threads as load driver
For Connection Pool I made sure that the number of DB connections are more than my WebContainer thread pool so none of the thread end up waiting for DB connections.
initialHeapSize="2048" maximumHeapSize="2048" -server -Xmn1024m -XX:+AggressiveOpts -XX:+UseParallelGC -XX:ParallelGCThreads=8 -XX:+UseParallelOldGC
There are lot of other advantages of this server and I would encourage you to read more about this @ http://blogs.sun.com.