Wednesday Mar 08, 2006

JAX-WS: Capturing SOAP messages at the client side

One of the usual steps in debugging Web Services applications is to inspect the request and response SOAP messages. An easy way to do this is to use a proxy that sits between the client and the server. tcpmon, the TCP connection monitoring tool, is the one that I use.

However, in certain situations, you may want to capture and log the SOAP message within the client process itself. Recently I had to deal with such a requirement for JAX-WS 2.0 and it turned out that it was quite easy to do this using the Handler framework. I added a logging handler to capture the request and response messages. The code snippet is given below.

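// Requires: java.util.ArrayList, java.util.List, javax.xml.ws.Binding,
// javax.xml.ws.BindingProvider, javax.xml.ws.handler.Handler
TestServicePortType proxy = new TestService().getTestServicePort();
BindingProvider bp = (BindingProvider) proxy;
bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, urlStr);
Binding binding = bp.getBinding();

// Add the logging handler
List<Handler> handlerList = binding.getHandlerChain();
if (handlerList == null)
   handlerList = new ArrayList<Handler>();
LoggingHandler loggingHandler = new LoggingHandler(...);
handlerList.add(loggingHandler);
// getHandlerChain() returns a copy, so the modified list must be set back on the binding
binding.setHandlerChain(handlerList);

// Invoking a method will activate the handler.

// Remove the logging handler if done with it.
handlerList = binding.getHandlerChain();
if (handlerList != null && loggingHandler != null) {
   handlerList.remove(loggingHandler);
   binding.setHandlerChain(handlerList);
}
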
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.soap.SOAPMessage;
import javax.xml.ws.handler.MessageContext;
import javax.xml.ws.handler.soap.SOAPHandler;
import javax.xml.ws.handler.soap.SOAPMessageContext;

public class LoggingHandler implements SOAPHandler<SOAPMessageContext> {

   // Initialize OutputStream (fos) etc. ...

   // Called for both outbound (request) and inbound (response) messages
   public boolean handleMessage (SOAPMessageContext c) {
      SOAPMessage msg = c.getMessage();

      boolean request = ((Boolean) c.get(SOAPMessageContext.MESSAGE_OUTBOUND_PROPERTY)).booleanValue();

      try {
         if (request) { // This is a request message.
            // Write the message to the output stream
            msg.writeTo (fos);
         }
         else { // This is the response message
            msg.writeTo (fos);
         }
      }
      catch (Exception e) { ... }
      return true; // continue processing the handler chain
   }

   public boolean handleFault (SOAPMessageContext c) {
      SOAPMessage msg = c.getMessage();
      try {
         msg.writeTo (fos);
      }
      catch (Exception e) {...}
      return true;
   }

   public void close (MessageContext c) {
   }

   public Set<QName> getHeaders() {
      // Not required for logging
      return null;
   }
}

There is another, implementation-specific way of doing this: using the _setTransportFactory method. The handler approach is preferable since it does not have any implementation dependencies.

Wednesday Feb 22, 2006

Java Tunings for Server Side XML Processing

The XML parser benchmark that we released, XMLTest, is designed as a server side benchmark, and appropriate Java tunings should be applied to obtain optimal performance. This is especially true if you use this benchmark to compare the performance of different parsers.

At a minimum, use the following Java options when you run these tests on 32-bit Windows: -server -Xmx -XX:+UseParallelGC

Based on J2SE 5.0 ergonomics, the default runtime compiler for Windows on the i586 platform is the client compiler, even for server class machines (those with 2 CPUs and at least 2GB of physical memory). In case you are wondering why, the 'Ergonomics in the 5.0 Java Virtual Machine' document cites the following reason: "This choice was made because historically client applications (i.e., interactive applications) were run more often on this combination of platform and operating system." Since XMLTest is a server side benchmark, we need to enable the server runtime compiler using the -server option.

Garbage Collector Ergonomics describes the various default garbage collection options selected by the VM. Since parallel GC is not the default, add the -XX:+UseParallelGC option to turn it on. By default, the max heap size is selected as the smaller of 1/4th of the physical memory or 1GB. It is important to understand these limits and tune them appropriately, especially when parsing large DOM documents.
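A quick way to confirm which settings actually took effect is to ask the running JVM. The following is only a minimal sketch: the class name JvmSettingsCheck and its helper methods are made up for illustration, and the heap formula simply restates the default described above rather than querying the VM's ergonomics code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: prints the JVM settings that the tuning flags are meant to control. */
final class JvmSettingsCheck {

    static final long ONE_GB = 1024L * 1024L * 1024L;

    /** Default described in the post: the smaller of 1/4 of physical memory or 1 GB. */
    static long defaultMaxHeapBytes(long physicalMemoryBytes) {
        return Math.min(physicalMemoryBytes / 4, ONE_GB);
    }

    /** Names of the garbage collectors the running JVM actually selected. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // java.vm.name reveals whether the client or server compiler is in use.
        System.out.println("VM name:    " + System.getProperty("java.vm.name"));
        System.out.println("Max heap:   "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Running it once with no options and once with the tuning options makes the difference visible. By the rule above, a machine with 2GB of physical memory would default to a 512MB max heap.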

Wednesday Feb 01, 2006

Performance Analysis Tools

Good profiling and monitoring tools are essential for performance analysis, so I thought I would blog today about some of the tools that I use.

Sun offers a couple of awesome free tools that developers can use for performance analysis: the Sun Studio Performance Analyzer that is part of Sun Studio 11 (yes, it is available for use for free) and the Netbeans Profiler.

Sun Studio Performance Analyzer
Collector/Analyzer within Sun Studio is my primary profiler for identifying performance bottlenecks and synchronization issues in Java applications. Profiling using Collector/Analyzer is a two-step process. The profile data is first collected using the 'collect' command (usage: collect collect-options program program-arguments). The data can then be analyzed using the graphical data-analysis tool, 'analyzer'. The analyzer can show function, caller-callee, source, and disassembly data, as well as a TimeLine showing the profiling events in time sequence. A sample screen shot is shown below. 'Exclusive time' (first column in the output) is the time spent in a method, excluding time spent in the methods it calls, and 'Inclusive time' (second column) is the time spent in a method including time spent in the methods it calls. A good way to start is by focusing on methods with high exclusive time values.

Analyzer screen shot
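The difference between exclusive and inclusive time is easy to see on a toy call tree. This is just an illustrative sketch: the CallNode class, the method names and the millisecond values are all invented, and none of it comes from a real Analyzer profile.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy call-tree node illustrating exclusive vs. inclusive time (hypothetical data). */
final class CallNode {
    final String method;
    final long exclusiveMillis;                 // time spent in this method's own code
    final List<CallNode> callees = new ArrayList<>();

    CallNode(String method, long exclusiveMillis) {
        this.method = method;
        this.exclusiveMillis = exclusiveMillis;
    }

    /** Records that this method calls the given callee; returns this for chaining. */
    CallNode calls(CallNode callee) {
        callees.add(callee);
        return this;
    }

    /** Inclusive time = own exclusive time + inclusive time of everything it calls. */
    long inclusiveMillis() {
        long total = exclusiveMillis;
        for (CallNode callee : callees) {
            total += callee.inclusiveMillis();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up numbers: parse() does little work itself but calls two costly methods.
        CallNode parse = new CallNode("parse", 5)
                .calls(new CallNode("scanContent", 70))
                .calls(new CallNode("resolveEntity", 25));
        System.out.println(parse.method + " exclusive=" + parse.exclusiveMillis
                + "ms inclusive=" + parse.inclusiveMillis() + "ms");
        // prints: parse exclusive=5ms inclusive=100ms
    }
}
```

Here parse has a high inclusive time only because of its callees; scanContent, with the highest exclusive time, is where the tuning effort should start.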

For more details, refer to the detailed documentation. Collector/Analyzer is supported on both Solaris and Linux. If you do performance work on Solaris or Linux, download it and check it out.

Netbeans Profiler
I first tried the Netbeans Profiler about a year or so ago. It was fairly complicated to set up (it required a customized JVM at that time), had some stability issues and I could not get it to work consistently. Recently, I tried the latest version, Netbeans Profiler 5.0 RC2, on JDK 6.0. Wow!!! I was blown away; it is so slick. All you need to do to get started is to add an argument (-agentpath:path_to_profiler_lib/deployed/jdk15/solaris-sparc/,5140) to your Java startup script. There is an 'Attach Wizard' that walks you through it. It was so easy that I was up and running within a few minutes.

The part I liked best about the NB Profiler was the fact that you can profile just a selected part of your application. This allows you to concentrate your performance analysis on a section or module within your application rather than profiling the entire application. This functionality came in very handy when I wanted to look at just the XML parsing performance of an application where the XML processing was only a small piece of the entire application (it accounted for about 20% of the performance). This 'module only' profiling can be easily done by selecting the 'Part of Application' button in the Analyze Performance section. If you use this mode, you will need to select the starting root methods (entry points into the module that you are interested in profiling). Since I had already collected profiles using the Collector/Analyzer, I could easily find this information from the stack traces that Analyzer provided.

A screen shot from the NB Profiler is given below. The Hot Spots tab shows you both the method's self time (the time spent in all invocations of this method, excluding further method calls) and the number of invocations. The number of invocations allows you to identify the most frequently called methods, and improving the performance of some of these methods may improve the performance of the application significantly.

Analyzer screen shot

NB Profiler can also do memory profiling. I plan to write about this in more detail some other time.

JVM Monitoring Tools
Since JDK 5.0, the JDK has bundled several monitoring and management tools. The one that I use regularly is jstat. jstat provides a wide variety of statistics including the summary of garbage collection statistics (option: gcutil) and HotSpot compilation method statistics (option: printcompilation). The printcompilation option is very useful for identifying how long the warmup period of your benchmark needs to be. For server side benchmarks, it is good practice to ensure that all the compilation has been completed before performance data is collected during the steady state period of the benchmark.
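The same warmup question can also be approached from inside the benchmark process. The sketch below is not how jstat works; it swaps in the standard java.lang.management CompilationMXBean and uses a crude heuristic (keep running rounds until a round adds almost no JIT compilation time). The class name WarmupCheck, the workload and the threshold are assumptions for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical warmup helper based on accumulated JIT compilation time. */
final class WarmupCheck {

    /** Total JIT compilation time so far in ms, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    /**
     * Runs the workload in rounds until a round adds at most thresholdMillis of
     * compilation time, or maxRounds is reached. Returns the number of rounds run.
     */
    static int warmUp(Runnable workload, int maxRounds, long thresholdMillis) {
        long before = totalCompilationMillis();
        for (int round = 1; round <= maxRounds; round++) {
            workload.run();
            long after = totalCompilationMillis();
            if (before < 0 || after - before <= thresholdMillis) {
                return round;   // compilation has gone quiet (or cannot be measured)
            }
            before = after;
        }
        return maxRounds;
    }

    public static void main(String[] args) {
        // Made-up workload standing in for one benchmark iteration.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) sb.append(i % 10);
            if (sb.length() != 100_000) throw new IllegalStateException();
        };
        System.out.println("Warm-up rounds run: " + warmUp(workload, 50, 0));
    }
}
```

Because the JIT compiles on background threads, a quiet round does not prove that compilation is finished, so this is best treated as a cross-check on what jstat reports rather than a replacement for it.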

JVMstat is another excellent JVM performance monitoring tool. It includes visualgc, a visual garbage collection monitoring tool (see screen shot below). visualgc is also available as a module within Netbeans.

Analyzer screen shot

Solaris System Tools - UltraSPARC T1
Most of you are familiar with Solaris system tools like vmstat and mpstat for monitoring system utilization. However, you have to be a little careful while interpreting these results if you run the tests on an UltraSPARC T1 based system (T1000/T2000). Ravi Talashikar has written an excellent blog titled UltraSPARC T1 utilization explained that explains this in great detail.

Friday Jan 20, 2006

Starting off

Welcome to my blog!!!!

Since this is my first entry, a short introduction is in order. I am a member of the Java Performance Engineering Team focusing mainly on the performance analysis of Sun Java System Application Server and Java APIs for XML and Web Services. This is a great time to work in the performance team since I get to work on one of the coolest and best performing hardware platforms on the planet, the UltraSPARC T1 based Sun Fire T2000 server.  Richard McDougall's blog has tons of great information about CMT systems.

Over the last couple of months, I have done quite a bit of performance work on these servers, working with several products using a variety of micro and macro benchmarks. Here are some of my observations:

1. The UltraSPARC T1 based systems are ideally suited for multi-threaded applications. We have obtained some spectacular performance numbers for SPECjbb2005, SPECjAppServer2004, SPECweb2005, Web Services and XML. Several of my colleagues - Dave Dagastine, Brian Doherty, Scott Oaks, Bharath Mundlapudi and Kim Lichong - have blogged about these in great detail. Be sure to read their blogs.

2. Make sure that you use the right benchmark to measure the performance of CMT systems. Remember, UltraSPARC T1 based systems are designed for high throughput, not for single threaded performance. Simple micro benchmarks that measure the response time of an operation for a single user are useful for analyzing the performance of a module. However, these benchmarks are not useful for measuring the overall performance of a system since they do not mimic the real world scenario of multiple concurrent users accessing the application simultaneously. I have seen several instances of people complaining about the performance of a system based on results obtained from an inappropriate benchmark. So be sure to use throughput based benchmarks with large enough loads when you are evaluating CMT systems.

3. The UltraSPARC T1 system supports up to 32 hardware threads and you need a fairly scalable application to use it effectively. Some applications run into lock contention issues and do not scale past a certain number of cores. One easy workaround for this problem is to run multiple instances of the application. This solved the problem in all the cases that I worked on. However, it would be a useful exercise to identify the scalability issue and fix it.
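To make the throughput point concrete, here is a minimal sketch of a throughput-style harness that drives an operation from several threads at once and counts completed operations, instead of timing a single call. The class name ThroughputHarness, the stand-in operation and the durations are all assumptions; a real benchmark would also need a warmup period and a realistic workload.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical harness: counts operations completed by N concurrent threads. */
final class ThroughputHarness {

    /** Runs the operation on `threads` workers for durationMillis; returns total completed operations. */
    static long run(Runnable operation, int threads, long durationMillis) {
        LongAdder completed = new LongAdder();   // low-contention counter
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (deadline - System.nanoTime() > 0) {
                    operation.run();
                    completed.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMillis + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
        }
        return completed.sum();
    }

    public static void main(String[] args) {
        // Made-up CPU-bound operation standing in for a real request.
        Runnable operation = () -> {
            double x = 0;
            for (int i = 1; i <= 1_000; i++) x += Math.sqrt(i);
            if (x < 0) throw new IllegalStateException(); // keeps the loop from being discarded
        };
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int threads : new int[] {1, cpus}) {
            long ops = run(operation, threads, 2_000);
            System.out.println(threads + " thread(s): " + ops / 2 + " ops/sec");
        }
    }
}
```

Comparing the single-thread figure with the figure at full thread count shows how well the operation scales. If the second number stops growing as threads are added, that is the kind of lock contention issue described in observation 3.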



