By binu on Jan 20, 2006
Welcome to my blog!!!!
Since this is my first entry, a short introduction is in order. I am a member of the Java Performance Engineering Team focusing mainly on the performance analysis of Sun Java System Application Server and Java APIs for XML and Web Services. This is a great time to work in the performance team since I get to work on one of the coolest and best performing hardware platforms on the planet, the UltraSPARC T1 based Sun Fire T2000 server. Richard McDougall's blog has tons of great information about CMT systems.
Over the last couple of months, I have done quite a bit of performance work on these servers, working with several products and a variety of micro and macro benchmarks. Here are some of my observations:
1. The UltraSPARC T1 based systems are ideally suited for multi-threaded applications. We have obtained some spectacular performance numbers for SPECjbb2005, SPECjAppServer2004, SPECweb2005, Web Services and XML. Several of my colleagues (Dave Dagastine, Brian Doherty, Scott Oaks, Bharath Mundlapudi and Kim Lichong) have blogged about these in great detail. Be sure to read their blogs.
2. Make sure that you use the right benchmark to measure the performance of CMT systems. Remember, UltraSPARC T1 based systems are designed for high throughput, not for single-threaded performance. Simple micro benchmarks that measure the response time of an operation for a single user are useful for analyzing the performance of a module. However, these benchmarks are not useful for measuring the overall performance of a system, since they do not mimic the real-world scenario of multiple concurrent users accessing the application simultaneously. I have seen several instances of people complaining about the performance of a system based on results obtained from an inappropriate benchmark. So be sure to use throughput-based benchmarks with large enough loads when you are evaluating CMT systems.
3. The UltraSPARC T1 system supports up to 32 hardware threads, and you need a fairly scalable application to use it effectively. Some applications run into lock contention issues and do not scale past a certain number of cores. One easy workaround for this problem is to run multiple instances of the application. This solved the problem in all the cases that I worked on. However, it would still be a useful exercise to identify the scalability issue and fix it.
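To make observations 2 and 3 a bit more concrete, here is a minimal sketch of a throughput-style measurement in Java. It is not one of the benchmarks mentioned above and not how any of those numbers were produced. The class name `ThroughputSketch`, the method `measureThroughput` and the placeholder workload `simulatedRequest` are all hypothetical names made up for illustration. The idea is simply to run the same operation on an increasing number of concurrent threads and look at operations per second rather than the response time of a single call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

class ThroughputSketch {

    // Runs `op` in a loop on `threads` concurrent threads for roughly
    // `durationMillis` and returns completed operations per second.
    static double measureThroughput(int threads, long durationMillis, Runnable op)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicBoolean stop = new AtomicBoolean(false);
        LongAdder completed = new LongAdder();

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();            // release all workers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                while (!stop.get()) {
                    op.run();
                    completed.increment();    // low-contention counter
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown();
        Thread.sleep(durationMillis);
        stop.set(true);
        long elapsedNanos = System.nanoTime() - begin;
        long ops = completed.sum();

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return ops / (elapsedNanos / 1_000_000_000.0);
    }

    // Hypothetical stand-in for one unit of application work (assumption:
    // a real test would call into the application under load instead).
    static void simulatedRequest() {
        long h = 0;
        for (int i = 0; i < 1_000; i++) {
            h = h * 31 + i;
        }
        if (h == 42) {
            System.out.print("");             // discourage the JIT from dropping the loop
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int max = Runtime.getRuntime().availableProcessors();
        // Double the thread count each round to see where scaling flattens out.
        for (int threads = 1; threads <= max; threads *= 2) {
            double opsPerSec = measureThroughput(threads, 2_000, ThroughputSketch::simulatedRequest);
            System.out.printf("%d thread(s): %.0f ops/sec%n", threads, opsPerSec);
        }
    }
}
```

If the ops/sec figure stops growing well before the thread count reaches the number of hardware threads, that is a hint of the kind of lock contention described in observation 3. Keep in mind that this sketch has no warm-up phase and uses a trivial workload, so treat it as a starting point rather than a real benchmark.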