Getting past GO with Sparc CMT

The Sparc T1/T2 chip is a lean, mean throughput machine. Load it up with DB users, application servers, JVMs, etc., and this chip begins to hum. While the benchmark proof points are many, there still seem to be misconceptions about the performance of this chip.

I have run across several performance evaluations lately where the T2000 was not being used to its full potential. The story goes like this...

System Administrator's First Impressions

Installing software and configuring Solaris seem a little slow compared to the V490s we have sitting in the corner. But this is not a showstopper, just an observation. The system admin presses on and preps the machine for the DBA.

DBA's First Impressions

After the OS is installed, the machine is turned over to the DBAs to install and configure Oracle. The DBA notices that, compared to the V490, the Oracle installation takes about twice as long. The DBA continues configuring and begins loading an export file from the production machine. Again, this too is slower than on the V490. Thinking something is wrong with the OS/HW, the DBA now contacts the system administrator.

Fanning the fire

At this point, the DBA and system admin have an "ah-ha" moment and begin to speculate that something is awry. The system admin "times" some simple Unix commands: "tar", "gzip", "sort", etc. All seem slower on the T2000. The DBA runs a few simple queries... again, slower. What gives? Google fingers are warmed up, blogs are consulted, best practices are applied, and the results are unchanged.

Throughput requires more than one thing

The DBA and system admin have fallen into the trap of not testing the *real* application. The single-threaded jobs used to set up the environment (installing and configuring the software, loading the database) are indeed slower. But that is not the application. The real application is an online store with several hundred users running concurrently. Now we are getting somewhere.
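To make the distinction between single-job speed and throughput concrete, here is a toy model in Python. The `Machine` type and every number in it are hypothetical, not measurements of the T2000 or V490. It assumes each busy hardware thread runs at a fixed per-thread speed and ignores lock, memory, and I/O contention, so it only shows the shape of the trade-off: fewer, faster threads finish one job sooner, while more, slower threads complete more jobs per second once there are enough concurrent users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine described only by thread count and per-thread speed."""
    name: str
    hw_threads: int          # hardware threads that can run at the same time
    per_thread_speed: float  # work units per second for ONE thread (illustrative)


def single_job_seconds(m: Machine, work: float) -> float:
    """Elapsed time for one single-threaded job (an install, an import, a gzip)."""
    return work / m.per_thread_speed


def throughput(m: Machine, work: float, concurrent_jobs: int) -> float:
    """Jobs completed per second when `concurrent_jobs` independent jobs run at once.

    Idealized: each busy hardware thread runs at full per-thread speed and
    there is no contention for locks, memory bandwidth, or I/O.
    """
    busy = min(concurrent_jobs, m.hw_threads)
    return busy * m.per_thread_speed / work


# Made-up parameters for illustration only.
few_fast = Machine("few fast threads", hw_threads=8, per_thread_speed=2.0)
many_slow = Machine("many slow threads", hw_threads=32, per_thread_speed=1.0)

for m in (few_fast, many_slow):
    print(f"{m.name}: 1 job takes {single_job_seconds(m, 10.0):.1f}s, "
          f"100 users -> {throughput(m, 10.0, 100):.1f} jobs/s")
```

With these made-up parameters the first machine finishes a single job in 5.0s versus 10.0s, but under 100 concurrent users it completes 1.6 jobs/s versus 3.2 jobs/s. Real systems add bottlenecks that this sketch leaves out.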

Throughput, Throughput, Throughput!

Finally, the real testing begins. Load generators are broken out to simulate the hundreds of users. After loading up the system, it is found that the T2000 DB server can handle about 2x as many orders/sec as the V490! Wait a minute. "gzip" is slower, but this little chip can process 2x the orders? That's what CMT is all about... throughput, throughput, throughput!
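A load generator measures aggregate throughput rather than the elapsed time of one job. Below is a minimal closed-loop sketch in Python, assuming a hypothetical `place_order()` that stands in for a real client transaction; here it only sleeps, mimicking a request that spends most of its time waiting on the database. Threads are used only because the placeholder sleeps: CPython threads do not run Python bytecode in parallel, so a CPU-heavy client would need processes or a dedicated load-testing tool.

```python
import threading
import time


def place_order() -> None:
    """Hypothetical stand-in for one order transaction.

    Replace with a real client call; this placeholder just sleeps to
    mimic a request that mostly waits on the database.
    """
    time.sleep(0.01)


def run_load(users: int, orders_per_user: int) -> float:
    """Run `users` concurrent simulated users and return orders/sec."""
    def user() -> None:
        # Closed loop: each user issues its next order as soon as the last one returns.
        for _ in range(orders_per_user):
            place_order()

    threads = [threading.Thread(target=user) for _ in range(users)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return users * orders_per_user / elapsed


for users in (1, 8, 32):
    print(f"{users:3d} users: {run_load(users, 20):7.1f} orders/sec")
```

Comparing orders/sec at increasing user counts on each candidate server, with the real transaction in place of the placeholder, is the kind of test that exposes the CMT advantage; timing a single `gzip` is not.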
Comments:

Glenn,

What is the reason that gzip is slower but the applications are faster? Is it that the CPU speed is lower (so individual programs run slower), but CMT allows, for example, OLTP-type loads to run faster by better "exploiting" the task switches that occur due to the nature of the application (something like: request disk I/O, wait, process data, request disk I/O, wait, process...)?

Thanks,
Naresh

Posted by Naresh on August 16, 2007 at 01:50 PM PDT #

To optimize throughput and power efficiency, a simple core and cache were chosen for the T1 processor. The T1's strength is its ability to use cycles that would otherwise be lost waiting on memory, rather than focusing on single-threaded performance. So while a 3GHz AMD chip might be able to do ONE thing fast, it can't keep up with the T1 when it comes to doing 32 things concurrently.
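One way to picture how the T1 puts stalled cycles to use is a simplified fine-grained multithreading model. The T1 has four hardware threads per core (32 on an 8-core part). In the Python sketch below, each thread is assumed to compute for a fixed number of cycles and then stall on memory, while the core issues from whichever thread is ready. The cycle counts are hypothetical, and switch overhead and shared-cache contention are ignored.

```python
def core_utilization(threads_per_core: int, compute_cycles: float,
                     stall_cycles: float) -> float:
    """Fraction of cycles a core's pipeline does useful work.

    Simplified model: each thread computes for `compute_cycles`, then waits
    `stall_cycles` on memory. While one thread waits, the core issues from
    another ready thread, so stalls are hidden until the pipeline saturates.
    Thread-switch overhead and shared-resource contention are ignored.
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads_per_core * per_thread)


# Hypothetical workload: 25 cycles of work per 75-cycle memory stall.
for k in (1, 2, 4):
    print(f"{k} thread(s) per core: {core_utilization(k, 25, 75):.0%} busy")
```

In this model one thread leaves the core idle 75% of the time, and four threads are enough to cover the stall, which is why thread count can matter more than per-thread speed when there are many concurrent users.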

Here are several references which may help better explain the threading and performance of the T1 processor:

http://www.sun.com/processors/UltraSPARC-T1/
http://www.rz.rwth-aachen.de/computing/events/2006/sunhpc_2006/03_Tirumalai.pdf

Posted by Glenn Fawcett on August 16, 2007 at 03:21 PM PDT #

[Trackback] I’ll try not to make a habit of referencing a comment on my blog as subject matter for a new post, but this one is worth it. One of my blog readers posted this comment. The question posed was what performance effect there would be with DSS or OLTP wi...

Posted by Kevin Closson's Oracle Blog: Platform, Storage & Clustering Topics Related to Oracle Databases on August 27, 2007 at 11:40 AM PDT #

Hi Glenn,

Having tested the T2000 (8 cores with 32GB RAM) extensively, all I can say is that there definitely should be better guidelines from Sun on its deployment. The T2000 is a groundbreaking chip; however, it is not one-size-fits-all as it is advertised (to CIOs and such).

Throughput is important for an application serving tens of thousands of customers; however, in many cases there are not that many users.

All the tests I have done (real-world) show that the UltraSparc IV+ is a hell of a lot faster and scales quite well. I would rather have stellar performance for 500 users, degrading toward 750 users, than poor (but acceptable) performance for all users (be it 100, 500, or 750).

Thanks
Krishna

Posted by Krishna Manoharan on November 08, 2007 at 11:35 AM PST #

About

This blog discusses performance topics related to Sun servers. The main focus is on database performance and architecture, but other topics can and will creep in.
