Lessons learned from T1



We created UltraSPARC T1 and launched it into the world in November 2005. Adoption was slow at the start because chip multithreading (CMT) was so different. We spent many hours patiently explaining how it worked to customers, partners and others. We also did Proofs of Concept (POCs) with many of Sun's major customers around the world. As we progressed and the product ramped, we gathered a body of knowledge on applications, how they worked on CMT and how to tune them. We also wrote whitepapers, available at http://www.sun.com/blueprints, and we created tools that we posted at http://www.opensparc.net/cool-tools.html

Now here we are, two years later, about to launch UltraSPARC T2. We have learned many lessons along the way.

1. We have much more experience in helping customers to migrate to Solaris

Migrations involve a number of steps:


  • Recertifying customers' application stacks on Solaris 10
  • Identifying legacy applications that cannot be moved to Solaris 10

To help with the deployment of legacy applications we have developed the Solaris 8 Migration Assistant (internal Sun codename Etude, see http://blogs.sun.com/dp/entry/project_etude_revealed). Etude allows a user to run a Solaris 8 application inside a BrandZ zone on Solaris 10.



2. Customers often evaluate CMT performance with a single threaded application.

Many times a customer has a standard benchmark that they have used for years to evaluate all hardware. They have collected a body of results that they use to rank servers. These tests are often single threaded and viewed as a “power” test of the server. The customer feels such a test has the effect of “leveling” the performance playing field. This test is often the first door that must be passed through before any further evaluation can happen.

I have received many mails from folks that start “the performance on the T2000 was very poor, it was 50% of a v440”. My first question is always: was the CMT server 96% idle at the time of the test? If so, this usually points to a single threaded test.
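The 96% figure follows from simple arithmetic: one busy software thread can occupy only one hardware thread, so the rest of the machine sits idle. Here is a minimal Python sketch of that reasoning, assuming a fully populated T2000 with 32 hardware threads (8 cores with 4 threads each). The function names and the 5-point tolerance are hypothetical, chosen for illustration, and are not part of any Sun tool.

```python
def idle_with_busy_threads(hw_threads: int, busy_threads: int = 1) -> float:
    """Percent of the machine left idle when `busy_threads` software
    threads each saturate one hardware thread."""
    if hw_threads <= 0 or not 0 <= busy_threads <= hw_threads:
        raise ValueError("need 0 <= busy_threads <= hw_threads and hw_threads > 0")
    return 100.0 * (hw_threads - busy_threads) / hw_threads


def looks_single_threaded(observed_idle_pct: float, hw_threads: int,
                          tolerance_pct: float = 5.0) -> bool:
    """True if the idle percentage seen during the test is about what a
    single busy thread would leave behind (tolerance is an assumption)."""
    expected = idle_with_busy_threads(hw_threads, 1)
    return observed_idle_pct >= expected - tolerance_pct


# Assumed: 32 hardware threads on the server under test.
print(idle_with_busy_threads(32))       # idle percent with one busy thread
print(looks_single_threaded(96.0, 32))  # the "96% idle" case from the mails
```

The idle percentage has to be sampled while the benchmark is actually running; an idle machine would also pass this check.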

Single thread performance is not the design point for CMT. The pipeline is simple and designed above all else to be shared, thus masking the memory stalls of individual threads. This design leads to its extremely low power and its extremely high throughput. In UltraSPARC T2 we have continued this focus. There are now twice as many integer pipelines (16 in total), which doubles the throughput of the chip. We have also added a couple more pipeline stages and increased the size of some of the caches, which increases single thread performance by about 25%.

With CMT we need to ask the question: does this test really reflect the true nature of the customer's workload? If the customer truly needs single thread performance then CMT is not for them. In many situations, however, what they really require is throughput. As many customers have found, in a throughput environment CMT is a far superior architecture.



3. The 1/4 frequency argument is a common misconception of CMT performance

The ¼ Frequency argument goes as follows:


  • A CMT pipeline runs at say 1.2GHz and has 4 threads sharing it
  • Therefore each thread only gets 1/4 of the cycles and runs at 300MHz
  • This makes it less performant than an old US II chip

This line of argument doesn't hold because most commercial code chases pointers and is constantly loading data structures. On average a commercial application stalls every 100 instructions for a variety of reasons such as a TLB miss, an I-cache miss, a Level 2 cache miss etc. When a thread stalls it is usually delayed for many cycles; an I-cache miss, for instance, costs 23 cycles. So even though a thread is running at 1.2GHz it usually spends 70% of its time stalled. This is why major processor manufacturers create ever deeper out-of-order pipelines in an effort to avoid this stall.

All this stalling is perfect for CMT. The hardware automatically switches out a thread when it stalls and shares its cycles amongst the other 3 threads on the pipeline, masking the stall. With this technique we can utilize the pipeline 75% - 80% of the time, provided there are enough threads to absorb the stall.
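A rough way to see where a number like 75% - 80% can come from is a toy probability model. Assume each thread is stalled 70% of the time, independently of the others, and that the pipeline does useful work whenever at least one thread is ready. Independence is a simplifying assumption made here for illustration (real stalls are correlated), and the function name is hypothetical.

```python
def pipeline_utilization(stall_fraction: float, threads: int) -> float:
    """Fraction of cycles in which at least one thread is ready to issue,
    assuming every thread stalls independently with the same probability."""
    if not 0.0 <= stall_fraction <= 1.0 or threads < 1:
        raise ValueError("stall_fraction must be in [0, 1] and threads >= 1")
    # The pipeline is idle only when all threads are stalled at once.
    return 1.0 - stall_fraction ** threads


for n in range(1, 5):
    print(n, "thread(s):", f"{pipeline_utilization(0.70, n):.0%}")
```

Under these assumptions one thread keeps the pipeline busy about 30% of the time, and four threads about 76% of the time. So the pipeline as a whole does roughly two and a half times the work of a lone thread at the same clock, which is the opposite of what the 1/4 frequency argument predicts.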



4. Most Commercial applications have little or no floating point instructions.

As part of the UltraSPARC T1 rollout we introduced a tool called cooltst (http://cooltools.sunsource.net/cooltst/index.html) that gives an indication of the percentage of floating point instructions in a current deployment environment. In reviewing the output from many hundreds of customer cooltst runs, we rarely see a large floating point indication.

Most commercial application developers would rather not deal with floating point as it is more difficult to program. There are of course exceptions in the commercial space, such as SAS and portions of the SAP stack. One big exception is Wall Street, with applications such as Monte Carlo simulations.

In UltraSPARC T2 we added a fully pipelined floating point unit per core, shared by the core's 8 threads. Together these FP units can deliver over 11 GFlops (billion floating point operations per second). So the floating point issue has been completely eliminated in T2.
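The 11 GFlops figure is consistent with simple peak arithmetic. The sketch below assumes 8 cores clocked at 1.4GHz, each FP unit completing one floating point operation per cycle. Those three numbers are assumptions made here for illustration and are not stated in this post, and the function name is hypothetical.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 1) -> float:
    """Theoretical peak in GFlops: one FP unit per core, each completing
    `flops_per_cycle` operations every clock cycle."""
    return cores * clock_ghz * flops_per_cycle


# Assumed configuration: 8 cores at 1.4GHz, 1 flop per cycle per FP unit.
print(peak_gflops(8, 1.4))
```

This is a ceiling, not a measurement. Sustained floating point throughput depends on the instruction mix and on having enough threads to keep each unit fed.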



5. One of the biggest gains in Java apps is moving to 1.5 or 1.6 JVM

Many Java applications today are running on JVM 1.4.2. The last build of this version was created in Dec 2003 and it has now officially entered the Sun EOL transition period described at
http://java.sun.com/j2se/1.4.2/.

One of the best ways to improve the throughput of a Java application on CMT servers is to upgrade the JVM to at least the latest 1.5 release, or preferably 1.6. These JVM versions have a host of new features and performance optimizations, many targeted specifically at CMT. I have seen 15% - 30% increases in the performance of Java applications when migrated from 1.4.2 to 1.6.

The issue is complicated slightly by older versions of ISV software that are only supported on 1.4.2. Again, we encourage customers to migrate to the newer version of the ISV stack.



6. There is a set of applications where CMT really shines

We have worked with over 200 customers in the last 2 years and the following list covers a large portion of the applications where CMT showed excellent performance.

Webservers.

  • SunOne
  • Apache

J2EE Appservers.

  • BEA WebLogic
  • IBM WebSphere
  • Glassfish (formerly SunOne Appserver)

Database Servers.

  • Oracle including RAC
  • DB2
  • Sybase
  • MySQL

Mail Servers.


  • Sendmail
  • Domino
  • Brightmail including Spamguard

Java Throughput Applications

Traditional Appservers.


  • Siebel
  • PeopleSoft

NetBackup

JES Stack


  • Directory, Portal, Access Manager




