Sizing CoolThreads Servers

The Sun Fire T1000/T2000 (aka "CoolThreads") server offers a lot of horsepower in a single chip: up to eight cores running at either 1.0GHz or 1.2GHz, each core with four hardware threads, for up to 32 threads in total. But how should this SMP-in-a-chip be sized appropriately for real-world applications?

The published benchmarks show that the application throughput delivered by a single T2000 server is equivalent to the throughput delivered by multiple Xeon systems. And this isn't just marketing hype, either; the UltraSPARC T1 processor is a genuine breakthrough technology. But what are the practical considerations involved in replacing several Xeon servers with a single T1000 or T2000?

Preparing for CoolThreads

For starters, it's important to understand the design point of the UltraSPARC T1. If you need blazing single-thread performance, this isn't the system for you; the chip simply wasn't designed that way. And if that's a problem, I'm sorry to say your future is looking a little bleak everywhere. Every processor designer in the industry is moving to multiple cores, and one implication is that single-thread performance will no longer be getting all the attention. Performance will be served up in smaller packages.

The UltraSPARC T1 is a chip oriented toward throughput computing. With the multi-threading capabilities of this chip, Sun has done two things. The first is to push the envelope much further than anyone else anticipated. Not everyone will applaud this strategy, of course. (And just for fun, note the reactions carefully, and deduct points from competitors who bad-mouth Sun's strategy now and later end up copying it!) More importantly, though, Sun has issued notice about the way applications need to be designed. In a world that increasingly delivers CPU power through multiple cores and threads, single-threaded applications no longer make a whole lot of sense. The sooner you multi-thread your applications, the better off you'll be, regardless of your hardware vendor of choice.

That doesn't mean you'll be forced to rearchitect your applications before you can use the T1000/T2000, though. You can proceed provided your planned deployment has one or more of the following characteristics, any of which will allow it to take advantage of UltraSPARC T1's multiple cores and threads:

  • Multiple applications
  • Multiple user processes
  • Multi-threaded applications
  • Multi-process applications

In general, commercial software that runs well on SMP (Symmetric Multi-Processor) systems will run well on the T1000/T2000, because one or more of the above already apply. Note that the Java Virtual Machine (JVM) is already multi-threaded.
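To make the "multi-threaded applications" point concrete, here's a minimal sketch (in Python, purely for illustration) of the kind of workload that maps naturally onto the UltraSPARC T1's 32 hardware threads: many independent tasks fanned out to a pool of worker threads. The worker count and task payload are made up for this example.

```python
import queue
import threading

NUM_WORKERS = 32          # one worker per hardware thread on an 8-core T1
tasks = queue.Queue()
results = queue.Queue()

def worker():
    # Pull tasks until a None sentinel tells this worker to shut down.
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)  # stand-in for real per-request work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(100):      # enqueue 100 independent units of work
    tasks.put(i)
tasks.join()              # wait until every task has been processed

for _ in threads:         # one sentinel per worker to stop the pool
    tasks.put(None)
for t in threads:
    t.join()

print(results.qsize())    # prints 100
```

Any workload with this shape, whether it's threads in one process, multiple processes, or multiple applications, keeps the chip's hardware threads busy without any re-architecting.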

When to Walk Away

The other major consideration is floating-point performance. The UltraSPARC T1 is not designed for floating-point intensive applications. This isn't as disastrous as it might sound: a vast range of commercial applications, from ERP software like SAP to Java application servers, do very little floating point and run just fine on the T1000/T2000. If you're in any doubt about how to figure out the proportion of floating-point instructions in your application, help is on the way. More on this in a future blog.


If you made it past the single-threaded and floating point questions, you're ready for some serious sizing. The first step is to see how busy your current servers are. Suppose you plan to consolidate applications from six Xeon servers onto a Sun Fire T2000 server. If the CPUs on each system are typically 30% busy and peak at 50%, then you will be migrating a peak load equivalent to three fully-utilized servers.
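The consolidation arithmetic above is simple enough to sanity-check in a few lines. This snippet just reproduces the worked example (six servers, 30% typical, 50% peak); plug in your own utilization numbers.

```python
# Back-of-the-envelope consolidation math from the example above:
# six Xeon servers, each typically 30% busy and peaking at 50%.
num_servers = 6
avg_util = 0.30
peak_util = 0.50

typical_load = num_servers * avg_util   # load in "fully-utilized server" units
peak_load = num_servers * peak_util

print(f"typical load: {typical_load:.1f} fully-utilized servers")
print(f"peak load:    {peak_load:.1f} fully-utilized servers")
```

With these numbers, the peak load works out to the equivalent of three fully-utilized servers, which is the figure your T2000 needs to absorb.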

By far the best way to test the relative performance of the T1000/T2000 and your current servers is to run your own application on both. If that isn't possible, a crude starting point might be to compare published performance on a real-world workload. Check out the published T1000/T2000 benchmarks for further information. If you can't directly compare your intended applications, try to find something as close as possible (e.g. the CPU, network, and storage I/O resource usage should look at least vaguely similar to your actual workload). Benchmarks that use real ISV application code (e.g. SAP and Oracle Applications) are going to be more relevant to a throughput platform like the T1000/T2000 than artificial benchmarks designed to measure the performance of a traditional CPU. One important warning: don't try to draw final conclusions if you're not comparing the same application on both platforms! Extrapolations don't work well when the technologies are radically different (and the UltraSPARC T1 is simply different from anything else out there).
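If you do end up extrapolating from a comparable benchmark, the arithmetic looks something like the sketch below. Every number here is a placeholder assumption, not a measured result; substitute your own figures, and remember the warning above about comparing different applications.

```python
import math

# Hypothetical inputs for illustration only.
peak_load_in_xeon_equivalents = 3.0   # from the utilization exercise above
t2000_vs_one_xeon = 4.0               # assumed throughput ratio from a
                                      # comparable real-application benchmark
headroom = 0.7                        # don't plan to run the new box past 70%

usable_capacity = t2000_vs_one_xeon * headroom
servers_needed = math.ceil(peak_load_in_xeon_equivalents / usable_capacity)
print(servers_needed)                 # prints 2 with these assumed numbers
```

The headroom factor matters: sizing to 100% of benchmark throughput leaves no margin for load spikes or estimation error.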

The next step is to figure out how to deploy the applications. You have four, six, or eight cores at your disposal (depending on the T1000/T2000 platform you've chosen). Should you simply let Solaris worry about the scheduling? Or should you figure out your resource management priorities in advance and carve up the available resources before deploying the applications? You might want to refer to my blog about Consolidating Applications onto a CoolThreads Server for more information on this topic.

Once you're ready to deploy, make sure you do some serious load testing before going live. Don't make the mistake of rushing into production without first finding out how well your application scales on the T1000/T2000 platform. I don't know about you, but I hate nasty surprises! And if you do encounter scaling issues, don't forget that Solaris 10 DTrace is your friend. And check out DProfile, too.

Once you get your head around this technology, you're going to enjoy it! And that's even without mentioning the power, cooling, and rack space savings...


PS. If you're looking for more CoolThreads info direct from Sun engineers, Richard McDougall has put together an excellent overview of other relevant blogs.


Hi Allan, The Niagara servers sound excellent and seem reasonably priced, too. I could do with a couple for some 'grid' dev/testing. Chs, Damian

Posted by Damian Guy on December 07, 2005 at 01:06 AM PST #

Question: If the way to scale up throughput is through parallelism, what does a Niagara server give me over multiple PC blade servers?

Posted by guest on December 11, 2005 at 06:08 PM PST #

A couple of thoughts:
  • Simpler system administration, since you only have one OS image.
  • Better scalability on apps that aren't perfectly parallelized. Some apps can be carved up into entirely independent chunks (SETI is an example). But many apps require some amount of inter-process or inter-thread communication, and scalability for such apps is typically better if you don't have to go outside the box (latencies are worse on an external network or switch).
  • Power consumption, cooling requirements, rack space requirements, and cost of ownership will all be lower as well.

HTH, Allan

Posted by allanp on December 12, 2005 at 07:35 AM PST #

Allan, Thanks for the info. I'm hoping to determine if our application will run well on the T2000, but I'm worried about it bottlenecking on the FPU. Above you mention that a future blog will have help for determining how much FP an app does. Has that info been published?

Posted by Pete on January 11, 2006 at 09:12 AM PST #

Hi Pete,

We have a tool that helps to determine how much FP an app does. Right now it's only available for Sun internal use, but the process of making it externally available is underway.

In the meantime, you might like to check Darryl Gove's blog titled Measuring floating point use in applications. It also points to a previous blog with a pointer to another tool.

HTH, Allan

Posted by allanp on February 05, 2006 at 09:58 AM PST #

Further to my previous reply to Pete, the FP detection tool is now available for multiple platforms, including Linux. Check it out at

There are other tools becoming available, too. Go to for more info.


Posted by allanp on March 02, 2006 at 08:08 AM PST #


I'm a Principal Engineer in the Performance Technologies group at Sun. My current role is team lead for the MySQL Performance & Scalability Project.

