Sizing CoolThreads Servers
By allanp on Dec 06, 2005
The Sun Fire T1000/T2000 (aka "CoolThreads") server offers a lot of horsepower in a single chip: up to eight cores running at either 1000MHz or 1200MHz, each core with four hardware threads. But how should this SMP-in-a-chip be sized appropriately for real-world applications?
The published benchmarks show that the application throughput delivered by a single T2000 server is equivalent to the throughput delivered by multiple Xeon systems. And this isn't just marketing hype, either; the UltraSPARC T1 processor is a genuine breakthrough technology. But what are the practical considerations involved in replacing several Xeon servers with a single T1000 or T2000?
Preparing for CoolThreads
For starters, it's important to understand the design point of the UltraSPARC T1. If you need blazing single-thread performance, this isn't the system for you - the chip simply wasn't designed that way. And if you think that's bad, then I'm sorry to say your future is looking a little bleak. Every processor designer in the industry is moving to multiple cores, and one implication is that single-thread performance will no longer be getting all the attention. Performance will be served up in smaller packages.
The UltraSPARC T1 is a chip designed for throughput computing. With the multi-threading capabilities of this chip, Sun has done two things. The first is to push the envelope much further than anyone else anticipated. Not everyone will applaud this strategy, of course. (And just for fun, note the reactions carefully, and deduct points from competitors who bad-mouth Sun's strategy now, and later end up copying it!) More importantly, though, Sun has issued notice about the way applications need to be designed. In a world that increasingly delivers CPU power through multiple cores and threads, single-threaded applications don't make a whole lot of sense any more. The sooner you multi-thread your applications, the better off you'll be, regardless of your hardware vendor of choice.
That doesn't mean you'll be forced to rearchitect your applications before you can use the T1000/T2000, though. You can proceed provided your planned deployment has one or more of the following characteristics, any of which will allow it to take advantage of UltraSPARC T1's multiple cores and threads:
- Multiple applications
- Multiple user processes
- Multi-threaded applications
- Multi-process applications
In general, commercial software that runs well on SMP (Symmetric Multi-Processor) systems will run well on the T1000/T2000, because one or more of the above already apply. Note that the JVM (Java Virtual Machine) is already multi-threaded.
When to Walk Away
The other major consideration is floating point performance. The UltraSPARC T1 is not designed for floating-point intensive applications. This isn't as disastrous as it might sound. It turns out that a vast range of commercial applications, from ERP software like SAP through Java application servers, do very little floating point and run just fine on the T1000/T2000. If you're in any doubt about how to figure out the proportion of floating point instructions in your application, help is on the way. More on this in a future blog.
If you made it past the single-threaded and floating point questions, you're ready for some serious sizing. The first step is to see how busy your current servers are. Suppose you plan to consolidate applications from six Xeon servers onto a Sun Fire T2000 server. If the CPUs on each system are typically 30% busy and peak at 50%, then you will be migrating a peak load equivalent to three fully-utilized servers.
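The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the source servers are identical, so that their utilization fractions can simply be added together.

```python
def equivalent_busy_servers(utilizations):
    """Sum per-server CPU utilization fractions (0.0 to 1.0).

    Assumption: all source servers are identical, so the sum can be
    read as "this many fully utilized servers".
    """
    for u in utilizations:
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"utilization must be between 0 and 1, got {u}")
    return sum(utilizations)


# The example from the text: six Xeon servers, 30% typical, 50% peak.
typical_load = equivalent_busy_servers([0.30] * 6)
peak_load = equivalent_busy_servers([0.50] * 6)

print(f"typical: {typical_load:.1f} servers, peak: {peak_load:.1f} servers")
```

Adding up the peaks assumes every server hits its peak at the same moment. That is the conservative case; if your peaks are staggered, the combined peak will be lower.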
By far the best way to test the relative performance of the T1000/T2000 and your current servers is to run your own application on both. If that isn't possible, a crude starting point might be to compare published performance on a real-world workload. Check out the published T1000/T2000 benchmarks for further information. If you can't directly compare your intended applications, try to find something as close as possible (e.g. the CPU, network, and storage I/O resource usage should look at least vaguely similar to your actual workload). Benchmarks that use real ISV application code (e.g. SAP and Oracle Applications) are going to be more relevant to a throughput platform like the T1000/T2000 than artificial benchmarks designed to measure the performance of a traditional CPU. One important warning: don't try to draw final conclusions if you're not comparing the same application on both platforms! Extrapolations don't work well when the technologies are radically different (and the UltraSPARC T1 is simply different to anything else out there).
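If you do fall back on published results, the crude starting point might look something like the sketch below. Everything in it is an assumption for illustration: the function names are hypothetical, and the two scores are made-up placeholders, not real benchmark results. Plug in published numbers for the same benchmark on both platforms, and treat the answer as a first guess to be confirmed by testing.

```python
def crude_throughput_ratio(new_score, old_score):
    """Ratio of two published scores for the SAME benchmark.

    A ratio of 4.0 means one new server delivered the throughput of
    four old servers on that benchmark, and nothing more than that.
    """
    if new_score <= 0 or old_score <= 0:
        raise ValueError("scores must be positive")
    return new_score / old_score


def estimated_peak_utilization(peak_load_in_old_servers, ratio):
    """Rough fraction of the new server consumed by the migrated peak load."""
    return peak_load_in_old_servers / ratio


# PLACEHOLDER scores, invented for this example. Not real results.
placeholder_new_score = 400.0
placeholder_old_score = 100.0

ratio = crude_throughput_ratio(placeholder_new_score, placeholder_old_score)

# Peak load of three fully-utilized old servers, as in the example above.
estimate = estimated_peak_utilization(3.0, ratio)
print(f"estimated peak utilization on the new server: {estimate:.0%}")
```

An estimate close to (or above) 100% tells you there's no headroom even on paper. An estimate well below 100% tells you only that the idea is worth testing with your own application.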
The next step is to figure out how to deploy the applications. You have four, six, or eight cores at your disposal (depending on the T1000/T2000 platform you've chosen). Should you simply let Solaris worry about the scheduling? Or should you figure out your resource management priorities in advance and carve up the available resources before deploying the applications? You might want to refer to my blog about Consolidating Applications onto a CoolThreads Server for more information on this topic.
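If you decide to carve things up in advance, the planning arithmetic can be sketched as follows. This is a minimal sketch, not a Solaris tool: the function, the application names, and the weights are all hypothetical, and it only works out a proportional split of hardware threads (it doesn't configure anything). It assumes an eight-core system with four hardware threads per core, and it guarantees each application at least one thread.

```python
def carve_threads(total_threads, weights):
    """Split hardware threads among applications in proportion to weights.

    Uses the largest-remainder method so the shares always add up to
    total_threads. Every application gets at least one thread.
    """
    if total_threads < len(weights):
        raise ValueError("fewer hardware threads than applications")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")

    # Reserve one thread per application, then share out the rest.
    spare = total_threads - len(weights)
    total_weight = sum(weights.values())
    exact = {app: spare * w / total_weight for app, w in weights.items()}
    shares = {app: 1 + int(x) for app, x in exact.items()}

    # Hand any leftover threads to the largest fractional remainders.
    leftover = total_threads - sum(shares.values())
    by_remainder = sorted(
        exact, key=lambda app: exact[app] - int(exact[app]), reverse=True
    )
    for app in by_remainder[:leftover]:
        shares[app] += 1
    return shares


cores, threads_per_core = 8, 4  # eight cores, four hardware threads each

# Hypothetical applications, weighted by their expected peak load.
shares = carve_threads(cores * threads_per_core, {"web": 3, "app": 4, "db": 1})
print(shares)
```

Whether you split by hardware thread or by whole core is a design choice. Threads on the same core share that core's resources, so rounding each share to a multiple of four may be worth considering.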
Once you're ready to deploy, make sure you do some serious load testing before going live. Don't make the mistake of rushing into production without first finding out how well your application scales on the T1000/T2000 platform. I don't know about you, but I hate nasty surprises! And if you do encounter scaling issues, don't forget that Solaris 10 DTrace is your friend. And check out DProfile, too.
Once you get your head around this technology, you're going to enjoy it! And that's even without mentioning the power, cooling, and rack space savings...
PS. If you're looking for more CoolThreads info direct from Sun engineers, Richard McDougall has put together an excellent overview of other relevant blogs.