Sizing UltraSPARC T2 Servers

The newly released Sun SPARC Enterprise T5120, T5220 servers and the Sun Blade T6320 present an interesting challenge to the end user: how do you deploy a system that looks like an entry level server (it's either one or two rack units in size and comes with a single processor chip), yet has more processing power than the fastest 64-CPU Enterprise 10000 (Starfire) shipped by Sun? Oh, and in case you're wondering, the Starfire comparison is good for delivered application throughput, too, not just theoretical speeds and feeds.

The first issue is to figure out how you're going to use up all the CPU. There are a number of possibilities, including:

  1. You deploy a single application that consumes the entire system. This single application might have multiple threads, such as the Java VM, or multiple processes, like Oracle. When a T2000 is dedicated to a single application, such as Oracle, for example, best practice is to treat it like a standard 12-16 CPU system and tune accordingly. So a good starting point is probably to tune a T5120 or T5220 as a 24-32 CPU system. You will want to monitor the proportion of idle CPU with vmstat or mpstat (or corestat] if you'd like more information about how busy the cores are). If there's a lot of idle CPU, then you might need to tune for more CPUs.

    A single application wasn't the most common way of consuming 32-thread UltraSPARC T1 servers like the Sun Fire T2000, though. And it's even less likely to be typical on the 64-thread T2 servers, which are a little more than twice as powerful as T1 servers.

    Why isn't it typical to consume a T1-based system with a single application? The most common reason is because a single application often doesn't require that much CPU. Sometimes, too, a single application instance doesn't scale well enough to consume all 32 CPUs. We've particularly seen this with open source applications with mostly 1- or 2-CPU deployments. Configuring multiple application instances can sometimes overcome this limitation.

    It's worth noting that application developers will increasingly find themselves needing to solve this issue in the future. With all chip suppliers moving to quad-core implementations, it will soon be necessary for applications to perform well with 4- to 8-CPUs just to consume the CPU resource of a 1- or 2-chip system. Early adopters of T1000 and T2000 systems are in good shape, because it's likely they've already made this transition.

  2. You consume the entire system by deploying multiple applications. These applications can, in turn, be multi-threaded, multi-process, or multi-user. Virtualization can be an attractive way of managing multiple applications, and there are two available technologies on T2-based servers: Solaris Zones and Logical Domains (LDoms). They are complimentary technologies, too, so you can use either, or both together. Domains will already be familiar to many - Sun users have been carving up their systems into multiple domains since the days of the Starfire. The LDom implementation is different, but the concepts are very similar. Check out this link for pointers to blogs on LDoms.


In my blog on Sizing T1 Servers back in 2005, I made a number of suggestions about sizing and consolidation that also apply to the new systems. I also noted two caveats related to performance. The first related to floating-point intensive workloads. This caveat no longer applies on T2 servers - the floating point units included in each of the 8 cores deliver excellent floating point performance. The second caveat related to single-thread performance and the importance of understanding whether an application would run well in a multi-threaded environment. Is there, for example, a significant dependence on long-running single-threaded batch jobs? This question must still be asked of T2 servers, although the single-threaded performance of the T2 is improved relative to the T1. The Cooltst tool was created to help identify single-threaded behavior with applications running on existing Solaris (SPARC and x64/x86) and Linux systems (x64/x86). A new version of Cooltst will soon be available that supports T2 systems as well. For optimal throughput with T2-based servers, single-threaded applications should either be broken up, deployed as multiple application instances, or mixed with other applications.

The bottom line is that T5x20 servers will soon be replacing much larger systems, and delivering significant reductions in energy, cooling, and storage requirements.


Post a Comment:
Comments are closed for this entry.

I'm a Principal Engineer in the Performance Technologies group at Sun. My current role is team lead for the MySQL Performance & Scalability Project.


« July 2016