Thursday Dec 06, 2007
Thursday Oct 11, 2007
By allanp on Oct 11, 2007
Sun engineers give the inside scoop on the new UltraSPARC T2 systems
[ Update Jan 2008: Sun SPARC Enterprise T5120 and T5220 servers were awarded Product of the Year 2007. ]
Sun launched the Chip-Level MultiThreading (CMT) era back in December 2005 with the release of the highly successful UltraSPARC T1 (Niagara) chip, featured in the Sun Fire T2000 and T1000 systems. With 8 cores, each with 4 hardware strands (or threads), these systems presented 32 CPUs and delivered an unprecedented amount of processing power in compact, eco-friendly packaging. The systems were referred to as CoolThreads servers because of their low power and cooling requirements.
Today Sun introduces the second generation of Niagara systems: the Sun SPARC Enterprise T5120 and T5220 servers and the Sun Blade T6320. With 8 hardware strands in each of 8 cores plus networking, PCI, and cryptographic capabilities, all packed into a single chip, these new 64-CPU systems raise the bar even higher.
The new systems are probably best described by some of the engineers who developed them, tested them, and pushed them to their limits. Their blogs will be cross-referenced here, so if you're interested in learning more, come back from time to time. New blogs should appear in the next 24 hours, and more over the next few weeks.
Here's what the engineers have to say.
- UltraSPARC T2 Server Technology. Dwayne Lee gives us a quick overview of the new systems. Denis Sheahan blogs about UltraSPARC T2 floating point performance, offers a detailed T5120 and T5220 system overview, and shares insights into lessons learned from the UltraSPARC T1. Josh Simons offers us a glimpse under the covers. Stephan Hoerold gives us an illustration of the UltraSPARC T2 chip. Paul Sandhu gives us some insight into the MMU and shared contexts. Tim Bray blogs about the interesting challenges posed by a many-core future. Darryl Gove talks about T2 threads and cores. Tim Cook compares the UltraSPARC T2 to other recent SPARC processors. Phil Harman tests memory throughput on an UltraSPARC T2 system. Ariel Hendel, musing on CMT and evolution, evidences a philosophical bent.
- Performance. The inimitable bmseer gives us a bunch of good news about benchmark performance on the new systems - no shortage of world records, apparently! Peter Yakutis offers detailed PCI-E I/O performance data. Ganesh Ramamurthy muses on the implications of UltraSPARC T2 servers from the perspective of a senior performance engineering director.
- System Management. Find out about the Integrated Lights Out Manager (ILOM) from Tushar Katarki's blog.
- Networking. Alan Chiu gives us some insights into 10 Gigabit Ethernet performance and tuning on the UltraSPARC T2 systems.
- RAS. Richard Elling carries out a performability analysis of the T5120 and T5220 servers.
- Clusters. Ashutosh Tripathi discusses Solaris Cluster support in LDoms I/O domains.
- Virtualization. Learn about Logical Domains (LDoms) and the release of LDoms 1.0.1 from Honglin Su. Eric Sharakan has some more to say about LDoms and the UltraSPARC T2. Ashley Saulsbury presents a flash demo of 64 Logical Domains booting on an UltraSPARC T2 system. Find out why Sun xVM and Niagara 2 are the ultimate virtualization combo from Marc Hamilton.
- Security Performance. Ning Sun discusses Cryptography Acceleration on T2 systems. Glenn Brunette offers us a Security Geek's point of view on the T5x20 systems. Lawrence Spracklen has several posts on UltraSPARC T2 cryptographic acceleration. Martin Mueller proposes an UltraSPARC T2 system deployment designed to deliver a high performance, high security environment.
- Application Performance. Dileep Kumar talks about WebSphere Application Server performance with UltraSPARC T2 systems. Tim Bray shares some hands-on experiences testing a T5120.
- Java Performance. Dave Dagastine offers us insights into the HotSpot JVM on the T2 and Java performance on the new T2 servers.
- Web Applications. Murthy Chintalapati talks about web server performance. Constantin Gonzalez explores the implications of UltraSPARC T2 for Web 2.0 workloads. Shanti Subramanyam tells us that Cool Stack applications (including the AMP stack packages) are pre-loaded on all UltraSPARC T2-based servers.
- Open Source Community. Barton George explores the implications of UltraSPARC T2 servers for the Ubuntu and Open Source community.
- Open Source Databases. Luojia Chen discusses MySQL tuning for Niagara servers.
- Customer Use Cases. Stephan Hoerold gives us some insight into experiences of Early Access customers. Stephan also shares what happened when STRATO put a T5120 to the test. It seems like STRATO also did some experimentation with the system.
- Sizing. I've posted an entry on Sizing UltraSPARC T2 Servers.
- Solaris features. Scott Davenport blogs on Predictive Self-Healing on the T5120. Steve Sistare gives us a lot of insight into features in Solaris to optimize the UltraSPARC T2 platforms. Walter Bays salutes the folks who reliably deliver consistent interfaces on the new systems.
- HPC & Compilers. Darryl Gove talks about compiler flags for T2 systems. Josh Simons talks about the relevance of the new servers to HPC applications. Ruud van der Pas measures T2 server performance with a suite of single-threaded technical-scientific applications. In another blog entry, Darryl Gove introduces us to performance counters on the T1 and T2.
- Tools. Darryl Gove points to the location of free pre-installed developer tools on UltraSPARC T2 systems. Nicolai Kosche describes the hardware features added to UltraSPARC T2 to improve the DProfile Architecture in Sun Studio 12 Performance Analyzer. Ravindra Talashikar brings us Corestat for UltraSPARC T2, a tool that measures core utilization to help users better understand processor utilization on UltraSPARC T2 systems.
Finally
Go check out the new UltraSPARC T2 systems, and save energy and rack space in the process.
Wednesday Oct 10, 2007
By allanp on Oct 10, 2007
The first issue is to figure out how you're going to use up all the CPU. There are a number of possibilities, including:
- You deploy a single application that consumes the entire system. This single application might have multiple threads, such as the Java VM, or multiple processes, like Oracle. When a T2000 is dedicated to a single application such as Oracle, best practice is to treat it like a standard 12-16 CPU system and tune accordingly. So a good starting point is probably to tune a T5120 or T5220 as a 24-32 CPU system. You will want to monitor the proportion of idle CPU with vmstat or mpstat (or corestat if you'd like more information about how busy the cores are). If there's a lot of idle CPU, then you might need to tune for more CPUs.
A single application wasn't the most common way of consuming 32-thread UltraSPARC T1 servers like the Sun Fire T2000, though. And it's even less likely to be typical on the 64-thread T2 servers, which are a little more than twice as powerful as T1 servers.
Why isn't it typical to consume a T1-based system with a single application? The most common reason is that a single application often doesn't require that much CPU. Sometimes, too, a single application instance doesn't scale well enough to consume all 32 CPUs. We've particularly seen this with open source applications with mostly 1- or 2-CPU deployments. Configuring multiple application instances can sometimes overcome this limitation.
It's worth noting that application developers will increasingly find themselves needing to solve this issue in the future. With all chip suppliers moving to quad-core implementations, it will soon be necessary for applications to perform well with 4 to 8 CPUs just to consume the CPU resource of a 1- or 2-chip system. Early adopters of T1000 and T2000 systems are in good shape, because it's likely they've already made this transition.
- You consume the entire system by deploying multiple applications. These applications can, in turn, be multi-threaded, multi-process, or multi-user. Virtualization can be an attractive way of managing multiple applications, and there are two available technologies on T2-based servers: Solaris Zones and Logical Domains (LDoms). They are complementary technologies, too, so you can use either, or both together. Domains will already be familiar to many - Sun users have been carving up their systems into multiple domains since the days of the Starfire. The LDom implementation is different, but the concepts are very similar. Check out this link for pointers to blogs on LDoms.
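The rule of thumb from the first option above (tune a dedicated 32-thread T2000 like a 12-16 CPU system, and a 64-thread T5120 or T5220 like a 24-32 CPU system) can be sketched in Python. The function name is hypothetical, and the idea that the same ratio holds for other thread counts is an assumption drawn from those two data points only. Treat the result as a starting point and let the idle CPU figures decide.

```python
def tuning_cpu_range(hardware_threads):
    """Suggest a (low, high) range of conventional CPUs to tune for.

    Assumption: the ratio seen on the T2000 (32 threads treated as a
    12-16 CPU system, i.e. 3/8 to 1/2 of the thread count) carries over
    to other CMT thread counts. This is a sketch, not a sizing rule.
    """
    if hardware_threads <= 0:
        raise ValueError("hardware_threads must be positive")
    low = hardware_threads * 3 // 8
    high = hardware_threads // 2
    return low, high


# T2000 (8 cores x 4 threads) and T5120/T5220 (8 cores x 8 threads)
for name, threads in (("T2000", 32), ("T5120/T5220", 64)):
    low, high = tuning_cpu_range(threads)
    print(f"{name}: start by tuning as a {low}-{high} CPU system")
```

If monitoring then shows a lot of idle CPU, move toward the upper end of the range or beyond it.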
Caveats?
In my blog on Sizing T1 Servers back in 2005, I made a number of suggestions about sizing and consolidation that also apply to the new systems. I also noted two caveats related to performance. The first related to floating-point intensive workloads. This caveat no longer applies on T2 servers - the floating point units included in each of the 8 cores deliver excellent floating point performance. The second caveat related to single-thread performance and the importance of understanding whether an application would run well in a multi-threaded environment. Is there, for example, a significant dependence on long-running single-threaded batch jobs? This question must still be asked of T2 servers, although the single-threaded performance of the T2 is improved relative to the T1. The Cooltst tool was created to help identify single-threaded behavior with applications running on existing Solaris (SPARC and x64/x86) and Linux systems (x64/x86). A new version of Cooltst will soon be available that supports T2 systems as well. For optimal throughput with T2-based servers, single-threaded applications should either be broken up, deployed as multiple application instances, or mixed with other applications.
The bottom line is that T5x20 servers will soon be replacing much larger systems, and delivering significant reductions in energy, cooling, and storage requirements.
Sunday May 07, 2006
By allanp on May 07, 2006
CoolThreads Consolidation: The Easy Way
Solaris 10 and a CoolThreads server make a potent combination. Along with the raw horsepower, the low wattage, and the miserly rack requirements of the CoolThreads server, you get the robust, feature-rich, open source Solaris 10 operating system.
So far so good, but if you're new to Solaris 10 there's a lot to learn. Solaris Containers and Resource Management make a big difference for consolidation, but they take some getting used to. And then you need to figure out the implications of having four threads per core, for example, when configuring the Sun Fire T1000 or T2000. Is there a way of easing the transition for sysadmins who already have too much to think about?
Enter the Consolidation Tool for Sun Fire Servers V1.0, Sun Fire T1000 and T2000 Edition! This GUI tool is designed to simplify the task of consolidating applications onto CoolThreads servers. It also provides a friendly introduction to Solaris Containers, including Zones, resource pools, psets, and the FSS (Fair Share Scheduler) scheduling class.
Consolidation Tool Introduction
The Consolidation Tool is free (it's open source under the GPL), unsupported, lightweight, and focused on Solaris 10 and the Sun Fire T1000/T2000. It's ideal for the systems administrator who is considering migrating applications from multiple Xeon boxes running Linux to a T2000 running Solaris 10. The tool can also be used to offer the technically-minded sysadmin a simple introduction to the command line syntax needed to build zones/pools/psets (via the commands script it creates). Note that the Consolidation Tool expects to work with a new Solaris installation - it does not attempt to manage systems that are already using containers.
Consolidation Tool Overview
The tool provides a simple, easy-to-use interface with context-sensitive help. The user can choose between a Basic and an Expert mode, with the latter providing more control over the final configuration at the cost of greater complexity. Intelligent defaults are provided in both modes. The tool eases the user into defining and creating Solaris Containers without assuming any previous knowledge of that technology. It will deploy applications into processor sets where appropriate, and allocate "CPUs" (i.e. hardware threads) in a way that ensures all of a core's threads end up in the same processor set. The tool asks a series of user-friendly questions to determine whether to use full-root Zones, sparse Zones, or no Zones at all. The tool also optionally installs versions of key public domain software into the newly-established Zones on the target CoolThreads system.
The tool prepares a report summarizing the planned deployment, a commands script that is used to create the Zones, pools, and psets and install any specified public domain applications, and a file that stores the configuration data. This approach means that it isn't necessary to run the tool on the target CoolThreads system. Instead you can configure the consolidation environment in advance on a client of your choice. The final step is to run the commands script on the target Sun Fire T1000/T2000 system. Note that if you have elected within the tool to install public domain applications, you will need to put the full distribution onto the target system so that the script can find the public domain applications when you run it.
The tool can be run on any of the following client operating systems:
- Solaris on SPARC
- Solaris on x64/x86
- Linux on Intel/AMD
- MacOS X on PowerPC
Where Can I Get It?
You can find the tool on BigAdmin and also under Cool Tools at OpenSPARC.net. Both locations will let you download a presentation introducing the tool, and point you to the tool download location at the Sun Download Center. Download options include:
- A 20MB tar.gz file, which provides the tool plus the necessary libraries for all clients, but none of the public domain packages
- A 130MB tar.gz file with the full distribution, which includes the tool, its dependent libraries, and several public domain applications.
Feedback and Discussion
If you'd like to offer feedback on the Consolidation Tool, you can do so at email@example.com. This is an auto-responder alias, so don't expect a reply (other than confirmation that your email has been received). If you would like to discuss the tool with other users, check out the Cool Tools Forum.
Monday Dec 19, 2005
By allanp on Dec 19, 2005
The first, Consolidating the Sun Store onto Sun Fire T2000 Servers, documents the process of migrating the online Sun Store from a Sun Enterprise 10000 with 38 400MHz CPUs onto a pair of Sun Fire T2000 servers (they used two for high availability). The resulting environment took advantage of Solaris 10 Containers. They saw an overall reduction of approximately 90 percent in both input power and heat output! Pretty cool (literally)! The space savings were even more significant.
The second blueprint, Web Consolidation on the Sun Fire T1000 using Solaris Containers, gives a detailed hands-on description of the process of building a web-tier consolidation platform on a Sun Fire T1000 with Containers.
If you're planning a CoolThreads consolidation, go check them out. I think you'll find both papers useful.
Tuesday Dec 06, 2005
By allanp on Dec 06, 2005
The Sun Fire T1000/T2000 (aka "CoolThreads") server offers a lot of horsepower in a single chip: up to eight cores running at either 1000MHz or 1200MHz, each core with four hardware threads. But how should this SMP-in-a-chip be sized appropriately for real-world applications?
The published benchmarks show that the application throughput delivered by a single T2000 server is equivalent to the throughput delivered by multiple Xeon systems. And this isn't just marketing hype, either; the UltraSPARC T1 processor is a genuine breakthrough technology. But what are the practical considerations involved in replacing several Xeon servers with a single T1000 or T2000?
Preparing for CoolThreads
For starters, it's important to understand the design point of the UltraSPARC T1. If you need blazing single-thread performance, this isn't the system for you - the chip simply wasn't designed that way. And if you think that's bad, then I'm sorry to say your future is looking a little bleak. Every processor designer in the industry is moving to multiple cores, and one implication is that single thread performance will no longer be getting all the attention. Performance will be served up in smaller packages.
The UltraSPARC T1 is a chip oriented for throughput computing. With the multi-threading capabilities of this chip, Sun has done two things. The first is to push the envelope much further than anyone else anticipated. Not everyone will applaud this strategy, of course. (And just for fun, note the reactions carefully, and deduct points from competitors who bad-mouth Sun's strategy now, and later end up copying it!) More importantly, though, Sun has issued notice about the way applications need to be designed. In a world that increasingly delivers CPU power through multiple cores and threads, single-threaded applications don't make a whole lot of sense any more. The sooner you multi-thread your applications, the better off you'll be, regardless of your hardware vendor of choice.
That doesn't mean you'll be forced to rearchitect your applications before you can use the T1000/T2000, though. You can proceed provided your planned deployment has one or more of the following characteristics, any of which will allow it to take advantage of UltraSPARC T1's multiple cores and threads:
- Multiple applications
- Multiple user processes
- Multi-threaded applications
- Multi-process applications
In general, commercial software that runs well on SMP (Symmetric Multi-Processor) systems will run well on the T1000/T2000 (because one or more of the above already apply). Note that the Java JVM is already multi-threaded.
When to Walk Away
The other major consideration is floating point performance. The UltraSPARC T1 is not designed for floating-point intensive applications. This isn't as disastrous as it might sound. It turns out that a vast range of commercial applications, from ERP software like SAP through Java application servers, do very little floating point and run just fine on the T1000/T2000. If you're in any doubt about how to figure out the proportion of floating point instructions in your application, help is on the way. More on this in a future blog.
If you made it past the single-threaded and floating point questions, you're ready for some serious sizing. The first step is to see how busy your current servers are. Suppose you plan to consolidate applications from six Xeon servers onto a Sun Fire T2000 server. If the CPUs on each system are typically 30% busy and peak at 50%, then you will be migrating a peak load equivalent to three fully-utilized servers.
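The arithmetic in the example above can be written out as a short Python sketch. The function name and the use of whole-number percentages are illustrative choices. It assumes all source servers are of similar capacity and hit their peaks at the same time, which is the conservative case.

```python
def load_in_full_servers(server_count, busy_percent):
    """Express aggregate CPU load as a number of fully utilized servers.

    Assumes every source server has similar capacity and that their
    busy periods coincide (the worst case for consolidation).
    """
    if server_count < 0:
        raise ValueError("server_count must not be negative")
    if not 0 <= busy_percent <= 100:
        raise ValueError("busy_percent must be between 0 and 100")
    return server_count * busy_percent / 100


# Six Xeon servers, typically 30% busy and peaking at 50%
typical_load = load_in_full_servers(6, 30)
peak_load = load_in_full_servers(6, 50)
print(f"typical load: {typical_load:.1f} servers")
print(f"peak load:    {peak_load:.1f} servers")
```

Sizing against the peak figure rather than the typical one leaves headroom for the busiest periods.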
By far the best way to test the relative performance of the T1000/T2000 and your current servers is to run your own application on both. If that isn't possible, a crude starting point might be to compare published performance on a real-world workload. Check out the published T1000/T2000 benchmarks for further information. If you can't directly compare your intended applications, try to find something as close as possible (e.g. the CPU, network, and storage I/O resource usage should look at least vaguely similar to your actual workload). Benchmarks that use real ISV application code (e.g. SAP and Oracle Applications) are going to be more relevant to a throughput platform like the T1000/T2000 than artificial benchmarks designed to measure the performance of a traditional CPU. One important warning: don't try to draw final conclusions if you're not comparing the same application on both platforms! Extrapolations don't work well when the technologies are radically different (and the UltraSPARC T1 is simply different to anything else out there).
The next step is to figure out how to deploy the applications. You have four, six, or eight cores at your disposal (depending on the T1000/T2000 platform you've chosen). Should you simply let Solaris worry about the scheduling? Or should you figure out your resource management priorities in advance and carve up the available resources before deploying the applications? You might want to refer to my blog about Consolidating Applications onto a CoolThreads Server for more information on this topic.
Once you're ready to deploy, make sure you do some serious load testing before going live. Don't make the mistake of rushing into production without first finding out how well your application scales on the T1000/T2000 platform. I don't know about you, but I hate nasty surprises! And if you do encounter scaling issues, don't forget that Solaris 10 DTrace is your friend. And check out DProfile, too.
Once you get your head around this technology, you're going to enjoy it! And that's even without mentioning the power, cooling, and rack space savings...
PS. If you're looking for more CoolThreads info direct from Sun engineers, Richard McDougall has put together an excellent overview of other relevant blogs.
By allanp on Dec 06, 2005
The ground-breaking Sun Fire T1000/T2000 servers, based on the UltraSPARC T1 (Niagara) chip, can provide an excellent platform for application and workload consolidation. An obvious target, for example, might be to consolidate several 1- or 2-CPU Xeon systems onto a single T1000/T2000 (aka "CoolThreads") server, with immediate savings in power consumption and rack space as well as in system administration costs. A wide range of software can be immediately run on the T1000/T2000, thanks to the large portfolio of both proprietary and public domain applications available for SPARC/Solaris.
Solaris 10 offers a number of features that are especially valuable in consolidation. Virtual servers can be created thanks to the Container technology bundled with Solaris 10. There are several key components to containers. The first is zones. A zone is a virtual Solaris instance, and one or more can be created to provide secure application environments with no access to or from other zones running on the same system. Zones do not automatically imply resource management; that's where resource pools come in. A pool can be created with its own dedicated CPU resources and scheduling class, and one or more zones can be optionally bound to the pool to take advantage of those resources. Psets (processor sets) can be created and associated with each pool where it is important to dedicate CPUs to a pool.
So the bottom line is that an arbitrary number of containers can be created on a CoolThreads system, each with its own secure environment and (optionally) its own dedicated CPU resource. The end result is a virtual server into which applications can be deployed. Each T1000/T2000 server employs a single chip that comes with four, six, or eight cores, and four hardware threads per core. For an 8-core system, Solaris sees 32 "CPUs" or virtual processors (eight cores multiplied by four hardware threads). For the purposes of consolidation, we recommend creating psets with multiples of four CPUs, each group of four CPUs corresponding to a single core. (If you only ever create psets with four CPUs, or multiples of four, Solaris will always give you four contiguous CPUs that map directly to the four threads in a single core). If a zone is not likely to require the resources of a full core, other zones can be bound to the same pool, thereby sharing the resources of the pset associated with that pool.
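To make the "multiples of four" recommendation concrete, here is a small Python sketch that plans psets in whole cores. It only computes a plan and does not run any Solaris commands. The function names and the pset names are hypothetical. The numbering it uses (CPU IDs starting at 0, with core N owning IDs 4N to 4N+3) is an assumption that you should check on the target system.

```python
THREADS_PER_CORE = 4  # UltraSPARC T1: four hardware threads per core


def core_cpu_ids(core, threads_per_core=THREADS_PER_CORE):
    """Return the CPU IDs assumed to belong to one core.

    Assumption: CPU IDs are contiguous from 0, so core N owns
    IDs N*threads_per_core .. N*threads_per_core + threads_per_core - 1.
    """
    first = core * threads_per_core
    return list(range(first, first + threads_per_core))


def plan_psets(cores_per_pset, total_cores=8,
               threads_per_core=THREADS_PER_CORE):
    """Assign whole cores to named psets, in the order given.

    cores_per_pset maps a pset name to the number of cores it needs.
    Returns a dict mapping each pset name to its list of CPU IDs.
    Cores left unassigned are not included in the plan.
    """
    if sum(cores_per_pset.values()) > total_cores:
        raise ValueError("more cores requested than the system has")
    plan = {}
    next_core = 0
    for name, cores in cores_per_pset.items():
        if cores < 1:
            raise ValueError(f"pset {name!r} needs at least one core")
        cpu_ids = []
        for core in range(next_core, next_core + cores):
            cpu_ids.extend(core_cpu_ids(core, threads_per_core))
        plan[name] = cpu_ids
        next_core += cores
    return plan


# Hypothetical plan for an 8-core T2000: one core for a web tier,
# two cores for a database, the remaining five cores left unassigned.
for name, cpu_ids in plan_psets({"web-pset": 1, "db-pset": 2}).items():
    print(f"{name}: {len(cpu_ids)} CPUs, IDs {cpu_ids[0]}-{cpu_ids[-1]}")
```

Allocating in whole cores keeps all four threads of a core inside one pset, so two psets never compete for the same core.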
OK, so the technology is definitely there, but how do you get started with it? The good news is that we have developed a freeware tool, called the Sun Fire Consolidation Tool, to simplify the task of creating containers (zones, pools, psets). It is designed for system administrators who haven't yet been exposed to the intricacies of creating zones, resource pools, etc. Taking advantage of an easy-to-use GUI, the tool creates a script with all the necessary commands. The script can simply be run on the target system to create the requested containers, but it also comes complete with detailed comments, helpfully illustrating the necessary syntax for anyone interested in learning how to use the commands. It therefore caters to both the casual user and the system administrator wanting to get a head start in mastering the nuances of container management. The tool also optionally installs a number of popular public domain software applications into the newly-created containers.
Finally, the UltraSPARC T1 processor ushers in a whole new world of price performance. And for maximum benefit, the inexpensive power of the T1000/T2000 hardware can be combined with the inexpensive power of open source software. Open source software continues to gain respect, and now covers almost the entire software stack, from web servers, application servers, and databases, through to the highly-regarded Solaris Operating System.
In summary, the new Sun Fire T1000/T2000 servers are an obvious platform for server consolidation. And probably the best way to make a start is with a pilot.
PS. If you're looking for more CoolThreads info direct from Sun engineers, Richard McDougall has put together an excellent overview of other relevant blogs.
Thursday Oct 06, 2005
By allanp on Oct 06, 2005
What would a second edition focus on? It isn't hard to come up with a short list. The IT industry has been moving in some new directions. In particular there's a lot of excitement building around open source databases; our customers are increasingly asking about them and beginning to deploy them in earnest. Solaris 10 opens up a lot of new possibilities (Containers/Zones and DTrace just for starters), and OpenSolaris introduces all of these topics to a wider audience. There's nothing significant in the first edition about Oracle RAC, an omission that should be corrected. And for most of the products covered in the first edition, there are later versions, feature updates, and new insights to be explored.
If you think there should be a second edition, please let me know. And feel free to suggest the topics you'd most like to see covered.
I'll let you know if and when there's more news...
I'm a Principal Engineer in the Performance Technologies group at Sun. My current role is team lead for the MySQL Performance & Scalability Project.