Sunday Nov 16, 2008

Using SPARC and Solaris for HPC: More of this, please!

Ken Edgecombe, Executive Director of HPCVL, spoke today at the HPC Consortium Meeting in Austin about his facility's experiences with SPARC for HPC.

HPCVL has a massive amount of Sun gear, the newest of which includes a cluster of eight Sun SPARC Enterprise M9000 nodes, our largest SMP systems. Each node has 64 quad-core, dual-threaded SPARC64 processors and includes 2TB of RAM. With a total of 512 threads per node, the cluster has a peak performance of 20.5 TFLOPs. As you'd expect, these systems offer excellent performance for problems with large memory footprints or for those requiring extremely high bandwidths and low latencies between communicating processors.

In addition to their M9000 cluster, HPCVL has another new resource that consists of 78 Sun SPARC Enterprise T5140 (Maramba) nodes, each with two eight-core Niagara2+ processors (a.k.a. UltraSPARC T2 Plus). With eight threads per core, these systems make almost 10,000 hardware threads available to users at HPCVL.
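The thread counts quoted for both clusters follow directly from their configurations. Here is a quick back-of-the-envelope check, using only the node, chip, core, and thread figures given in this post (the helper name is just for illustration):

```python
# Back-of-the-envelope hardware thread counts for the two HPCVL clusters,
# using the configurations described in this post.

def threads_per_node(chips: int, cores_per_chip: int, threads_per_core: int) -> int:
    """Hardware threads in one node = chips x cores per chip x threads per core."""
    return chips * cores_per_chip * threads_per_core

# M9000: 64 quad-core, dual-threaded SPARC64 chips per node; 8 nodes.
m9000_node = threads_per_node(chips=64, cores_per_chip=4, threads_per_core=2)
m9000_total = 8 * m9000_node

# T5140: two eight-core UltraSPARC T2 Plus chips, 8 threads per core; 78 nodes.
t5140_node = threads_per_node(chips=2, cores_per_chip=8, threads_per_core=8)
t5140_total = 78 * t5140_node

print(f"M9000 cluster: {m9000_node} threads/node, {m9000_total} threads total")
print(f"T5140 cluster: {t5140_node} threads/node, {t5140_total} threads total")
```

The T5140 cluster's 9,984 threads are the "almost 10,000" mentioned above, with each T5140 node contributing 128 threads, a quarter of an M9000 node's 512.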

Ken described some of the challenges of deploying the T5140 nodes in his HPC environment. The biggest issue is that researchers invariably first try running a serial job on these systems and then report that they are very disappointed with the resulting performance. No surprise, since these systems run at less than 1.5 GHz, compared with competing processors that run at over twice that rate. As Ken emphasized several times, the key educational task is to re-orient users to think less about single-threaded performance and more about "getting more work done." In other words, throughput computing. For jobs that can scale to take advantage of more threads, excellent overall performance can be achieved by using more (slower) threads to complete the job in a competitive time. This works if one can either extract more parallelism from a single application or run multiple instances of applications to make efficient use of the threads within these CMT (chip multithreading) systems. With 128 threads per node (two chips × eight cores × eight threads), there is a lot of parallelism available for getting work done.
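The trade-off Ken describes can be made concrete with a deliberately simple model. The sketch below assumes independent, single-threaded jobs that each take a fixed time on one hardware thread and run in "waves" of as many jobs as there are threads. The 8-thread "fast" machine and the 4x per-thread slowdown on the CMT node are hypothetical numbers chosen for illustration, not HPCVL measurements.

```python
import math

def batch_seconds(n_jobs: int, hw_threads: int, job_seconds: float) -> float:
    """Wall-clock time to finish n_jobs independent single-threaded jobs,
    assuming at most hw_threads run at once and each takes job_seconds.
    Jobs run in waves of hw_threads; a partial final wave costs a full one."""
    waves = math.ceil(n_jobs / hw_threads)
    return waves * job_seconds

# Hypothetical machines (illustrative numbers only):
FAST = {"hw_threads": 8, "job_seconds": 100.0}   # few fast threads
CMT = {"hw_threads": 128, "job_seconds": 400.0}  # many slower threads

for n_jobs in (1, 128):
    fast = batch_seconds(n_jobs, **FAST)
    cmt = batch_seconds(n_jobs, **CMT)
    print(f"{n_jobs:>3} job(s): fast box {fast:6.0f} s, CMT node {cmt:6.0f} s")
```

With one job, the CMT node looks 4x slower, which is exactly the disappointing serial result users report. With 128 jobs, it finishes 4x sooner. The model ignores memory bandwidth, shared caches, and I/O contention, so real workloads will land somewhere in between.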

As he closed, Ken reminded attendees of the 2009 High Performance Computing Symposium, which will be held June 14-17 in Kingston, Ontario, at HPCVL.

Thursday Nov 13, 2008

Big News for HPC Developers: More Free Stuff

'Tis the season. Supercomputing season, that is. Every November the HPC community (users, researchers, and vendors) attends the world's biggest conference on HPC: Supercomputing. This year SC08 is being held in Austin, Texas, to which I'll be flying in a few short hours.

As part of the seasonal rituals, vendors often announce new products, showcase new technologies, and generally strut their stuff at the show, and in some cases even before it. Sun is no exception, as you will see if you visit our booth at the show and take note of two announcements we made today that HPC developers should see as a Big Deal. The first concerns MPI and the second our Sun Studio developer tools.

The first announcement extends Sun's support of Open MPI to Linux with the release of ClusterTools 8.1. This is huge news for anyone looking for a pre-built and extensively tested version of Open MPI for RHEL 4 or 5, SLES 9 or 10, OpenSolaris, or Solaris 10. Support contracts are available for a fee if you need one, but you can download the CT 8.1 bits here for free and use them to your heart's content, no strings attached.

Here are some of the major features supported in ClusterTools 8.1:

  • Support for Linux (RHEL 4 and 5, SLES 9 and 10), Solaris 10, OpenSolaris
  • Support for Sun Studio compilers on Solaris and Linux, plus the GNU/gcc toolchain on Linux
  • MPI profiling support with Sun Studio Analyzer (see SSX 11.2008), plus support for VampirTrace and MPI PERUSE
  • InfiniBand multi-rail support
  • Mellanox ConnectX InfiniBand support
  • DTrace provider support on Solaris
  • Enhanced performance and scalability, including processor affinity support
  • Support for InfiniBand, GbE, 10GbE, and Myrinet interconnects
  • Plug-ins for Sun Grid Engine (SGE) and Portable Batch System (PBS)
  • Full MPI-2 standard compliance, including MPI I/O and one-sided communication

The second announcement was the release of Sun Studio Express 11/08, which, among other enhancements, adds complete support for the new OpenMP 3.0 specification, including tasking. If you are questing for ways to extract parallelism from your code to take advantage of multicore processors, you should be looking seriously at OpenMP. And you should do it with the Sun Studio suite, our free compilers and tools, which really kick butt on OpenMP performance. You can download everything (the compilers, the debugger, the performance analyzer with its new MPI performance analysis support, and other tools) for free from here. Solaris 10, OpenSolaris, and Linux (RHEL 5/SuSE 10/Ubuntu 8.04/CentOS 5.1) are all supported. Among other goodies, the suite includes an extremely high-quality (and free) Fortran compiler. (Is it sad that we HPC types still get a little giddy about Fortran? What can I say...)

The capabilities in this Express release are too numerous to list here, so check out this feature list or visit the wiki.

Friday Jul 18, 2008

Two MEASURED TeraFLOPs in a Box: Now THAT is Big Iron!

I love the smell of Big Iron in the morning.

We just announced new versions of our M-series midrange and high-end SMPs, the M4000, M5000, M8000, and M9000 systems, which sport the latest Fujitsu quad-core, dual-threaded SPARC64 VII processor. These systems, a co-development effort between Sun and Fujitsu, are traditionally viewed as high-end enterprise-class systems. With up to 64 quad-core processors, up to 2 TBytes of memory, and up to 288 PCIe or PCI-X I/O slots, these systems are clearly high-end datacenter workhorses. But they kick butt on HPC workloads as well. No surprise, given the tight coupling of compute and memory in such an SMP system, which is especially valuable for computations involving large amounts of very fine-grained communication between cooperating parallel processes.

We've published world-record results on a standard OpenMP benchmark, besting the competition by considerable margins. We've also shown new world-record results on a prominent standard floating-point benchmark. My favorite result, however, is a LINPACK score of over 2 TeraFLOPs on a single M9000 system using Solaris 10 and our latest compilers, Sun Studio 12. This result is almost 2X higher with the new 2.52 GHz SPARC64 VII processor than with the previous 2.4 GHz SPARC64 VI processor, even though the clock rate rose only 5%; the gain comes largely from the VII's four cores per chip, versus two on the VI. Impressive, and yet another example of why shopping based on processor clock speeds is an increasingly bad idea. In any case, you can read more details about these benchmark results and others here and here.
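The clock-speed point can be checked with the usual theoretical-peak formula: peak = sockets × cores per chip × clock × floating-point operations per cycle. The sketch below assumes a fully populated 64-socket M9000 and 4 floating-point operations per cycle per core for both SPARC64 generations. Treat both as assumptions, and keep in mind that LINPACK measures sustained rather than peak performance.

```python
# Theoretical peak for a 64-socket M9000: SPARC64 VI vs. SPARC64 VII.
# Assumption: 4 flops/cycle/core for both processor generations.
FLOPS_PER_CYCLE = 4
SOCKETS = 64

def peak_gflops(sockets: int, cores_per_chip: int, ghz: float) -> float:
    """Theoretical peak in GFLOP/s: sockets x cores x GHz x flops/cycle."""
    return sockets * cores_per_chip * ghz * FLOPS_PER_CYCLE

vi = peak_gflops(SOCKETS, cores_per_chip=2, ghz=2.4)    # dual-core SPARC64 VI
vii = peak_gflops(SOCKETS, cores_per_chip=4, ghz=2.52)  # quad-core SPARC64 VII

clock_ratio = 2.52 / 2.4
peak_ratio = vii / vi
print(f"SPARC64 VI peak:  {vi:7.1f} GFLOP/s")
print(f"SPARC64 VII peak: {vii:7.1f} GFLOP/s")
print(f"clock ratio {clock_ratio:.2f}x vs. peak ratio {peak_ratio:.2f}x")
```

The clock alone accounts for a 1.05x improvement. Doubling the cores per chip is what pushes the peak ratio to about 2.1x, which is consistent with the nearly 2X LINPACK gain.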

Tuesday Aug 07, 2007

The UltraSPARC T2 Processor: More of Everything, Please

Sun officially announced the UltraSPARC T2 processor today. Technical specifications, datasheets, etc. are available here.

The question is, who should care?

Fortunately, this is a question easily answered. :-)

Here is my unordered list of who I believe should pay attention to this announcement:

    Customers who like the T1, but need more horsepower. The T2 has 64 threads on a single chip, up from the T1's 32 threads. Couple a T2-based system with our SPARC virtualization technology (Logical Domains, or LDoms) and you'll have quite a nice consolidation platform.

    Customers who like the idea of the T1, but who have workloads with floating-point requirements. The T1 has one floating-point unit on the chip to serve all 32 threads. The T2 has EIGHT floating-point units, one per core. I expect some HPC customers with throughput requirements will find the UltraSPARC T2 very interesting. Note the SPEC estimates cited in the announcement materials: an estimated 78.3 SPECint_rate2006 and an estimated 62.3 SPECfp_rate2006.[*] Lots more performance data is available here.

    Customers who have significant networking requirements in addition to their throughput computing needs. The T2 comes with integrated, on-chip 10 GbE.

    Anyone who needs beefy crypto performance. Yep, our chip guys managed to cram dedicated per-core cryptography functions onto the chip as well.

    Educators and entrepreneurs who will be interested in using the UltraSPARC T2 design as the basis of their work. We expect to release the T2's design under open source, much as we've done already with the UltraSPARC T1. We've kickstarted some innovation with T1 and I expect to see even more interest with T2.

    And, last, anyone who enjoys saying things like this (you know who you are):

      Woo-eee, look at this 1 Gbyte flash drive I just bought for $10!

      I remember when we bought our first Fujitsu Eagle 1 Gbyte hard disk in the late 1980s. It cost $10K, fit in a 19" rack, and needed two people to lift it!

    Soon (before the end of this year) you'll be able to buy a T2-based system and say something like this:

      Woo-eee, look at this 1RU (1 rack-unit = 1.75 inches) server with 64 hardware threads, integrated 10 GbE networking, and onboard crypto I just bought!

      I remember when we bought that 64-CPU Starfire system back in the mid-1990s. It was six feet high, 40 inches wide and weighed about 1800 lbs!

There's actually a serious point buried in that last bit of silliness, but I'll leave that for a future blog entry.

[*] All Sun UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they were obtained on preproduction systems. SPEC, SPECint, and SPECfp are registered trademarks of Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads): 78.3 est. SPECint_rate2006, 62.3 est. SPECfp_rate2006.


Josh Simons

