Wednesday Mar 18, 2009

More Free HPC Developer Tools for Solaris and Linux

The Sun Studio team just released the latest version of our HPC developer tools with so many enhancements and additions it's hard to know where to start this blog entry. I suppose with the basics: As usual, all of the software is free. And available for both Solaris and Linux, specifically Solaris, OpenSolaris, RHEL, SuSE, and Ubuntu. Frankly, Sun would like to be your preferred provider for high-performance Fortran, C, and C++ compilers and tools. Given the performance and capabilities we deliver for HPC with Sun Studio, that seems a pretty reasonable goal to me. We think the price has been set correctly to achieve that as well. :-)

I have to admit to being confused by the naming convention for this release, but it goes something like this. The release is an EA (Early Access) version of Sun Studio 12 Update 1 -- the first major update to Sun Studio 12 since it was released in the summer of 2007. Since Sun Studio's latest and greatest bits are released every three months as part of the Express program, this release can also be called Sun Studio Express 3/09. Different names, same bits. Don't worry about it -- just focus on the fact that they make great compilers and tools. :-)

Regardless of what they call it, the release can be downloaded here. Take it for a spin and let the developers know what you think on the forum or file a request for enhancement (RFE) or a bug report here.

For the full list of new features, go here. For my personal list of favorite new features, read on.

  • Full OpenMP 3.0 compiler and tools support. For those not familiar, OpenMP is the industry standard for directives-based threaded application parallelization. Or, the answer to the question, "So how do I use all the cores and threads in my spiffy new multicore processor?"
  • ScaLAPACK 1.8 is now included in the Sun Performance Library! It works with Sun's MPI (Sun HPC ClusterTools), which is based on Open MPI 1.3. The Perflib team has also made significant performance enhancements to BLAS, LAPACK, and the FFT routines, including support for the latest Intel and AMD processors. Nice.
  • MPI performance analysis integrated into the Sun Performance Analyzer. Analyzer has been for years a kick-butt performance tool for single-process applications. It has now been extended to help MPI programmers deal with message-passing related performance problems.
  • Continued, aggressive attention paid to optimizing for the latest SPARC, Intel, and AMD processors. C, C++, and Fortran performance will all benefit from these changes.
  • A new standalone GUI debugger. Go ahead, graduate from printf() and try a real debugger. It won't bite.

As I mentioned above, full details on these new features and many, many more are all documented on this wiki page. And, again, the bits are here.

Wednesday Jul 30, 2008

Fresh Bits: Attention all OpenMP and MPI Programmers!

The latest preview release of Sun's compiler and tools suite for C, C++, and FORTRAN users is now available for free download. Called Sun Studio Express 07/08, this release of Sun Studio marks an important advance for HPC customers and for any customer interested in extracting high performance from today's multi-threaded and multi-core processors. In addition to numerous compiler performance enhancements, the release includes beta-level support for the latest OpenMP standard, OpenMP 3.0. It also includes some nice Performance Analyzer enhancements that support simple and intuitive performance analysis of MPI jobs. More detail on both of these below.

As the industry-standard approach for achieving parallel performance on multi-CPU systems, OpenMP has long been a mainstay of the HPC developer community. Version 3.0, which is supported in this new Sun Studio preview release, is a major enhancement to the standard. Most notably it includes support for tasking, a major new feature that can help programmers achieve better performance and scalability with less effort than previous approaches using nested parallelism. There are a host of other enhancements as well. The OpenMP expert will find the latest specification useful. For those new to parallelism who have stumbled into a maze of twisty passages all alike, you may find Using OpenMP: Portable Shared Memory Parallel Programming to be a useful introduction to parallelism and OpenMP.

A parallel quicksort example, written using the new OpenMP tasking feature supported in Sun Studio Express 07/08

Sun Studio Express 07/08 also includes enhancements for programmers of parallel, distributed applications who use MPI. With this release of Sun Studio Express we have introduced tighter integration with Sun's MPI library (Sun HPC ClusterTools). Sun's Performance Analyzer has been enhanced to include the ability to examine the performance of MPI jobs by viewing information related to message transfers and messaging performance using a variety of visualization methods. This extends Analyzer's already-sophisticated on-node performance analysis capabilities. Some screenshots below give some idea of the types of information that can be viewed. You should note the idea of viewing "MPI states" (e.g. MPI Wait and MPI Work) to get a high level view of the performance of the MPI portion of an application: an ability to understand how much time is spent doing actual work versus sitting in a wait state can motivate useful insights into the performance of these parallel, distributed codes.

A source code viewer window augmented with several MPI-specific capabilities, one of which is illustrated here: the ability to quickly see how much work (or waiting) is performed within a function.

In addition to supporting direct viewing of specific MPI performance issues within an application, Analyzer now also supports a range of visualization tools useful for understanding the messaging portion of an MPI code. Zoomable timelines with MPI events are supported, as is an ability to map various metrics against the X and Y axis of a plotting area to display various interesting characteristics of the MPI run, as shown below.

Just one example of Sun Studio's new MPI charting capabilities. Shown here is a display showing the volume of messages transferred between communicating pairs of MPI processes during an application run.

This blog entry has barely scratched the surface of the new OpenMP and MPI capabilities available in this release. If you are a Solaris or Linux HPC programmer, please take these new capabilities for a test drive and let us know what you think. I know the engineering teams are excited by what they've accomplished and I hope you will share their enthusiasm once you've tried these new capabilities.

Sun Studio Express 07/08 is available for Solaris 9 & 10, OpenSolaris 2008.05, and Linux (SLES 9, RHEL 4) and can be downloaded here.

Tuesday Oct 09, 2007

CMT for HPC: Sun Launches UltraSPARC T2 Servers

[ultrasparc t2 chip]

Today we announced our first servers based on the UltraSPARC T2 (Niagara2) processor. They are officially named the Sun SPARC Enterprise T5120, the Sun SPARC Enterprise T5220, and the Sun Blade T6320. For those who enjoy code names, the rack servers are known internally as "Huron," following in the Great Lakes theme from our UltraSPARC T1-based systems. The blade is called "Glendale." For detailed specifications on these new machines, start here. UltraSHORT summary: 64 threads, eight floating point units, on-chip 10GbE, low power, 1RU or 2RU or blade form factors. And looking interesting for some HPC workloads.

The UltraSPARC T2 is Sun's second generation CMT (chip multithreaded) processor. The first-generation UltraSPARC T1, which has 32 threads and only one floating point unit, performs well on many throughput-oriented tasks, but isn't suitable as a general-purpose processor for High Performance Computing. Some HPC areas like life sciences and some parts of the intelligence community have integer-intensive workloads and can use the UltraSPARC T1 to advantage. For example, see the numerous entries on Lawrence Spracklen's blog.

So, what can we say about the UltraSPARC T2 and its platforms relative to HPC?

As usual, application performance will depend greatly on the specifics of your application, but having seen the results of several benchmarks on the UltraSPARC T2, I can make some observations. First, remember the primary value proposition of these CMT systems is throughput, and not single-thread performance. We use relatively low-performing cores, but give you eight of them on a single chip, each with multiple threads. Therefore your application or workload must benefit from lots of threads and from the CMT's ability to hide memory latency by performing real work while waiting for memory operations to complete.

I'll leave it to the benchmarking folks to give you the official story on exact results and instead make some general observations. First, these new systems generate leading performance numbers on a popular floating-point rate (i.e. throughput) benchmark. However, to achieve those numbers we obviously must run enough instances of the benchmark to make use of all of our threads, which increases the memory footprint and therefore the cost of the system. How much that matters to you in real life depends on how your application's memory footprint scales in practice.

Consider for example, an OpenMP application. Using OpenMP to parallelize an application leaves the memory footprint essentially unchanged and instead varies the number of threads used within the application. As you'd expect, the thread-rich T2-based systems deliver some very interesting OpenMP benchmark results.

Beyond performance issues, let's not lose sight of the fact that these tiny boxes have 64 hardware threads (eight FPUs), making them interesting platforms for HPC developers working on parallel algorithms, possibly even for MPI developers wanting to debug their distributed applications on a single machine. And, of course, you should expect to be able to cluster these machines for building larger HPC systems using either the on-board 10GbE or InfiniBand.

For other Sun blogger perspectives on these new systems, start with Allan Packer's cross-reference entry.


Josh Simons


« April 2014