Thursday Aug 27, 2009

Parallel Computing: Berkeley's Summer Bootcamp

Two weeks ago the Parallel Computing Laboratory at the University of California Berkeley ran an excellent three-day summer bootcamp on parallel computing. I was one of about 200 people who attended remotely while another large pile of people elected to attend in person on the UCB campus. This was an excellent opportunity to listen to some very well known and talented people in the HPC community. Video and presentation material is available on the web and I would recommend it to anyone interested in parallel computing or HPC. See below for details.

The bootcamp, which was called the 2009 Par Lab Boot Camp - Short Course on Parallel Programming, covered a wide array of useful topics, including introductions to many of the current and emerging HPC parallel computing models (pthreads, OpenMP, MPI, UPC, CUDA, OpenCL, etc.), hands-on labs for in-person attendees, and some nice discussions on parallelism and how to find it, with an emphasis on the motifs (patterns) of parallelism identified in The Landscape of Parallel Computing Research: A View From Berkeley. There was also a presentation on performance analysis tools and several application-level talks. It was an excellent event.

The bootcamp agenda is shown below. Session videos and PDF decks are available here.

Talk title | Speaker
Introduction and Welcome | Dave Patterson (UCB)
Introduction to Parallel Architectures | John Kubiatowicz (UCB)
Shared Memory Programming with Pthreads, OpenMP and TBB | Katherine Yelick (UCB & LBNL), Tim Mattson (Intel), Michael Wrinn (Intel)
Sources of parallelism and locality in simulation | James Demmel (UCB)
Architecting Parallel Software Using Design Patterns | Kurt Keutzer (UCB)
Data-Parallel Programming on Manycore Graphics Processors | Bryan Catanzaro (UCB)
OpenCL | Tim Mattson (Intel)
Computational Patterns of Parallel Programming | James Demmel (UCB)
Building Parallel Applications | Ras Bodik (UCB), Nelson Morgan (UCB)
Distributed Memory Programming in MPI and UPC | Katherine Yelick (UCB & LBNL)
Performance Analysis Tools | Karl Fuerlinger (UCB)
Cloud Computing | Matei Zaharia (UCB)

Wednesday Jul 30, 2008

Fresh Bits: Attention all OpenMP and MPI Programmers!

The latest preview release of Sun's compiler and tools suite for C, C++, and Fortran users is now available for free download. Called Sun Studio Express 07/08, this release of Sun Studio marks an important advance for HPC customers and for any customer interested in extracting high performance from today's multi-threaded and multi-core processors. In addition to numerous compiler performance enhancements, the release includes beta-level support for the latest OpenMP standard, OpenMP 3.0. It also includes some nice Performance Analyzer enhancements that support simple and intuitive performance analysis of MPI jobs. More detail on both of these below.

As the industry-standard approach for achieving parallel performance on multi-CPU systems, OpenMP has long been a mainstay of the HPC developer community. Version 3.0, which is supported in this new Sun Studio preview release, is a major enhancement to the standard. Most notably it includes support for tasking, a major new feature that can help programmers achieve better performance and scalability with less effort than previous approaches using nested parallelism. There are a host of other enhancements as well. The OpenMP expert will find the latest specification useful. For those new to parallelism who have stumbled into a maze of twisty passages all alike, you may find Using OpenMP: Portable Shared Memory Parallel Programming to be a useful introduction to parallelism and OpenMP.

A parallel quicksort example, written using the new OpenMP tasking feature supported in Sun Studio Express 07/08

Sun Studio Express 07/08 also includes enhancements for programmers of parallel, distributed applications who use MPI. This release introduces tighter integration with Sun's MPI library (Sun HPC ClusterTools): the Performance Analyzer can now examine the performance of MPI jobs, displaying information about message transfers and messaging performance using a variety of visualization methods. This extends Analyzer's already-sophisticated on-node performance analysis capabilities. The screenshots below give some idea of the types of information that can be viewed. Note in particular the idea of viewing "MPI states" (e.g. MPI Wait and MPI Work) to get a high-level view of the MPI portion of an application: understanding how much time is spent doing actual work versus sitting in a wait state can yield useful insights into the performance of these parallel, distributed codes.

A source code viewer window augmented with several MPI-specific capabilities, one of which is illustrated here: the ability to quickly see how much work (or waiting) is performed within a function.

In addition to supporting direct viewing of specific MPI performance issues within an application, Analyzer now also supports a range of visualization tools useful for understanding the messaging portion of an MPI code. Zoomable timelines with MPI events are supported, as is an ability to map various metrics against the X and Y axis of a plotting area to display various interesting characteristics of the MPI run, as shown below.

Just one example of Sun Studio's new MPI charting capabilities. Shown here is a display showing the volume of messages transferred between communicating pairs of MPI processes during an application run.

This blog entry has barely scratched the surface of the new OpenMP and MPI capabilities available in this release. If you are a Solaris or Linux HPC programmer, please take these new capabilities for a test drive and let us know what you think. I know the engineering teams are excited by what they've accomplished and I hope you will share their enthusiasm once you've tried these new capabilities.

Sun Studio Express 07/08 is available for Solaris 9 & 10, OpenSolaris 2008.05, and Linux (SLES 9, RHEL 4) and can be downloaded here.

Monday Jun 16, 2008

HPC Consortium: Technical University of Denmark

Bernd Dammann, Associate Professor at the Technical University of Denmark spoke yesterday afternoon at the Sun HPC Consortium meeting here in Dresden. His talk focused on the benefits of Sun's Studio compiler and tools suite for education and HPC.

Sun Studio is an important tool for teaching students about HPC programming techniques at DTU, including data flow through cache-based systems, loop-based optimization techniques, pipelining, and general application tuning techniques. The particular programming course that Bernd described focuses on helping students understand how real computers work, how memory and CPUs are glued together -- the details they don't learn in more theoretical courses. However, once the students are exposed to these techniques, they are surprised to learn that many of them are applied automatically to their codes by modern compilers. That is good news for ease of use, but frustrating for engineering students, who typically dislike black boxes and prefer to understand the internal details. Sun Studio addresses this with its compiler commentary feature.

Compiler commentary allows the programmer to view their source code annotated with compiler-generated comments that describe in detail not just which optimizations were applied to the code, but also cases in which optimizations where not applied. Bernd showed several examples that illustrated how the feature works, including the use of the er_src utility to display source code with interleaved commentary.

The benefit, of course, of the compiler commentary is that a suitably educated programmer can use the information to find additional opportunities for performance improvement by making suitable changes to the code or by activating additional compiler features. For example, one of Bernd's examples showed the use of the -xrestrict compiler flag, which was not familiar to me. It allows the programmer to tell the compiler that pointers in the code are known not to overlap, which can potentially allow the compiler to significantly increase performance with additional optimizations. Cool.

Bernd noted that Sun's compiler commentary is "in the right place," namely in the binaries rather than in separate log files or merely displayed on a screen. By storing the commentary internally, it can be extracted and looked at later -- potentially long after compilation, which can be very useful.

Bernd then gave a brief overview of Sun's primary performance analysis tool, Sun Studio Performance Analyzer. He noted that it can display compiler commentary as well as a variety of application performance metrics (CPU usage, timelines, etc.). He praised it as a tool that both he and his students find very intuitive and easy to use.

Sun Studio is also used in a parallel programming course where it makes teaching OpenMP much easier. Being able to look at performance timelines for each thread to see OpenMP overheads and using the Thread Analyzer tool to detect data races were two examples. He also liked Sun's extension to OpenMP that allows the compiler to perform automatic scoping, which can be very useful in dealing with large, legacy codes with hundreds of variables.

Students at DTU have been using the Sun Studio tools for four years and like them a lot. They wished they could use the tools on Linux. Well, now they can.

Bernd ended his talk with some favorable comparisons of Sun Studio C against GCC and Intel C and showed an example of how easy it was to use the Sun tools to parallelize and debug a large, legacy Fortran code and get good parallel performance very quickly. He concluded with the observation that Sun Studio is a world-class product that is easy to learn and use. Bernd is now on my official list of favorite customers. :-)


Josh Simons
