Friday Oct 16, 2009

Fortress: Parallel by Default

I gave a short talk about Fortress, a new parallel language, at the Sun HPC Workshop in Regensburg, Germany, and thought I'd post the slides here with commentary. Since I'm certainly not a Fortress expert by any stretch of the imagination, my intent was to give the audience a feel for the language and its origins rather than attempt a deep dive in any particular area. Christine Flood from the SunLabs Programming Languages Research Group helped me with the slides. I also stole liberally from presentations and other materials created by other Fortress team members.

The unofficial Fortress tag line, inspired by Fortress's emphasis on programmer productivity. With Fortress, programmer/scientists express their algorithms in a mathematical notation that is much closer to their domain of expertise than the syntax of the typical programming language. We'll see numerous examples in the following slides.

At the highest level, there are two things to know about Fortress: first, that it started as a SunLabs research project, and, second, that the work is being done in the open as the Project Fortress Community, whose website is here. Source code downloads, documentation, code samples, etc., are all available on the site.

Fortress was conceived as part of Sun's involvement in a DARPA program called High Productivity Computing Systems (HPCS), which was designed to encourage the development of hardware and software approaches that would significantly increase the productivity of the application developers and users of High Performance Computing systems. Each of the three companies selected to continue past the introductory phase of the program proposed a language designed to meet these requirements. IBM chose essentially to extend Java for HPC, while both Cray and Sun proposed new object-oriented languages. Michèle Weiland at the University of Edinburgh has written a short technical report that offers a comparison of the three language approaches. It is available in PDF format here.

I've mentioned productivity, but not defined it. I recommend visiting Michael Van De Vanter's publications page for more insight. Michael was a member of the Sun HPCS team who focused with several colleagues on the issue of productivity in an HPCS context. His considerable publication list is here.

Because I don't believe Sun's HPCS proposal has ever been made public, I won't comment further on the specific scalability goals set for Fortress other than to say they were chosen to complement the proposed hardware approach. Because Sun was not selected to proceed to the final phase of the HPCS program, we have not built the proposed system. We have, however, continued the Fortress project and several other initiatives that we believe are of continuing value.

Growability was a philosophical decision made by Fortress designers and we'll talk about that later. For now, note that Fortress is implemented as a small core with an extensive and growing set of capabilities provided by libraries.

As mentioned earlier, Fortress is designed to accommodate the programmer/scientist by allowing algorithms to be expressed directly in familiar mathematical notation. It is also important to note that Fortress constructs are parallel by default, unlike many other languages which require an explicit declaration to create parallelism. Actually, to be more precise, Fortress is "potentially parallel" by default: if parallelism can be found, it will be exploited.

Finally, some code. We will look at several versions of a factorial function over the next several slides to illustrate some features of Fortress. (For additional illustrative Fibonacci examples, go here.) The first version of the function is shown here beneath a concise, mathematical definition of factorial for reference.

The red underlines highlight two Fortressisms. First, the condition in the first conditional is written naturally as a single range rather than as the more conventional (and convoluted) two-clause condition. And, second, the actual recursion shows that juxtaposition can be used to imply multiplication as is common when writing mathematical statements.
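Since the slide itself isn't reproduced here, a rough sketch of that first version in Fortress's ASCII form may help. This is my reconstruction in the style of the factorial examples in the Fortress specification, not the exact slide code:

```fortress
(* Recursive factorial: the chained comparison and the
   juxtaposition "n factorial(n-1)" are the two Fortressisms noted above. *)
factorial(n: ZZ64): ZZ64 =
  if 0 <= n <= 1 then 1
  else n factorial(n-1)
  end
```

The condition `0 <= n <= 1` is the single-range form described above, and `n factorial(n-1)` multiplies by juxtaposition, just as one would write it on paper.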

This version defines a new operator, the "!" factorial operator, and then uses that operator in the recursive step. The code has also been run through the Fortress pretty printer that converts it from ASCII form to a more mathematically formatted representation. As you can see, the core logic of the code now closely mimics the mathematical definition of factorial.
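A hedged sketch of what the operator-definition version looks like in ASCII form, before pretty printing (the slide's exact types and base case may differ):

```fortress
(* Define a postfix "!" factorial operator and use it recursively. *)
opr (n: ZZ64)! : ZZ64 =
  if n <= 1 then 1
  else n (n-1)!       (* juxtaposition again: n times (n-1)! *)
  end
```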

This non-recursive version of the operator definition uses a loop to compute the factorial.
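Something along these lines, again reconstructed rather than copied from the slide:

```fortress
(* Loop-based factorial. Because Fortress for-loop iterations may run
   in parallel, the shared update of "result" is wrapped in atomic. *)
opr (n: ZZ64)! : ZZ64 = do
  result: ZZ64 := 1
  for i <- 1:n do
    atomic do result := result i end   (* result times i, by juxtaposition *)
  end
  result
end
```

Because multiplication is associative and commutative, the iterations can safely complete in any order.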

Since Fortress is parallel by default, all iterations of this loop could theoretically be executed in parallel, depending on the underlying platform. The "atomic" keyword guarantees that each update of the variable result happens atomically, preserving correctness when iterations run concurrently.

This slide shows an example of how Fortress code is written with a standard keyboard and what the code looks like after it is formatted with Fortify, the Fortress pretty printer. Several common mathematical operators are shown at the bottom of the slide along with their ASCII equivalents.

A few examples of Fortress operator precedence. Perhaps the most interesting point is the fact that white space matters to the Fortress parser. Since the spacing in the 2nd negative example implies a precedence different than the actual precedence, this statement would be rejected by Fortress on the theory that its execution would not compute the result intended by the programmer.
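To make the whitespace rule concrete, here are a few illustrative lines of my own (not the slide's examples):

```fortress
w + x y      (* juxtaposition binds tighter than +, so this is w + (x y) *)
a/b + c      (* accepted: the spacing matches the actual precedence *)
a / b+c      (* rejected: the spacing suggests a/(b+c), but / binds tighter than + *)
```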

Don't go overboard with juxtaposition as a multiplication operator -- there is clearly still a role for parentheses in Fortress, especially when dealing with complex expressions. While these two statements are supposed to be equivalent, I should point out that the first statement actually has a typo and will be rejected by Fortress. Can you spot the error? It's the "3n" that's the problem because it isn't a valid Fortress number, illustrating one case in which a juxtaposition in everyday math isn't accepted by the language. Put a space between the "3" and the "n" to fix the problem.
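In ASCII form, the "3n" distinction looks roughly like this (my own minimal illustration):

```fortress
y = 3 n      (* accepted: juxtaposition, 3 times n *)
y = 3n       (* rejected: "3n" is not a valid Fortress numeral *)
```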

Here is a larger example of Fortress code. On the left is the algorithmic description of the conjugate gradient (CG) component of the NAS parallel benchmarks, taken directly from the original 1994 technical report. On the right is the Fortress code. Or do I have that backwards? :-)

More Fortress code is available on the Fortress by Example page at the community site.

Several ways to express ranges in Fortress.
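For example (a sketch based on the range forms in the Fortress specification, since the slide isn't shown):

```fortress
1:10       (* the range 1, 2, ..., 10 *)
1:10:2     (* strided range: 1, 3, 5, 7, 9 *)
1#5        (* a width range: 5 consecutive values starting at 1 *)
```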

The first static array definition creates a one-dimensional array of 1000 32-bit integers. The second definition creates a one-dimensional array of length size, initialized to zero. I know it looks like a 2D array, but the 2nd instance of ZZ32 in the Array construct refers to the type of the index rather than specifying a 2nd array dimension.
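A hedged sketch of the two kinds of definitions being described (the names and the exact library syntax are my assumptions, not the slide's):

```fortress
a = array[\ZZ32\](1000)    (* one-dimensional array of 1000 32-bit integers *)
b: Array[\ZZ32, ZZ32\] = array[\ZZ32\](size, 0)
    (* length "size", zero-filled; the second ZZ32 is the index type,
       not a second array dimension *)
```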

The last array subexpression is interesting, since it is only partially specified. It extracts a 20x10 subarray from array b starting at its origin.

Tuple components can be evaluated in parallel, including arguments to functions. As with for loops, do clauses execute in parallel.
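A minimal sketch of both forms (the function and variable names are hypothetical placeholders of mine):

```fortress
(x, y) = (f(a), g(b))     (* the tuple components f(a) and g(b)
                             may be evaluated in parallel *)
do
  p := computeLeft()      (* this block ... *)
also do
  q := computeRight()     (* ... and this one may run in parallel *)
end
```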

In Fortress, generators control how loops are run, and generators generally run computations in any order, often in parallel. As an example, the sum reduction over X and Y is controlled by a generator that will cause the summation of the products to occur in parallel, or at least in a non-deterministic order if running on a single-processor machine.
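In ASCII form, the reduction being described might be written like this (assuming x and y are arrays indexed 1 through n):

```fortress
SUM [i <- 1:n] x[i] y[i]   (* dot product: the generator i <- 1:n
                              decides the order, and the parallelism,
                              of the summed multiplications *)
```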

In Fortress, when parallelism is generated the execution of that work is handled using a work stealing strategy similar to that used by Cilk. Essentially, when a compute resource finishes executing its tasks, it pulls work items from other processors' work queues, ensuring that compute resources stay busy by load balancing the available work across all processors.

Essentially, a restatement of an earlier point: In Fortress, generators play the role that iterators play in other languages. By relegating the details of how the index space is processed to the generator, it is natural to then also allow the generator to control how the enclosed processing steps are executed. A generator might execute computations serially or in parallel.

A generator could conceivably also control whether computations are done locally on a single system or distributed across a cluster, though the Fortress interpreter currently only executes within a single node. To me, the generator concept is one of the nicer aspects of Fortress.

Guy Steele, who along with Eric Allen is a Fortress Principal Investigator, has been working in the programming languages area long enough to know the wisdom of these statements. Watch him live the reality of growing a language in his keynote at the 1998 ACM OOPSLA conference. Be amazed at the cleverness, but listen to the message as well.

The latest version of the Fortress interpreter (source and binary) is available here. If you would like to browse the source code online, do so here.

Some informational pointers. Christine also tells me that the team is working on an overview talk like this one, except I expect it will be a lot better. :-) Though I've only scratched the surface, I hope this brief overview has given you at least the flavor of what Project Fortress is about.

Wednesday Mar 18, 2009

More Free HPC Developer Tools for Solaris and Linux

The Sun Studio team just released the latest version of our HPC developer tools with so many enhancements and additions it's hard to know where to start this blog entry. I suppose with the basics: As usual, all of the software is free. And available for both Solaris and Linux, specifically Solaris, OpenSolaris, RHEL, SuSE, and Ubuntu. Frankly, Sun would like to be your preferred provider for high-performance Fortran, C, and C++ compilers and tools. Given the performance and capabilities we deliver for HPC with Sun Studio, that seems a pretty reasonable goal to me. We think the price has been set correctly to achieve that as well. :-)

I have to admit to being confused by the naming convention for this release, but it goes something like this. The release is an EA (Early Access) version of Sun Studio 12 Update 1 -- the first major update to Sun Studio 12 since it was released in the summer of 2007. Since Sun Studio's latest and greatest bits are released every three months as part of the Express program, this release can also be called Sun Studio Express 3/09. Different names, same bits. Don't worry about it -- just focus on the fact that they make great compilers and tools. :-)

Regardless of what they call it, the release can be downloaded here. Take it for a spin and let the developers know what you think on the forum or file a request for enhancement (RFE) or a bug report here.

For the full list of new features, go here. For my personal list of favorite new features, read on.

  • Full OpenMP 3.0 compiler and tools support. For those not familiar, OpenMP is the industry standard for directives-based threaded application parallelization. Or, the answer to the question, "So how do I use all the cores and threads in my spiffy new multicore processor?"
  • ScaLAPACK 1.8 is now included in the Sun Performance Library! It works with Sun's MPI (Sun HPC ClusterTools), which is based on Open MPI 1.3. The Perflib team has also made significant performance enhancements to BLAS, LAPACK, and the FFT routines, including support for the latest Intel and AMD processors. Nice.
  • MPI performance analysis integrated into the Sun Performance Analyzer. Analyzer has been a kick-butt performance tool for single-process applications for years. It has now been extended to help MPI programmers deal with message-passing related performance problems.
  • Continued, aggressive attention paid to optimizing for the latest SPARC, Intel, and AMD processors. C, C++, and Fortran performance will all benefit from these changes.
  • A new standalone GUI debugger. Go ahead, graduate from printf() and try a real debugger. It won't bite.

As I mentioned above, full details on these new features and many, many more are all documented on this wiki page. And, again, the bits are here.

Thursday Nov 13, 2008

Big News for HPC Developers: More Free Stuff

'Tis the Season. Supercomputing season, that is. Every November the HPC community--users, researchers, and vendors--attends the world's biggest conference on HPC: Supercomputing. This year SC08 is being held in Austin, Texas, to which I'll be flying in a few short hours.

As part of the seasonal rituals, vendors often announce new products, showcase new technologies, and generally strut their stuff at the show--and in some cases even before it. Sun is no exception, as you will see if you visit our booth at the show and take note of two announcements we made today that should be seen as a Big Deal for HPC developers. The first concerns MPI and the second our Sun Studio developer tools.

The first announcement extends Sun's support of Open MPI to Linux with the release of ClusterTools 8.1. This is huge news for anyone looking for a pre-built and extensively tested version of Open MPI for RHEL 4 or 5, SLES 9 or 10, OpenSolaris, or Solaris 10. Support contracts are available for a fee if you need one, but you can download the CT 8.1 bits here for free and use them to your heart's content, no strings attached.

Here are some of the major features supported in ClusterTools 8.1:

  • Support for Linux (RHEL 4&5, SLES 9&10), Solaris 10, OpenSolaris
  • Support for Sun Studio compilers on Solaris and Linux, plus the GNU/gcc toolchain on Linux
  • MPI profiling support with Sun Studio Analyzer (see SSX 11.2008), plus support for VampirTrace and MPI PERUSE
  • InfiniBand multi-rail support
  • Mellanox ConnectX InfiniBand support
  • DTrace provider support on Solaris
  • Enhanced performance and scalability, including processor affinity support
  • Support for InfiniBand, GbE, 10GbE, and Myrinet interconnects
  • Plug-ins for Sun Grid Engine (SGE) and Portable Batch System (PBS)
  • Full MPI-2 standard compliance, including MPI I/O and one-sided communication

The second event was the release of Sun Studio Express 11/08, which among other enhancements adds complete support for the new OpenMP 3.0 specification, including tasking. If you are questing for ways to extract parallelism from your code to take advantage of multicore processors, you should be looking seriously at OpenMP. And you should do it with the Sun Studio suite, our free compilers and tools which really kick butt on OpenMP performance. You can download everything--the compilers, the debugger, the performance analyzer (including new MPI performance analysis support) and other tools--for free from here. Solaris 10, OpenSolaris, and Linux (RHEL 5/SuSE 10/Ubuntu 8.04/CentOS 5.1) are all supported. That includes an extremely high-quality (and free) Fortran compiler among other goodies. (Is it sad that we HPC types still get a little giddy about Fortran? What can I say...)

The full list of capabilities in this Express release are too numerous to list here, so check out this feature list or visit the wiki.

Wednesday Jul 30, 2008

Fresh Bits: Attention all OpenMP and MPI Programmers!

The latest preview release of Sun's compiler and tools suite for C, C++, and Fortran users is now available for free download. Called Sun Studio Express 07/08, this release of Sun Studio marks an important advance for HPC customers and for any customer interested in extracting high performance from today's multi-threaded and multi-core processors. In addition to numerous compiler performance enhancements, the release includes beta-level support for the latest OpenMP standard, OpenMP 3.0. It also includes some nice Performance Analyzer enhancements that support simple and intuitive performance analysis of MPI jobs. More detail on both of these below.

As the industry-standard approach for achieving parallel performance on multi-CPU systems, OpenMP has long been a mainstay of the HPC developer community. Version 3.0, which is supported in this new Sun Studio preview release, is a major enhancement to the standard. Most notably it includes support for tasking, a major new feature that can help programmers achieve better performance and scalability with less effort than previous approaches using nested parallelism. There are a host of other enhancements as well. The OpenMP expert will find the latest specification useful. For those new to parallelism who have stumbled into a maze of twisty passages all alike, you may find Using OpenMP: Portable Shared Memory Parallel Programming to be a useful introduction to parallelism and OpenMP.

A parallel quicksort example, written using the new OpenMP tasking feature supported in Sun Studio Express 07/08

Sun Studio Express 07/08 also includes enhancements for programmers of parallel, distributed applications who use MPI. With this release of Sun Studio Express we have introduced tighter integration with Sun's MPI library (Sun HPC ClusterTools). Sun's Performance Analyzer has been enhanced to include the ability to examine the performance of MPI jobs by viewing information related to message transfers and messaging performance using a variety of visualization methods. This extends Analyzer's already-sophisticated on-node performance analysis capabilities. Some screenshots below give an idea of the types of information that can be viewed. Note the idea of viewing "MPI states" (e.g., MPI Wait and MPI Work) to get a high-level view of the MPI portion of an application: understanding how much time is spent doing actual work versus sitting in a wait state can motivate useful insights into the performance of these parallel, distributed codes.

A source code viewer window augmented with several MPI-specific capabilities, one of which is illustrated here: the ability to quickly see how much work (or waiting) is performed within a function.

In addition to supporting direct viewing of specific MPI performance issues within an application, Analyzer now also supports a range of visualization tools useful for understanding the messaging portion of an MPI code. Zoomable timelines with MPI events are supported, as is the ability to map various metrics against the X and Y axes of a plotting area to display interesting characteristics of the MPI run, as shown below.

Just one example of Sun Studio's new MPI charting capabilities: shown here is the volume of messages transferred between communicating pairs of MPI processes during an application run.

This blog entry has barely scratched the surface of the new OpenMP and MPI capabilities available in this release. If you are a Solaris or Linux HPC programmer, please take these new capabilities for a test drive and let us know what you think. I know the engineering teams are excited by what they've accomplished and I hope you will share their enthusiasm once you've tried these new capabilities.

Sun Studio Express 07/08 is available for Solaris 9 & 10, OpenSolaris 2008.05, and Linux (SLES 9, RHEL 4) and can be downloaded here.


Josh Simons

