Friday Jan 15, 2010

Virtualization for HPC: The Heterogeneity Issue

I've been advocating for a while now that virtualization has much to offer HPC customers (see here). In this blog entry I'd like to focus on one specific use case: heterogeneity. It's an interesting case because, while heterogeneity is either desirable or to be avoided depending on your viewpoint, virtualization can help in either case.

The diagram above depicts a typical HPC cluster installation with each compute node running whichever distro was chosen as that site's standard OS. Homogeneity like this eases the administrative burden, but it does so at the cost of flexibility for end-users. Consider, for example, a shared compute resource like a national supercomputing center or a centralized cluster serving multiple departments within a company or other organization. Homogeneity can be a real problem for end-users whose applications run only on other versions of the chosen cluster OS or, worse, on completely different operating systems. These users are generally not able to use these centralized facilities unless they can port their application to the appropriate OS or convince their application provider to do so.

The situation with respect to heterogeneity is quite different for software providers, or ISVs (independent software vendors). These providers have been wrestling with expenses and other difficulties related to heterogeneity for years. For example, while ISVs typically develop their applications on a single platform (OS 0 above), they must often port and support their application on several operating systems in order to address the needs of their customer base. Even assuming the ISV decides correctly which operating systems should be supported to maximize revenue, it must still incur considerable expense to continually qualify and re-qualify its application on each supported operating system version. It must also maintain a complex, multi-platform testing infrastructure and the in-house expertise to support these efforts.

Imagine instead a virtualized world, as shown above. In such a world, cluster nodes run hypervisors on which pre-built and pre-configured software environments (virtual machines) are run. These virtual machines include the end-user's application and the operating system required to run that application. So far as I can see, everyone wins. Let's look at each constituency in turn:

  • End-users -- End-users have complete freedom to run any application using any operating system because all of that software is wrapped inside a virtual machine whose internal details are hidden. The VM could be supplied by an ISV, built by an open-source application's community, or created by the end-user. Because the VM is a black box from the cluster's perspective, the choice of application and operating system need no longer be restricted by cluster administrators.
  • Cluster admins -- In a virtualized world, cluster administrators are in the business of launching and managing the lifecycle of virtual machines on cluster nodes and no longer need deal with the complexities of OS upgrades, configuring software stacks, handling end-user special software requests, etc. Of course, a site might still opt to provide a set of pre-configured "standard" VMs for end-users who do not have a need for the flexibility of providing their own VMs. (If this all sounds familiar -- it should. Running a shared, virtualized HPC infrastructure would be very much like running a public cloud infrastructure like EC2. But that is a topic for another day.)
  • ISVs -- ISVs can now significantly reduce the complexity and cost of their business. Since ISV applications would be delivered wrapped within a virtual machine that also includes an operating system and other required software, ISVs would be free to select a single OS environment for developing, testing, AND deploying their application. Rather than basing their operating system choice on market share considerations, the decision could be made based on the quality of the development environment, or perhaps the stability or performance levels achievable with a particular OS, or perhaps on the ability to partner closely with an OS vendor to jointly deliver a highly-optimized, robust, and completely supported experience for end-customers.
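To make the "black box" idea above concrete, here is a minimal sketch of what launching VMs on cluster nodes might look like from the administrator's side. It is an illustration only, not a description of any real cluster manager: the names VMImage, Node, and launch are hypothetical, and the first-fit placement on free virtual CPUs is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class VMImage:
    """An opaque, pre-built environment: application plus guest OS (hypothetical type)."""
    name: str
    supplier: str  # e.g. "ISV", "community", "end-user", or "site"
    vcpus: int     # the only thing the cluster needs to know about the image


@dataclass
class Node:
    """A cluster node running a hypervisor; it tracks capacity, not guest contents."""
    hostname: str
    free_vcpus: int
    running: List[str] = field(default_factory=list)


def launch(image: VMImage, nodes: List[Node]) -> Optional[Node]:
    """Place the VM on the first node with enough free vCPUs (first-fit).

    The guest OS and application inside the image are never inspected.
    Returns the chosen node, or None if no node has room.
    """
    for node in nodes:
        if node.free_vcpus >= image.vcpus:
            node.free_vcpus -= image.vcpus
            node.running.append(image.name)
            return node
    return None


if __name__ == "__main__":
    cluster = [Node("node0", 8), Node("node1", 8)]
    for image in (VMImage("isv-solver", "ISV", 6),
                  VMImage("legacy-app", "end-user", 4)):
        placed = launch(image, cluster)
        print(image.name, "->", placed.hostname if placed else "no capacity")
```

Note that nothing in launch depends on which operating system the image contains. That is the point of the argument: the placement decision uses only resource requirements, so the site's choice of host software no longer constrains what end-users or ISVs can run.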

Thursday Dec 18, 2008

Fresh Bits: InfiniBand Updates for Solaris 10

Fresh InfiniBand bits for Solaris 10 Update 6 have just been announced by the IB Engineering Team:

The Sun InfiniBand Team is pleased to announce the availability of the Solaris InfiniBand Updates 2.1. This comprises updates to the previously available Solaris InfiniBand Updates 2. InfiniBand Updates 2 has been removed from the current download pages. (Previous versions of InfiniBand Updates need to be carefully matched to the OS Update versions that they apply to.)

The primary deliverable of Solaris InfiniBand Updates 2.1 is a set of updates of the Solaris driver supporting HCAs based on Mellanox's 4th generation silicon, ConnectX. These updates include the fixes that have been added to the driver since its original delivery, and functionality in this driver is equivalent to what was delivered as part of OpenSolaris 2008.11. In addition, there continues to be a cxflash utility that allows Solaris users to update firmware on the ConnectX HCAs. This utility is only to be used for ConnectX HCAs.

Other updates include:

  • uDAPL InfiniBand service provider library for Solaris (compatible with Sun HPC ClusterTools MPI)
  • Tavor and Arbel/memfree drivers that are compatible with new interfaces in the uDAPL library
  • Documentation (README and man pages)
  • A renamed flash utility for Tavor, Arbel memfull, Arbel memfree, and Sinai based HCAs. Instead of "fwflash" this utility is renamed "ihflash" to avoid possible namespace conflicts with a general firmware flashing utility in Solaris

All are compatible with Solaris 10 10/08 (Solaris 10, Update 6), for both SPARC and X86.

You can download the package from the "Sun Downloads" A-Z page by visiting and scrolling down or searching for the link for "Solaris InfiniBand (IB) Updates 2.1" or alternatively use this link.

Please read the README before installing the updates. This contains both installation instructions and other information you will need to know before running this product.

Please note again that this Update package is for use on Solaris 10 10/08 (Solaris 10, Update 6) only. A version of the Hermon driver has also been integrated into Update 7 and will be available with that Update's release.

Congratulations to the Solaris IB Hermon project team and the extended IB team for their efforts in making this product available!

Tuesday Jun 17, 2008

Thomas Sterling: The Idea of Clusters

My notes from Thomas Sterling's talk in the Cluster session at ISC 2008 in Dresden.

The Idea of Clusters -- from a personal Beowulf perspective
Thomas Sterling
Louisiana State University

Where we are and how we got here, the drivers pushing commodity clusters forward, and clusters in the sunset of Moore's Law: they will still be clusters, but they will look different.

Definition of a Commodity Cluster. Distributed/parallel computing system, constructed entirely from commodity subsystems with two major subsystems (compute nodes and system area network.)

Use of Commodity Clusters. Science and Engineering, Manufacturing, Financial, Commerce, and a large role in Search Engines. And clusters dominate the TOP500 list--more than 70% of systems.

Early History of Cluster Highlights. SAGE for NORAD was essentially a cluster built in 1957. Ethernet in 1976. First NOW workstation cluster at UC Berkeley in 1993. Myrinet introduced in 1993. Beowulf in 1993. MPI standard 1994. Gordon-Bell prize for price-performance 1997...

UC Berkeley NOW Project. 32-40 SPARCStation 10 and 20 nodes. ATM interconnect and then later Myrinet. First cluster in the TOP500 list.

On the East Coast, NASA Beowulf Project. Three generations between 1994 and 1996. Wiglaf, Hrothgar, and Hyglac. 16 nodes each. Established the vision of low-cost HPC. Empowerment: Users took control and were no longer at the mercy of vendors.

Standardization of interfaces was an important driver of clustering. PCI standard. Replaced VESA and ISA. Fast and Gigabit Ethernet -- cost effective, multiple vendors, clustering able to directly leverage LAN technology and market. And then Myrinet appeared with low latency (11usec), scalable to thousands of hosts, though at a higher price point than Ethernet.

Performance wasn't the best, but more scientists could get their hands on these systems. They could build it themselves and stick it in a closet. With considerable pain and effort, they could get the systems to work better. And the cost-performance was 10X better than vendor solutions.

Open Source Software, while not essential, became a motivator and driver of development of clusters. Allowed customers to build their own cluster software.

PVM was the first message-passing standard. And then came MPI through the community coming together to create a standard. It was a joining of the cluster and MPP communities at the software level--important.

More middleware was needed as clusters became more shared resources. Maui, PBS, etc, were developed as workload management systems with support for MPI. Condor for throughput computing.

Basic Principles: Performance to Cost (low hanging fruit), Flexibility (inmates are in control), and Leverage of Technology Opportunities (scum sucking bottom feeders.)

Key driver today is multi-core. All cluster nodes are now parallel computers. How we manage this is a real issue. InfiniBand is taking hold as price comes down, performance goes up. Heterogeneous accelerators like ClearSpeed boards, NVIDIA Tesla, AMD FireStream, etc.

There is also the potential of FPGAs. They run 10-100 times slower, but they can show exceptional speedups on certain applications.

New things that may be coming next. 3D packaging, lightweight cores, processors in or near memory (PNM), embedded heterogeneous architectures (combining PNM with streaming architectures), smarter memories (transactional memory.)

Clusters are in a Phase Change. Next phase change may be driven by clusters as we deal with changes in models of computation, operating systems, and programming models.

Goals of a new model of parallel computation. Address the dominant challenges: latency, overhead, starvation, resource contention, and programmability. ParalleX project held out as an exemplar of an approach that attempts to address these issues.

Clusters at Nanoscale. Clusters are forever. It took 15 years to reach dominance. Technology pressures will drive dramatic change -- component types, usage models, software stack, and programming methods. And classes of applications are about to go through significant change -- knowledge economy, machine intelligence, dynamic directed graphs.


Josh Simons

