Thomas Sterling: The Idea of Clusters
By Josh Simons on Jun 17, 2008
My notes from Thomas Sterling's talk in the Cluster session at ISC 2008 in Dresden.
The Idea of Clusters -- from a personal Beowulf perspective. Thomas Sterling, Louisiana State University.
Where we are and how we got here, the drivers pushing commodity clusters forward, and clusters in the sunset of Moore's Law: they will still be clusters, but they will look different.
Definition of a Commodity Cluster. A distributed/parallel computing system constructed entirely from commodity subsystems, with two major subsystems (compute nodes and a system area network).
Use of Commodity Clusters. Science and Engineering, Manufacturing, Financial, Commerce, and a large role in Search Engines. And clusters dominate the TOP500 list: more than 70% of systems.
Early History of Clusters: Highlights. SAGE for NORAD was essentially a cluster built in 1957. Ethernet in 1976. First NOW workstation cluster at UC Berkeley in 1993. Myrinet introduced in 1993. Beowulf in 1993. MPI standard 1994. Gordon Bell Prize for price-performance 1997...
UC Berkeley NOW Project. 32-40 SPARCStation 10 and 20 nodes. ATM interconnect and then later Myrinet. First cluster in the TOP500 list.
On the East Coast, NASA Beowulf Project. Three generations between 1994 and 1996. Wiglaf, Hrothgar, and Hyglac. 16 nodes each. Established the vision of low-cost HPC. Empowerment: Users took control and were no longer at the mercy of vendors.
Standardization of interfaces was an important driver of clustering. The PCI standard replaced VESA and ISA. Fast and Gigabit Ethernet were cost effective and available from multiple vendors, so clustering was able to directly leverage LAN technology and its market. And then Myrinet appeared with low latency (11 usec), scalable to thousands of hosts, though at a higher price point than Ethernet.
Performance wasn't the best, but more scientists could get their hands on these systems. They could build it themselves and stick it in a closet. With considerable pain and effort, they could get the systems to work better. And the cost-performance was 10X better than vendor solutions.
Open Source Software, while not essential, became a motivator and driver of development of clusters. Allowed customers to build their own cluster software.
PVM was the first message-passing standard. And then came MPI, through the community coming together to create a standard. It was a joining of the cluster and MPP communities at the software level, which was important.
More middleware was needed as clusters became more shared resources. Maui, PBS, etc, were developed as workload management systems with support for MPI. Condor for throughput computing.
Basic Principles: Performance to Cost (low hanging fruit), Flexibility (inmates are in control), and Leverage of Technology Opportunities (scum sucking bottom feeders.)
The key driver today is multi-core. All cluster nodes are now parallel computers. How we manage this is a real issue. InfiniBand is taking hold as its price comes down and its performance goes up. Heterogeneous accelerators like ClearSpeed boards, NVIDIA Tesla, AMD FireStream, etc.
There is also the potential of FPGAs. They run 10-100 times slower (in clock rate), but they can show exceptional speedups on certain applications.
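One way to read the FPGA point, assuming the "10-100 times slower" refers to clock rate: a device with a slower clock still comes out ahead if it completes enough more work per cycle. The sketch below is only back-of-the-envelope arithmetic, not anything shown in the talk. The function name `net_speedup` and the sample figures are hypothetical and chosen purely for illustration.

```python
def net_speedup(clock_slowdown: float, ops_per_cycle_ratio: float) -> float:
    """Rough net speedup of an accelerator relative to a CPU.

    clock_slowdown: how many times slower the accelerator's clock is
        (the notes quote 10-100 for FPGAs).
    ops_per_cycle_ratio: how many times more useful operations the
        accelerator completes per clock cycle (hypothetical and highly
        application dependent).
    """
    if clock_slowdown <= 0 or ops_per_cycle_ratio <= 0:
        raise ValueError("both ratios must be positive")
    return ops_per_cycle_ratio / clock_slowdown


# Hypothetical figures, for illustration only.
for slowdown, parallelism in [(10, 1000), (100, 1000), (100, 50)]:
    result = net_speedup(slowdown, parallelism)
    print(f"{slowdown}x slower clock, {parallelism}x work per cycle: {result:g}x net")
```

Under these made-up numbers, the first two cases come out ahead of the CPU and the third comes out behind, which is consistent with the "certain applications" caveat: the win depends on how much parallelism the application exposes to the hardware.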
New things that may be coming next. 3D packaging, lightweight cores, processors in or near memory (PNM), embedded heterogeneous architectures (combining PNM with streaming architectures), smarter memories (transactional memory.)
Clusters are in a Phase Change. The next phase change may be driven by clusters as we deal with changes in the model of computation, in operating systems, and in programming models.
Goals of a new model of parallel computation. Address the dominant challenges: latency, overhead, starvation, resource contention, and programmability. The ParalleX project was held out as an exemplar of an approach that attempts to address these issues.
Clusters at Nanoscale. Clusters are forever. It took 15 years to reach dominance. Technology pressures will drive dramatic change -- component types, usage models, software stack, and programming methods. And classes of applications are about to go through significant change -- knowledge economy, machine intelligence, dynamic directed graphs.