Power Outages in Los Angeles and Buffalo
By marchamilton on Aug 29, 2005
Someone at Buffalo clearly cares about power, because their web site has some nice pictures of the power cables being laid out in the computer room. According to Buffalo's aptly named hotpages, their cluster has a total of 800 Dell SC1425 compute nodes, each with two 3.2 GHz Xeon processors. For whatever reason, Intel makes it rather difficult to find information on CPU power usage on their web site, but Dell has a nice Power Calculator on their site which shows the SC1425 uses 437 watts, which works out to about 350 kW for the whole lot of 800. That doesn't count the Myrinet, Fiber Channel, or Gigabit Ethernet switches used in the cluster, or the power needed to cool the system, but let's ignore that for the time being. If Buffalo only has enough power to run the cluster at 60% of capacity, let's assume they have 210 kW available for compute nodes.
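The power budget above is easy to check. Here is a minimal Python sketch using only the figures quoted in this post; the names are illustrative, and the 60% figure is the assumption stated above rather than a measured value.

```python
# Figures quoted in the post; variable names are illustrative.
NODES = 800
WATTS_PER_NODE = 437       # Dell Power Calculator figure for the SC1425
USABLE_FRACTION = 0.60     # assumed share of the cluster Buffalo can power


def cluster_power_kw(nodes, watts_per_node):
    """Total compute-node draw in kilowatts (switches and cooling excluded)."""
    return nodes * watts_per_node / 1000


total_kw = cluster_power_kw(NODES, WATTS_PER_NODE)
available_kw = total_kw * USABLE_FRACTION

print(f"Full cluster:     {total_kw:.1f} kW")
print(f"Available at 60%: {available_kw:.1f} kW")
```

The exact total is 349.6 kW, which the post rounds to 350 kW before taking 60% of it.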
Usually, when someone buys a 1600 CPU cluster, one of their goals is to get on the Top500 list. To qualify for the Top500 list, you need to run a benchmark called Linpack. There are two figures reported in the Top500 list. The first is a simple calculation called Rpeak, which is the maximum theoretical number of floating point operations per second. For Dell's SC1425 server, the figure is calculated as 3.2 GHz * 2 CPUs/server * 2 floating point units/CPU = 12.8 GFlops. For 800 servers you get an Rpeak of 10.24 TFlops. Now let's look at Sun's V20z with dual-core AMD Opteron CPUs. A single Sun Fire V20z has an Rpeak of 2.2 GHz * 2 CPUs * 2 cores/CPU * 2 floating point units/core = 17.6 GFlops. The same Rpeak value as the 800-node Dell cluster could thus be obtained with 582 V20z servers.
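For anyone who wants to rerun the Rpeak arithmetic with different clock speeds or core counts, here is a small Python sketch. The function name and parameters are made up for illustration, and the Xeon is treated as one core per CPU, as the calculation above implies.

```python
import math


def rpeak_gflops(clock_ghz, cpus, cores_per_cpu, fp_units_per_core):
    """Theoretical peak in GFlops for one server, using the formula in the post."""
    return clock_ghz * cpus * cores_per_cpu * fp_units_per_core


dell_node = rpeak_gflops(3.2, cpus=2, cores_per_cpu=1, fp_units_per_core=2)
sun_node = rpeak_gflops(2.2, cpus=2, cores_per_cpu=2, fp_units_per_core=2)

dell_cluster_gflops = 800 * dell_node

# Smallest whole number of V20z servers that matches the Dell cluster's Rpeak.
sun_nodes_needed = math.ceil(dell_cluster_gflops / sun_node)

print(f"Dell SC1425: {dell_node:.1f} GFlops per node")
print(f"Sun V20z:    {sun_node:.1f} GFlops per node")
print(f"Dell cluster: {dell_cluster_gflops / 1000:.2f} TFlops")
print(f"V20z servers needed: {sun_nodes_needed}")
```

Rounding up matters here: 10,240 / 17.6 is a little under 582, so 581 servers would fall just short.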
Of course Rpeak is only a theoretical maximum, so the Top500 rankings are actually based on a second number, Rmax, which is the measured throughput using the Linpack benchmark. As can be seen by browsing the Top500 list, Rmax varies widely, with most systems achieving an Rmax of between 50% and 70% of Rpeak. Since Linpack codes have been tuned for many years, the Rmax efficiency is often higher than actual user codes would achieve. Many customer codes, when first run at our High Performance Computing Center in Hillsboro, Oregon, start out achieving 20% or less of Rpeak. Thus, while Top500 is an interesting list, Sun recommends that customers with specific processing requirements either benchmark their actual code or at least use multiple industry standard benchmarks, like those from the Standard Performance Evaluation Corporation, commonly referred to as SPEC. In published results, AMD's dual-core x64 processors typically show 2x the floating point performance of comparable Intel x64 CPUs while using less power. No wonder Intel has not responded to AMD's recent challenge to a duel. In addition, unlike Intel, AMD makes it very simple to find the max power usage of any of their CPUs.
So back to Buffalo's problem. How are they going to run their Top500 benchmark if they can't turn on all their systems? Assuming they aren't able to afford a new power transformer (I doubt that was figured in the $2.3M purchase price of the Dell cluster), what can they do? They could wait for winter and save power by turning off some of their air conditioners. That might work in December, but the Top500 benchmark is due October 1. Even Buffalo doesn't get that cold in October. Would they have enough power if they replaced their 800 Dell servers with 582 Sun Fire V20z servers? Each of the V20z's dual-core CPUs uses 95 watts max. Add in power for the server's memory, disk, and other components and a more typical system power consumption is about 325 watts. So 582 nodes * 325 watts = about 189 kW, comfortably under the 210 kW calculated above as being available! A few fewer Myrinet, Fiber Channel, and Gigabit Ethernet switches would also be needed, making further power available. If Buffalo would like us to size the system based on actual application performance, we would be happy to benchmark their code, and I expect we would be able to get the same performance as the 800 Dell systems with fewer than 582 nodes, saving even more power.
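The same check in Python, as a rough sketch. The 325 watt figure is the typical per-server estimate quoted above, not a measured value, and the 210 kW budget is the assumption made earlier in this post.

```python
# Figures from the post; names are illustrative.
SUN_NODES = 582
SUN_WATTS_PER_NODE = 325   # typical V20z draw estimated in the post
POWER_BUDGET_KW = 210      # assumed power available for compute nodes

sun_kw = SUN_NODES * SUN_WATTS_PER_NODE / 1000
headroom_kw = POWER_BUDGET_KW - sun_kw

print(f"V20z cluster draw: {sun_kw:.2f} kW")
print(f"Headroom under the budget: {headroom_kw:.2f} kW")
print("Fits within budget" if sun_kw <= POWER_BUDGET_KW else "Over budget")
```

The draw comes out just under 190 kW, leaving roughly 20 kW to spare before switches and cooling are counted.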
Now let's look at what the total cost of ownership (TCO) difference would be had Buffalo gone with 582 Sun Fire V20z compute nodes instead of the 800 Dell boxes. Buffalo didn't break down the $2.3M price, but let's just call the acquisition costs for the servers equal. Let's look at the other components needed by the 218 extra Dell servers:
7 extra racks @ average $5K = $35K
218 extra Myrinet cards + switch ports @ average $1K = $218K
For simplicity, let's ignore the other extra components except for the power cost. The Dell system would require 24 * 365 * 437 watts * 800 = 3062 MWh/year.
The Sun system would require 24 * 365 * 325 watts * 582 = 1657 MWh/year.
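The annual energy figures, expressed in megawatt-hours, can be reproduced with a few lines of Python. This sketch assumes, as the post does, that every node runs at its quoted power draw around the clock; the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_mwh(watts_per_node, nodes):
    """Energy used in one year, in megawatt-hours, at a constant power draw."""
    watt_hours = HOURS_PER_YEAR * watts_per_node * nodes
    return watt_hours / 1_000_000


dell_mwh = annual_energy_mwh(437, 800)
sun_mwh = annual_energy_mwh(325, 582)

print(f"Dell: {dell_mwh:.0f} MWh/year")
print(f"Sun:  {sun_mwh:.0f} MWh/year")
print(f"Difference: {dell_mwh - sun_mwh:.0f} MWh/year")
```

Real clusters rarely sit at a constant draw all year, so treat these as upper-end estimates for comparison only.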
I'm not sure what University of Buffalo pays for power today, but I found this 2002 article explaining how the university was going to save $70,000 a year by self-generating 2000 MWh/year, which works out to $35 per MWh. I expect electricity prices have gone up since the 2002 date of the article, but using those figures the three-year savings of 4215 MWh (1405 MWh/year for three years) would be at least an additional $147,525.
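Here is the savings arithmetic as a short Python sketch. The electricity rate is only implied by the 2002 article's figures ($70,000 for 2000 megawatt-hours), so it is an assumption and almost certainly lower than current prices.

```python
# Implied rate from the 2002 article: $70,000 saved on 2000 MWh per year.
RATE_DOLLARS_PER_MWH = 70_000 / 2_000

# Rounded annual energy figures from the post, in MWh per year.
DELL_MWH_PER_YEAR = 3062
SUN_MWH_PER_YEAR = 1657
YEARS = 3

saved_mwh = (DELL_MWH_PER_YEAR - SUN_MWH_PER_YEAR) * YEARS
saved_dollars = saved_mwh * RATE_DOLLARS_PER_MWH

print(f"Implied rate: ${RATE_DOLLARS_PER_MWH:.2f} per MWh")
print(f"Energy saved over {YEARS} years: {saved_mwh} MWh")
print(f"Cost saved: ${saved_dollars:,.0f}")
```

Swapping in a current utility rate for `RATE_DOLLARS_PER_MWH` gives a more realistic number.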
Since the university is a good Sun customer, I'll spare stating the obvious. However, this story is a great illustration that performance/watt is becoming increasingly important and you can no longer calculate simple acquisition price/performance without looking at your total cost of ownership. Our high performance computing group at Sun has architected 800-node and larger clusters for many academic, research, and commercial customers, including several universities and financial institutions right in New York. It is a shame the university can't use all their new Dell systems because of lack of power. However, if the university doesn't want to wait until winter to turn on all those Dell space heaters, and doesn't want to blow next year's faculty salary increase budget on a new power generator, they might want to check out our Dell trade-in allowance and attend next month's Network Computing Launch, where we will announce even more new x64-based systems.