Wednesday May 21, 2008

xVM Ops Center at KISTI

In March 2007, Sun Korea was selected as the vendor to provide Massively Parallel Processing (MPP) for the fourth supercomputer project at KISTI (Korea Institute of Science and Technology Information). I understand this project will be completed in two phases. When both phases are completed, this supercomputer at KISTI will be the largest in Asia. Recently, I found out that Phase 1 is complete and the system will go into production once the remaining stability tests are finished.

Now, here is the cool thing. With TACC, Sun built the largest supercomputer in America, and with KISTI, we will repeat that feat in Asia!!

And guess what: both supercomputers have xVM Ops Center in their software stack to manage the compute nodes. The scale managed by xVM Ops Center in these two HPC projects is vastly different, though. At TACC, the compute nodes number a little shy of 4000 blades housed in 82 Sun Constellation (C48) racks, whereas at KISTI in Phase 1 we have four C48 racks that amount to 188 compute nodes.



Saturday Mar 08, 2008

TACC - Sun xVM Ops Center Early Adopter

In February 2008, we announced General Availability of Sun xVM Ops Center, a highly scalable data center management tool that will allow customers to greatly automate time-consuming, routine system administration tasks, such as firmware updates, bare-metal operating-system provisioning, patching and updating - making it easier to manage thousands of IT assets. Scaling Ops Center outside our limited-scale lab environment was vitally important to its success, and to ensuring that it holds up at the massive scale at which it will be deployed and used in real-world HPC environments. For this, we were fortunate to have, as one of xVM Ops Center's early adopters, today's fastest supercomputer in the world at the Texas Advanced Computing Center (TACC) at UT Austin.

The 504 TeraFlop Ranger supercomputer at TACC is Sun's Constellation system architecture at its finest, consisting of 82 compute racks, 12 management racks and 2 switches, neatly laid out in rows (see figure to the left), in a data center approximately half the size of a basketball court!!

Each compute rack consists of 48 Sun 4-socket blades, each socket populated by a quad-core 2.0 GHz AMD Barcelona processor. The management racks consist of Thumpers (X4500) and X4600M2's. The two switches are Sun's 3456-port Magnum switches, and the system interconnect is InfiniBand, using Mellanox's InfiniBand cards.
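As a rough sanity check, the rack and blade figures above can be multiplied out to get the core count and a theoretical peak. This is only a back-of-the-envelope sketch: the rack, blade, socket, core and clock numbers come from this post, while the figure of 4 double-precision floating-point operations per core per clock cycle is an assumption about the Barcelona core that is not stated here.

```python
# Figures taken from the post.
RACKS = 82
BLADES_PER_RACK = 48
SOCKETS_PER_BLADE = 4
CORES_PER_SOCKET = 4
CLOCK_GHZ = 2.0

# Assumption (not from the post): each core retires 4 double-precision
# floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4


def total_blades() -> int:
    """Number of compute blades across all compute racks."""
    return RACKS * BLADES_PER_RACK


def total_cores() -> int:
    """Number of processor cores across all compute blades."""
    return total_blades() * SOCKETS_PER_BLADE * CORES_PER_SOCKET


def peak_teraflops() -> float:
    """Theoretical peak: cores x GHz x FLOPs per cycle gives GFLOPS; /1000 gives TFLOPS."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0


if __name__ == "__main__":
    print("blades:", total_blades())
    print("cores:", total_cores())
    print("peak TFLOPS (approx.):", round(peak_teraflops(), 1))
```

Under that assumption, the product lands very close to the 504 TeraFlop figure quoted above, and the blade count matches the "a little shy of 4000" description.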

Ops Center is installed on two of the X4600M2's in one of the management racks. Each X4600M2 has four dual-core AMD sockets, 16 GB of memory and 2 x 70 GB disks. Both Ops Center installations have the satellite and proxy collocated on their respective X4600M2 box. Each instance of Ops Center is currently configured to manage 15 racks (720 blades and the corresponding chassis). The short-term plan is to cover the entire Ranger (all compute nodes in 82 racks, plus the management nodes) across four X4600M2's. The rationale is to get Ops Center in use at TACC for some time while we work on scaling down to one or two X4600M2's.

TACC became an early adopter in October 2007. xVM Ops Center beta engineering builds were used at TACC to discover and inventory Constellation chassis and blade service processors, collect basic system telemetry, perform lights-out management tasks such as power and indicator-light cycling, and provision the latest firmware to Thumpers and Sun Blades. These test drives on Ranger with early Ops Center bits helped validate the product's scalability, as we found and fixed issues in chassis-blade association during power cycling, MBean normalization, job weights, server grouping, and locking and synchronization.

We are currently a little more than one-third of the way there (about 30 out of 82 racks; about 1440 out of 3936 blades). There is still work to be done. With the recent release of Sun xVM Ops Center 1.0, our team is gearing up to scale to the remaining two-thirds, to become the first HPC cluster management product to scale to today's highest supercomputing peak!!
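The progress numbers can be reproduced with a few lines of arithmetic. The sketch below uses only figures from this post (two Ops Center instances, 15 racks each, 48 blades per rack, 82 racks in total) and counts compute blades only, not chassis or management nodes. The even split across four X4600M2's is purely an illustrative assumption, not a description of how the racks will actually be assigned.

```python
import math

# Figures taken from the post.
TOTAL_RACKS = 82
BLADES_PER_RACK = 48
CURRENT_INSTANCES = 2
RACKS_PER_CURRENT_INSTANCE = 15
PLANNED_INSTANCES = 4


def managed_racks() -> int:
    """Racks currently under Ops Center management."""
    return CURRENT_INSTANCES * RACKS_PER_CURRENT_INSTANCE


def managed_blades() -> int:
    """Compute blades currently under Ops Center management."""
    return managed_racks() * BLADES_PER_RACK


def coverage_fraction() -> float:
    """Fraction of all compute blades currently managed."""
    return managed_blades() / (TOTAL_RACKS * BLADES_PER_RACK)


def racks_per_instance(total_racks: int, instances: int) -> int:
    """Most racks any one instance would carry under an even split (assumed)."""
    return math.ceil(total_racks / instances)


if __name__ == "__main__":
    print("managed racks:", managed_racks())
    print("managed blades:", managed_blades())
    print("coverage: {:.1%}".format(coverage_fraction()))
    print("racks per instance at full scale:",
          racks_per_instance(TOTAL_RACKS, PLANNED_INSTANCES))
```

With an even split, each of the four planned instances would carry roughly 20 to 21 racks, compared with 15 racks per instance today.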

Stay tuned for more updates as we continue this fascinating journey.

Sunday Nov 11, 2007

Sun Constellation HPC Software (CHS) Stack

Big day for Sun HPC Software today. On Day 2 of the HPC Consortium in Reno (the prelude to Supercomputing 07), Sun announced a pre-built, tested and fully supported HPC software stack on Solaris and Linux!! This will be available in Spring 2008. The stack consists of Sun's HPC software components such as xVM Ops Center, Sun Studio, Cluster tools, Cluster File systems, Grid Engine, and more. Based on customer needs, other open source components can also be integrated into the stack. Requirements for this stack will be driven primarily by customer input and experiences. The CHS stack will also come with a test suite that will allow customers to validate the stack in their own environments. Training and support are key focus areas for the CHS stack program. Sun will certify the CHS stack on Sun hardware (turnkey HPC solutions) and will make the certification tools available for partner solutions and customer-driven deployments. The CHS team will have consultancy resources with Linux kernel and IB expertise. This is a HUGE win for Sun HPC customers.

Prasad Pai's Weblog

