Sunday Sep 28, 2008

Sun xVM Ops Center at TACC

In a previous post, I talked about xVM Ops Center's early deployment at the Texas Advanced Computing Center (TACC) in Austin, Texas. When we first started at TACC late last year, we deployed the xVM Ops Center 1.0 Early Access (EA) version. Recently, we upgraded TACC to Ops Center 1.1.1.

Ops Center monitors the 3936 four-socket blades of the TACC supercomputer known as Ranger. Physically, the compute blades occupy 82 racks that are spread across 6 aisles. Dongju Choi at TACC wrote a set of scripts that extract sensor data from Ops Center every 15 minutes and then plot it against a 2D matrix that represents the physical layout of Ranger. Thanks, DJ!
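As a rough illustration of the idea (not DJ's actual scripts), here is a minimal Python sketch that averages per-blade readings into one value per rack and places the 82 racks on a 6-aisle grid. The rack, aisle and blade counts come from this post; the function names, the rack-by-rack ordering of readings, the row-by-row fill and the synthetic temperatures are all assumptions made for the example. Ranger's real floor plan and the Ops Center data export are not shown here.

```python
import math
from statistics import mean

RACKS = 82            # compute racks (from the post)
AISLES = 6            # aisles (from the post)
BLADES_PER_RACK = 48  # blades per rack (from the post)


def rack_averages(blade_temps):
    """Average per-blade readings into one value per rack.

    blade_temps is a flat list ordered rack by rack (an assumed ordering).
    """
    return [
        mean(blade_temps[i:i + BLADES_PER_RACK])
        for i in range(0, len(blade_temps), BLADES_PER_RACK)
    ]


def layout_grid(rack_values, aisles=AISLES):
    """Place per-rack values on an aisles x slots grid, filled row by row.

    Cells with no rack hold None. The row-by-row fill is an assumption,
    not Ranger's real floor plan.
    """
    slots = math.ceil(len(rack_values) / aisles)
    padded = list(rack_values) + [None] * (aisles * slots - len(rack_values))
    return [padded[r * slots:(r + 1) * slots] for r in range(aisles)]


# Synthetic readings: blade b in each rack reports 20 + 0.1 * (b % 48) degrees.
temps = [20 + (b % BLADES_PER_RACK) * 0.1 for b in range(RACKS * BLADES_PER_RACK)]
grid = layout_grid(rack_averages(temps))
```

The resulting grid (a list of 6 rows) could then be handed to any plotting tool as a heat map, with the None cells left blank.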

Attached are some sample charts that show system fan speed (per blade) and system temperature (averaged per rack). Pretty neat!

Sunday Jun 22, 2008

xVM Ops Center on Linux

The recently shipped Sun xVM Ops Center 1.1 lets you run the management tiers (Satellite and Proxy) on Linux, in addition to Solaris, which was already supported in xVM Ops Center 1.0. At the recently concluded International Supercomputing Conference (ISC08) in Dresden, Germany, Sun announced Sun xVM Ops Center 1.1 along with the availability of a number of hardware and software products for the HPC market, including

  • the 72-port DDR InfiniBand Sun Data Center Switch 3x24,
  • the 4-socket Sun Blade X6450 Intel Blade, capable of delivering 7.37 Tflops peak performance in a fully populated 6048 chassis,
  • an integrated Sun HPC Software Linux Edition 1.0,
  • an integrated toolkit for developers, Sun HPC Cluster Tools 8.0, containing the latest Open MPI 1.3, and
  • Sun's scalable visualization software, Sun Visualization Software 1.1.

All of the above were displayed at the ISC08 Sun booth last week and received lots of interest from customers and partners.

Here is a screenshot from our demo with xVM Ops Center 1.1 installed on CentOS 5.

And another screenshot with xVM Ops Center provisioning Sun Grid Engine 6.1.

See here and here for detailed instructions on provisioning CentOS 5 and Sun Grid Engine 6.1, respectively.

Wednesday Jun 11, 2008

See you at International Supercomputing '08

After posting this blog entry, I will pack my bags and head to the second biggest High Performance Computing event of the year, International Supercomputing '08, in Dresden, Germany.

As is the tradition, we host the Sun HPC Consortium a few days prior to the conference. The Consortium meetings are typically packed with attendees ranging from Sun HPC customers and partners to Sun field and product engineers and marketing folks. The Consortium agenda this year includes talks by Sun customers and partners, as well as Sun product engineering and marketing teams.

Last year was my first time at the Consortium. It was held before Supercomputing '07 in Reno, Nevada, USA. I found it to be an awesome opportunity to talk directly with customers about the business problems they are solving with HPC clusters and their requirements for solutions to manage these clusters at peak performance and maximum throughput. Another fascinating aspect of last year's Consortium and Conference was the Customer Whisper Suites, where key customers get a peek (hence probably the term "whisper") at our strategic roadmap and direction, and we get a peek at the business problems on their long-term radar. At some of these sessions, it was fascinating to hear how customers view HPC as giving them a competitive advantage in their market and, more importantly, their plans for building the next-generation computing infrastructure today for the competitive edge they will achieve several years down the road. I look forward to the Whisper Suites this year too.

After the Consortium ends, the Supercomputing conference begins. As usual, Sun has many exciting events and announcements planned this year. Don't miss the Sun booth, where we will display our latest and greatest hardware and software technologies and solutions. You definitely don't want to miss the Sun xVM Ops Center 1.1 demo at the Sun booth, which shows the Ops Center server running on Red Hat Linux and showcases, among many features, provisioning of (a) firmware, (b) the CentOS 5 operating system and (c) Sun's best-in-class Distributed Resource Management (DRM) product, Sun Grid Engine 6.1.

Hope to see you at ISC08!

Wednesday May 21, 2008

xVM Ops Center at KISTI

In March 2007, Sun Korea was selected as the vendor to provide Massively Parallel Processing (MPP) for the fourth supercomputer project at KISTI (Korea Institute of Science and Technology Information). I understand this project will be completed in two phases. When both phases are completed, this supercomputer at KISTI will be the largest in Asia. Recently, I found out that Phase 1 is complete and the system will go into production once the remaining stability tests are finished.

Now, here is the cool thing. With TACC, Sun built the largest supercomputer in America, and with KISTI, we will repeat that feat in Asia!!

And guess what, both supercomputers have xVM Ops Center in their software stack, to manage the compute nodes. The scale managed by xVM Ops Center in these two HPC supercomputer projects is vastly different though. At TACC, the number of compute nodes is a little shy of 4000 blades housed in 82 Sun Constellation (C48) racks, whereas at KISTI in Phase 1, we have four C48 racks that amount to 188 compute nodes.



Saturday Mar 08, 2008

TACC - Sun xVM Ops Center Early Adopter

In February 2008, we announced General Availability of Sun xVM Ops Center, a highly scalable data center management tool that lets customers automate time-consuming, routine system administration tasks such as firmware updates, bare-metal operating-system provisioning, patching and updating, making it easier to manage thousands of IT assets. Scaling Ops Center beyond our limited-scale lab environment was vitally important to its success, to ensure it holds up at the massive scale at which it will be deployed and used in real-world HPC environments. For this, we were fortunate to have, as one of xVM Ops Center's early adopters, today's fastest supercomputer in the world at the Texas Advanced Computing Center (TACC) at UT Austin.

The 504 TeraFlop Ranger supercomputer at TACC is Sun's Constellation system architecture at its finest, consisting of 82 compute racks, 12 management racks and 2 switches, neatly laid out in rows (see figure to the left), in a data center approximately half the size of a basketball court!!

Each compute rack consists of 48 Sun 4-socket blades, each socket populated by a quad-core 2.0 GHz AMD Barcelona processor. The management racks consist of Thumpers (X4500) and X4600M2's. The two switches are Sun's 3456-port Magnum switches, and the system interconnect is InfiniBand using Mellanox's InfiniBand cards.
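The rack, blade, socket and clock figures above line up with the 504 TeraFlop peak number. Here is a quick back-of-the-envelope check in Python. Note that the 4 double-precision floating-point operations per core per cycle is my assumption about the Barcelona core; it is not stated in this post.

```python
# Figures taken from the post.
RACKS = 82
BLADES_PER_RACK = 48
SOCKETS_PER_BLADE = 4
CORES_PER_SOCKET = 4   # quad-core
CLOCK_GHZ = 2.0

# Assumption (not from the post): 4 double-precision flops per core per cycle.
FLOPS_PER_CYCLE = 4

blades = RACKS * BLADES_PER_RACK
sockets = blades * SOCKETS_PER_BLADE
cores = sockets * CORES_PER_SOCKET

# cores * GHz * flops/cycle gives GFlops; divide by 1000 for TFlops.
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

print(blades, sockets, cores, round(peak_tflops, 1))
```

Under that assumption the script gives 3936 blades, 15744 sockets, 62976 cores and a peak of roughly 504 TFlops.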

Ops Center is installed on two of the X4600M2's in one of the management racks. Each X4600M2 has four dual-core AMD sockets, 16 GB of memory and two 70 GB disks. Both Ops Center installations have the satellite and proxy collocated on their respective X4600M2 box. Each instance of Ops Center is currently configured to manage 15 racks (720 blades and the corresponding chassis). The short-term plan is to manage the entire Ranger (all compute nodes in 82 racks, plus the management nodes) across four X4600M2's. The rationale is to get Ops Center in use at TACC for some time while we work on scaling down to one or two X4600M2's.

TACC became an early adopter in October 2007. xVM Ops Center beta engineering builds were used at TACC to discover and inventory Constellation chassis and blade service processors, collect basic system telemetry, perform lights-out management tasks such as power and indicator light cycling, and provision the latest firmware to Thumpers and Sun Blades. These test drives on Ranger with early Ops Center bits helped validate the product's scalability, as we found and fixed issues in chassis-blade association during power cycling, MBean normalization, job weights, server grouping, and locking and synchronization.

We are currently a little more than one-third of the way there (about 30 out of 82 racks; about 1440 out of 3936 blades). There is still work to be done. With the recent release of Sun xVM Ops Center 1.0, our team is gearing up to scale the remaining two-thirds, to become the first HPC cluster management product to scale today's highest supercomputing peak!!

 Stay tuned for more updates as we continue this fascinating journey.

Tuesday Mar 04, 2008

Come, Participate in Ops Center Extensibility Projects

Have you checked out recently. We have recently added a couple of candidate projects aimed at extending the functional reach of Sun xVM Ops Center and enabling interoperability with other popular solutions, particularly in the HPC space.

Let me briefly introduce these two integration projects. 

  • Integration of Sun Grid Engine (SGE) and Sun xVM Ops Center. SGE is based on the open source Grid Engine project that provides policy-based workload management and dynamic provisioning of application workloads. Grid Engine open source software is used at thousands of HPC sites worldwide.
  • Integration of Rocks and Sun xVM Ops Center. Rocks is an open source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Rocks' popularity in the HPC community is evidenced by several Rocks deployments around the world.

More information on these projects and how to get involved is available by clicking on the 3rd category in the left pane titled "Integration Projects" here. I look forward to working with you on these projects.

Monday Feb 25, 2008

Blades management with Sun xVM Ops Center

Recently we put together a cool demo of Sun xVM Ops Center's Sun Blade 6000 management capabilities for a large HPC financial customer. The demo showcased Ops Center's full suite of lifecycle management capabilities with X6250 blades including discovery, lights-out management such as power and locator lights cycling, firmware upgrade and provisioning of Solaris 10 Update 4 operating system.

Other really cool aspects of the demo are features not available in Sun's previous Systems Management product. Most notable among them are:

  • Firmware upgrade through the "Compliance" feature of Sun xVM Ops Center. This "compliance report" feature in firmware provisioning brings a high degree of automation to a not-so-glamorous but critical step of server lifecycle management in data centers, i.e., firmware provisioning.
  • OS provisioning by leveraging JumpStart through JET (JumpStart Enterprise Toolkit). Enabling the serial port for remote console during OS provisioning and grabbing MAC addresses off managed nodes to facilitate OS provisioning are other features in the demo that the customer was interested in.

If you are at Immersion Week, you should check this demo out at the two BOF sessions on Monday and Thursday evenings. If you can't make it to Immersion Week, no sweat. We are working on getting a richer version of this blades management demo to a Sun Solution Center near you.

Hope you like it.

Sunday Nov 11, 2007

Sun Constellation HPC Software (CHS) Stack

Big day for Sun HPC Software today. On Day 2 of the HPC Consortium in Reno (the prelude to Supercomputing 07), Sun announced a pre-built, tested and fully supported HPC Software stack on Solaris and Linux!! This will be available in Spring 2008. The stack consists of all of Sun's HPC software components, such as xVM Ops Center, Sun Studio, Cluster tools, Cluster File systems, Grid Engine, and more. Based on customer needs, other open source components can also be integrated into the stack. Requirements for this stack will be primarily driven by customer input and experiences. The CHS stack will also be available with a test suite that will allow customers to validate the stack in their own environments. Training and support are key focus areas for the CHS stack program. Sun will certify the CHS stack on Sun hardware (turnkey HPC solutions) and will make the certification tools available for partner solutions and customer-driven deployments. The CHS team will have consultancy resources with Linux kernel and IB expertise. This is a HUGE win for Sun HPC customers.

Friday Nov 09, 2007

Supercomputing 07, Reno, Nevada, USA

The last time I attended this conference, I was a grad student doing research on parallel algorithms and numerical computing. Just landed in Reno. As I was driving to my hotel, I couldn't help thinking how Reno, over the next week, will be swarming with some of the world's brightest scientists talking about PetaFLOPS and MIPS, intermingled with the regulars of this town who talk poker and blackjack. Some interesting discussions will be happening in Reno bars this weekend. "Can you program these slot machines to better my odds of winning the jackpot?" "Let me explain to you how probability works ...."

I ran into Bjorn Andersson (HPC Growth Target owner) on my flight to Reno and we had some interesting discussions about HPC: its history, and where we are today with our systems and positioning in the HPC market. We also talked about our cluster management product, xVM Ops Center, and the positive feedback I got from my pitch to Tech Ambassadors yesterday. Bjorn feels that we are in a much, much better position today compared to the same time last year (Supercomputing 06 in Florida, USA). We have great products, a compelling strategy, key design wins and the executive sponsorship to perform at a high level.

Somehow, the timing feels right to me too, in terms of having a number of pieces of the HPC puzzle in just the right place for Sun to drive success in HPC - incredibly cool systems, Magnum IB, Studio tools, cluster file systems, MPI cluster tools, shared visualization tools, Solaris and Linux, Grid Engine and xVM Ops Center!! Saved the best for last :-)


Prasad Pai's Weblog

