Friday Feb 12, 2010

Mirror, Mirror

Today is my last day at Sun, so I write to say goodbye to all my friends and colleagues who I will dearly miss.

I believe Oracle plans to maintain all previous blog content, but just in case, I have made a copy of this entire blog, which is called, fittingly, the Mirror of the Navel of Narcissus. Yes, even more references to self-indulgence. :-)

Follow the link to find out what I'm doing next!

Monday Feb 01, 2010

DTrace Deep Dive in Boston this Week!

Jim Mauro will be doing a two-hour deep dive on DTrace at this week's NEOSUG (New England OpenSolaris Users Group) meeting. And Shannon Sylvia from Northeastern University will give a talk on using LDOMs and ZFS. The NEOSUG meeting will be held in two locations with the same agenda -- pick the date and location that works best for you. And please do RSVP so we have a rough head count. See below for details.

Where and When:

  • Tues Feb 2nd, 6-9pm, Sun Microsystems Burlington Campus, One Network Drive, Burlington, MA
  • Wed Feb 3rd, 6-9pm, Boston University, Electrical and Computer Engineering Department Photonics Center -- Room PHO 339, 8 Saint Mary's Street, Boston, MA 02215

Registration Required: RSVP to Linda Wendlandt: lwendlandt at


    6:00-6:20: Registration, Pizza and Beverages

    6:20-6:30: Introductions: Peter Galvin, CTO, Corporate Technologies

    6:30-8:30: Solaris Dynamic Tracing - DTrace – Jim Mauro, Principal Engineer, Sun Microsystems

    8:30-9:00: LDOM Domains and ZFS: An example of creating a ZFS bootable root LDOM domain using jumpstart - Shannon Sylvia, Sysadmin, Northeastern University

    9:00 Q&A and Discussion

Also we’ll be giving out official NEOSUG T-Shirts and other trinkets, and copies of the OpenSolaris CD and instruction manual.

The Talks:

Solaris Dynamic Tracing – DTrace

DTrace is a revolutionary observability tool introduced in Solaris 10, and currently available in all Solaris 10 releases, OpenSolaris, Mac OS X 10.5 and FreeBSD 7.2. DTrace provides unprecedented observability of the kernel and the entire application software stack without requiring code modifications. It is completely dynamic, and introduces zero probe effect when no DTrace probes are enabled.

This talk will introduce the basic components of DTrace - Providers, Probes, Predicates, The D Language, Actions and Subroutines and DTrace variables. We will then dive into examples of DTrace one-liners and scripts that demonstrate the use of DTrace for understanding and root-causing system and application performance issues.

Jim Mauro is a Principal Engineer in Sun Microsystems' Systems Group, where he focuses on performance of volume commercial workloads on Sun technology. Jim co-authored Solaris Internals (1st Ed), Solaris Internals (2nd Ed), Solaris Performance and Tools (1st Ed) and is currently working on a DTrace book.

LDOM Domains and ZFS: An example of creating a ZFS bootable root LDOM domain using jumpstart

Using Version 10.1009 of Sun Solaris on a SPARC T5120 with LDOM 1.2, Shannon Sylvia creates guest domains that are independent of one another. Each guest domain contains its own separately configured operating system and its own virtual disks. Using a “cookbook” approach, new guest domains can be easily added and configured, or removed without affecting the control domain or any of the other guest domains. Each domain is created using ZFS as the root, bootable volume. Shannon will provide examples of how the control domain, the jumpstart/boot server, and the guest domains should be configured.

Shannon Sylvia has 15+ years of experience as a Unix Systems Administrator. She is responsible for installing and maintaining Solaris, AIX, and Linux at Northeastern University. In addition, she is an adjunct professor at Northeastern University's College of Professional Studies. She has a strong interest in IT in the health field, and has recently completed 2 1/2 years of nursing school and clinicals. She is currently involved in volunteer work including Salesforce and website development. She earned a bachelor's degree in Computer Science from National University, a bachelor's degree in English from San Diego State University, and a Master's Degree in Computer Information Systems from Boston University.

Monday Jan 25, 2010

Sun Microsystems Alumni

I did a double take yesterday when I received a colleague's invitation to join the Sun Microsystems Alumni Facebook group. Hey, I thought, I haven't left Sun! But then I realized we are all soon leaving Sun one way or another.

Friday Jan 15, 2010

Rest in Peace

Marguerite Handfield Simons
09/17/1937 - 09/11/2009
Julie Simons Droney
10/27/1967 - 12/06/2009

It has been an especially bad time for my family over the last few months with the loss of both my mother and my sister. Thank you everyone for your support.

Barbie's Next Career

While I don't follow her myself, I'm told Barbie has had over 120 "careers" since her introduction in 1959. Well, it is time for her to choose another, and Mattel wants to hear from you. Please vote for Computer Engineer Barbie! That is clearly much cooler than any of the other choices offered. Vote here.

Igniting the Earth's Atmosphere

As part of background research for a blog entry I'm working on, I went looking for the name of the Manhattan Project scientist who was tasked with calculating whether an atomic detonation could ignite the Earth's atmosphere and burn everyone on the planet to cinders. His name was Hans Bethe and he apparently concluded the bomb would not ignite the atmosphere. But according to the Wikipedia article on the Manhattan Project, Edward Teller co-authored a paper that also examined this question.

That paper, Ignition of the Atmosphere with Nuclear Bombs, was declassified in the 1970s and it is available as a PDF for your perusal here. I recommend reading the Abstract on Page 3 and the three concluding paragraphs on Page 18. The final paragraph, which I hereby nominate as a monumental understatement, reads as follows:

"One may conclude that the arguments of this paper make it unreasonable to expect that the N + N reaction could propagate. An unlimited propagation is even less likely. However, the complexity of the argument and the absence of satisfactory experimental foundations makes further work on the subject highly desirable."

Apparently, the "satisfactory experimental foundations" were achieved at the Trinity site. Had that gone wrong, it would have brought an entirely new meaning to the term "test coverage."

[This just gets worse: As my friend Monty points out, the paper is dated August 1946. The Trinity detonation occurred a year earlier, in July 1945.]

Virtualization for HPC: The Heterogeneity Issue

I've been advocating for a while now that virtualization has much to offer HPC customers (see here). In this blog entry I'd like to focus on one specific use case, heterogeneity. It's an interesting case because while heterogeneity is either desirable or to be avoided, depending on your viewpoint, virtualization can help in either case.

The diagram above depicts a typical HPC cluster installation with each compute node running whichever distro was chosen as that site's standard OS. Homogeneity like this eases the administrative burden, but it does so at the cost of flexibility for end-users. Consider, for example, a shared compute resource like a national supercomputing center or a centralized cluster serving multiple departments within a company or other organization. Homogeneity can be a real problem for end-users whose applications only run on either other versions of the chosen cluster OS or, worse, on completely different operating systems. These users are generally not able to use these centralized facilities unless they can port their application to the appropriate OS or convince their application provider to do so.

The situation with respect to heterogeneity for software providers, or ISVs (independent software vendors), is quite different. These providers have been wrestling with expenses and other difficulties related to heterogeneity for years. For example, while ISVs typically develop their applications on a single platform (OS 0 above), they must often port and support their application on several operating systems in order to address the needs of their customer base. Even assuming an ISV decides correctly which operating systems should be supported to maximize revenue, it must still incur considerable expense to continually qualify and re-qualify its application on each supported operating system version. It must also maintain a complex, multi-platform testing infrastructure and the in-house expertise to support these efforts.

Imagine instead a virtualized world, as shown above. In such a world, cluster nodes run hypervisors on which pre-built and pre-configured software environments (virtual machines) are run. These virtual machines include the end-user's application and the operating system required to run that application. So far as I can see, everyone wins. Let's look at each constituency in turn:

  • End-users -- End-users have complete freedom to run any application using any operating system because all of that software is wrapped inside a virtual machine whose internal details are hidden. The VM could be supplied by an ISV, built by an open-source application's community, or created by the end-user. Because the VM is a black box from the cluster's perspective, the choice of application and operating system need no longer be restricted by cluster administrators.
  • Cluster admins -- In a virtualized world, cluster administrators are in the business of launching and managing the lifecycle of virtual machines on cluster nodes and no longer need deal with the complexities of OS upgrades, configuring software stacks, handling end-user special software requests, etc. Of course, a site might still opt to provide a set of pre-configured "standard" VMs for end-users who do not have a need for the flexibility of providing their own VMs. (If this all sounds familiar -- it should. Running a shared, virtualized HPC infrastructure would be very much like running a public cloud infrastructure like EC2. But that is a topic for another day.)
  • ISVs -- ISVs can now significantly reduce the complexity and cost of their business. Since ISV applications would be delivered wrapped within a virtual machine that also includes an operating system and other required software, ISVs would be free to select a single OS environment for developing, testing, AND deploying their application. Rather than basing their operating system choice on market share considerations, the decision could be made based on the quality of the development environment, or perhaps the stability or performance levels achievable with a particular OS, or perhaps on the ability to partner closely with an OS vendor to jointly deliver a highly-optimized, robust, and completely supported experience for end-customers.

Thursday Jan 14, 2010

Sun Grid Engine: Still Firing on All Cylinders

The Sun Grid Engine team has just released the latest version of SGE, humbly called Sun Grid Engine 6.2 update 5. It's a yawner of a name for a release that actually contains some substantial new features and improvements to Sun's distributed resource management software, among them Hadoop integration, topology-aware scheduling at the node level (think NUMA), and improved cloud integration and power management capabilities.

You can get the bits directly here. Or you can visit Dan's blog for more details first. And then get the bits.

Monday Dec 21, 2009

Sun HPC Consortium Videos Now Available

Thanks to Rich Brueckner and Deirdré Straughan, videos and PDFs are now available from the Sun HPC Consortium meeting held just prior to Supercomputing '09 in Portland, Oregon. Go here to see a variety of talks from Sun, Sun partners, and Sun customers on all things HPC. Highlights for me included Dr. Happy Sithole's presentation on Africa's largest HPC cluster (PDF|video), Marc Parizeau's talk about CLUMEQ's Colossus system and its unique datacenter design (PDF|video), and Tom Verbiscer's talk describing Univa UD's approach to HPC and virtualization, including some real application benchmark numbers illustrating the viability of the approach (PDF|video).

My talk, HPC Trends, Challenges, and Virtualization (PDF|video) is an evolution of a talk I gave earlier this year in Germany. The primary purposes of the talk were to illustrate the increasing number of common challenges faced by enterprise, cloud, and HPC users and to highlight some of the potential benefits of this convergence to the HPC community. Virtualization is specifically discussed as one such opportunity.

Thursday Nov 19, 2009

You Put Your HPC Cluster in a...WHAT??

Judging from a quick look at the survey results from this weekend's Sun HPC Consortium meeting in Portland, Oregon, Marc Parizeau's talk was a favorite with both customers and Sun employees.

Marc is Deputy Director of CLUMEQ and a professor at Université Laval in Québec City. His talk, Colossus: A cool HPC tower! [PDF, 10MB], describes with many photos how a 1960s era Van de Graaff generator facility was turned into an innovative, state of the art, supercomputing installation featuring Sun Constellation hardware. Very much worth a look.

A nicely-produced CLUMEQ / Constellation video that describes the creation of this computing facility is also available on YouTube.

Wednesday Nov 18, 2009

Climate Modeling: How much computing required to run a Century Experiment?

Henry Tufo from NCAR and CU-Boulder spoke this weekend at the Sun HPC Consortium meeting here in Portland, OR. As part of his talk, More than a Big Machine: Why We Need Multiple Breakthroughs to Tackle Cloud Resolving Climate [PDF], he estimated the number of floating-point operations (FLOPs) needed to compute a climate model over a one-century time scale with a 1 km atmosphere model.

His answer was the highlight of the Consortium for me: A Century Experiment requires about a mole of FLOPs. :-)
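To get a feel for that number, here is a rough back-of-the-envelope sketch in Python. It treats "a mole of FLOPs" as Avogadro's number of operations and asks how long they would take at a constant sustained rate. The two rates below are assumptions picked only for illustration, not figures from Henry's talk, and real climate codes sustain only a fraction of a machine's peak.

```python
# Back-of-the-envelope: how long does "a mole of FLOPs" take to compute?
# Assumption: "a mole" means Avogadro's number of floating-point operations.
AVOGADRO = 6.02214076e23
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Hypothetical sustained rates, chosen only for illustration.
RATES = {"1 PFLOP/s": 1e15, "1 EFLOP/s": 1e18}


def runtime_seconds(total_flops, sustained_flops_per_sec):
    """Wall-clock seconds to execute total_flops at a constant sustained rate."""
    if sustained_flops_per_sec <= 0:
        raise ValueError("sustained rate must be positive")
    return total_flops / sustained_flops_per_sec


def report():
    """Return one formatted line per hypothetical rate."""
    lines = []
    for label, rate in RATES.items():
        secs = runtime_seconds(AVOGADRO, rate)
        lines.append(
            f"{label}: {secs / SECONDS_PER_YEAR:.2f} years "
            f"({secs / SECONDS_PER_DAY:.0f} days)"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Under these assumptions, a sustained petaflop per second works out to roughly 19 years of wall-clock time, and a sustained exaflop per second to about a week.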

Friday Nov 13, 2009

Thank you, Google

I'm at Logan Airport waiting for my flight to O'Hare and then to Portland, Oregon for Sun's HPC Consortium this weekend and SC09 next week.

Google is sponsoring free wifi access at Logan through January 15th, which is how I'm able to write this blog entry -- I would not normally pay the usual $10 fee since my flight is leaving in only an hour.

After clicking through the landing page to access the Internet, I was redirected to a Give Back site that lets me make a donation to either Engineers Without Borders USA, One Economy Corporation, or Climate Savers Computing. Even better, Google will match any donation I choose to make.

I wanted to make a donation, but I didn't. Why? Because making the donation requires that I create a Google Checkout account. I have a PayPal account already and I'm trying to reduce my credit card exposure on the web whenever possible, so I opted not to sign up.

Thursday Nov 12, 2009

Uh, Do You Offer Express Shipping?

On November 3rd, I received an email congratulating me on my upcoming 20th anniversary with Sun (for those keeping score at home, the 20 includes some credit for time at Thinking Machines prior to our arrival at Sun) and inviting me to select a commemorative gift of my choice. My first thought was that I should place the order immediately, given all the current craziness and future uncertainties. My recognition award arrived via FedEx yesterday. Parrot not included.

(Wondering what's in the box?)

Wednesday Nov 11, 2009

Apple of My Eye

Once again, I am delighted by Apple's customer service.

After having many problems with my original Macbook Pro, which Apple eventually replaced, my system has been stable and problem-free for quite a while. Until my screen started losing pixels about a month ago.

Every other vertical line on the display became light grey, making it nearly impossible to read the screen. The problem briefly appeared and then disappeared about a month ago, but it happened again last week and stayed broken for over 12 hours despite reboots, PRAM/NVRAM resets, and SMC resets. I made the problem go away eventually by scheduling a Genius appointment at my local Apple store -- the display spontaneously started working again within an hour of making the appointment. But of course! However, not trusting the machine and needing it for an upcoming business trip, I decided to keep my appointment at the Apple store.

Without being able to actually see the problem at the store, the Genius couldn't make an absolute diagnosis, but we both felt the MBP's display was probably flaky. This conclusion was partly influenced by the fact that when I ran the system in dual screen mode, the problem was only visible on the built-in LCD -- the external monitor did not show the problem. While there still might be a logic board(*) or other problem, I felt comfortable enough to request that the screen (actually, the clamshell assembly -- the top part of the laptop, including the cables that run from the clamshell to various locations on the motherboard) be replaced. Since the MBP was no longer covered by AppleCare, I was going to have to pay for this repair myself.

I learned Apple has two repair programs. I could either opt to have the machine shipped to an Apple repair depot and expect to receive the machine in 7-10 days, shipped directly to my house, or I could have the machine repaired at the Apple store and it would likely be ready the next day if the parts were available. The depot option has a fixed price -- about $300 regardless of what the problem is or what parts need to be replaced. The in-store option is generally more expensive since you pay for the required parts and for labor. In my case, the in-store option would cost about $600 or twice as much as the depot option. What to do? I needed to work on my presentation for an upcoming conference and would be leaving for that conference in seven days. The depot might ship my machine back earlier than 7-10 days, but I'd be taking a risk.

Because I was able to make arrangements to use another laptop, I decided to opt for the cheaper depot option and wait the 7-10 days. Imagine my surprise when I got a call the next afternoon informing me that my repair had been completed. Apple had opted to do the repair in their store and they honored the depot rate I had been quoted. How cool is that?

So far, I've not had a recurrence of the problem. As a side benefit, this new display is much more evenly illuminated than the old one so even in the unlikely event the problem turns out to be something else, my machine has a nice, new LCD display that to me is worth the $300 I've paid so far. Not that I expect the problem to recur, of course.

(*) If you have this problem with your machine, look carefully at the cursor. Does it seem to "float above" the bad display or is it also affected by the dropped vertical lines? Noticing this can help diagnose the problem, since an unaffected cursor means it is more likely that the problem is either at the logic board or earlier, while an affected cursor pushes the diagnosis more towards the screen/clamshell.

NEOSUG at Boston University TONIGHT!

The New England OpenSolaris User Group is holding its first meeting at Boston University this evening, hosted by the BU Department of Electrical & Computer Engineering. It is open to anyone interested in learning more about OpenSolaris -- both students and professionals are welcome. This first meeting features three talks: What's So Cool About OpenSolaris Anyway, OpenSolaris: Clusters and Clouds from your Laptop, and OpenSolaris as a Research and Teaching Tool.

The meeting runs from 6-9pm tonight (Wed, Nov 11th, 2009) at the BU Photonics Center Building. Follow this link for directions, full agenda details, etc. If you think you'll be coming, please RSVP so we have a rough headcount for food.

See you there -- I'm bringing the pizza!


Josh Simons

