Wednesday Nov 11, 2009

NEOSUG at Boston University TONIGHT!

The New England OpenSolaris User Group is holding its first meeting at Boston University this evening, hosted by the BU Department of Electrical & Computer Engineering. It is open to anyone interested in learning more about OpenSolaris -- both students and professionals are welcome. This first meeting features three talks: "What's So Cool About OpenSolaris Anyway", "OpenSolaris: Clusters and Clouds from your Laptop", and "OpenSolaris as a Research and Teaching Tool".

The meeting runs from 6-9pm tonight (Wed, Nov 11th, 2009) at the BU Photonics Center Building. Follow this link for directions, full agenda details, etc. If you think you'll be coming, please RSVP so we have a rough headcount for food.

See you there -- I'm bringing the pizza!

Wednesday Jul 01, 2009

Run an HPC Cluster...On your Laptop

With one free download, you can now turn your laptop into a virtual three-node HPC cluster that can be used to develop and run HPC applications, including MPI apps. We've created a pre-configured virtual machine that includes all the components you need:

  • Sun Studio -- C, C++, and Fortran compilers with performance analysis, debugging tools, and a high-performance math library
  • Sun HPC ClusterTools -- MPI and runtime based on Open MPI
  • Sun Grid Engine -- distributed resource management and cloud connectivity

Inside the virtual machine, we use OpenSolaris 2009.06, the latest release of OpenSolaris, to create a virtual cluster using Solaris zones technology and have pre-configured Sun Grid Engine to manage it so you don't need to. MPI is ready to go as well---we've configured everything in advance.
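
To give a flavor of what that means in practice, here is a minimal smoke test you might run once the virtual machine is booted (a sketch; the source file, node count, and exact commands depend on the image configuration):

% mpicc hello.c -o hello                    # compile with the bundled Open MPI wrapper
% mpirun -np 3 ./hello                      # run one process on each of the three virtual nodes
% echo "mpirun -np 3 $PWD/hello" | qsub     # or submit the same job through Sun Grid Engine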

If you haven't tried OpenSolaris before, this will also give you a chance to play with ZFS, with DTrace, with Time Slider (like Apple's Time Machine, but without the external disk) and a host of other cool new OpenSolaris capabilities.

For full details on Sun HPC Software, Developer Edition for OpenSolaris check out the wiki.

To download the virtual image for VMware, go here. (VirtualBox image coming soon.)

If you have comments or questions, send us a note at hpcdev-discuss@opensolaris.org.


Tuesday Jun 02, 2009

Building Packages for OpenSolaris: Easier than Ever

In a previous entry I documented in detail how I contributed an open-source package (Ploticus) to OpenSolaris using SourceJuicer, starting with how to write a spec file and ending with the inclusion of the package in the contrib repository. In truth, at the time I published the information I had not actually taken the last step to promote the package from the pending repository to the contrib repository due to a problem I discovered during testing: Ploticus ran okay, but it was not configured the way I wanted. It took me some time to create appropriate patch files, rebuild the package, re-test it, etc.

In retrospect, I'm glad I was delayed because in the meantime OpenSolaris 2009.06 and SourceJuicer 1.2.0 were both released, which gave me a chance to see if any improvements had been made in the contribution process. I am happy to report that improvements were definitely made. Read on for details.

Most important, SourceJuicer documentation has been much improved. See, for example, How to Use OpenSolaris SourceJuicer for a good overview of the submission process. In addition, the short (9 min) video below, which walks through the mechanics of submitting files using SourceJuicer, is also an excellent resource:

SourceJuicer itself has also been improved significantly with this latest release. For example, it is now possible to delete a submitted file if it is no longer needed---I was able to use SourceJuicer 1.2.0 to remove an incorrect copyright file I had created when I first submitted Ploticus. While I appreciated that improvement, I found the following much more intriguing:

The screendump above shows the results of recent SourceJuicer builds, including Ploticus. I was happy to see that Ploticus, with the patches I had created, built successfully on my first try. I was also curious about the implied promise of the new Install column. Since I next wanted to install and test this latest package on my 2009.06 system, I clicked on the Install link. And saw this:

Hey, cool. Firefox knows it should invoke the Package Manager to handle my request. How? With OpenSolaris 2009.06 we've enhanced the Package Manager to support a web installer mode and created a new mime type (application/vnd.pkg5.info) to pass package installation requests from a web page to Package Manager. This works from any web browser so long as the web server is configured to handle .p5i files correctly. See John Rice's blog entry on 2009.06 Package Manager enhancements for more details.
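
For example, if the packages are served by Apache, a single mime-type mapping is typically all that is needed. A sketch (adapt to your own server configuration):

AddType application/vnd.pkg5.info .p5i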

I clicked OK and then saw:

Package Manager promises to not only install the requested package, but to automatically add the required repository to my configuration as well. Surely it can't be this simple. I clicked on Proceed:

Apparently, it can be that simple. :-)

I've now tested my patched version of Ploticus on 2009.06 and requested the package be promoted to contrib by sending a note to sw-porters-discuss@opensolaris.org. I'm hopeful Ploticus will soon be available to the entire OpenSolaris community.

Wednesday May 27, 2009

CommunityOne 2009: Taking the Plunge with OpenSolaris Deep Dives

I was hoping to attend CommunityOne in San Francisco next week (June 1-3), but I'll be beavering away here in Boston instead. C1 is the big, blow-out community event that covers all things OpenSolaris for the technical crowd -- developers and users -- with piles of technical sessions, lightning talks, labs, and a host of other activities.

There are several registration options, including one free option that gives you access to two Deep Dive technical tracks on Tuesday as well as some free sessions on Monday. The Tuesday tracks are Developing IN OpenSolaris and Deploying OpenSolaris in Your Datacenter. Topics covered:

If you are interested in dropping by the Moscone Center next Monday or Tuesday for these tech talks, complete the free registration here. For details on the entire C1 event, see the event website or the wiki.

Monday May 25, 2009

SourceJuicer: How to contribute a package to OpenSolaris

[UPDATE: A few small errors fixed and some clarifications added. See Comments for details.]

I tried recently to add a package to the OpenSolaris contrib repository, but quickly learned I didn't have enough packaging experience to understand the directions provided at SourceJuicer. So I did some homework, asked some questions, and eventually did successfully contribute a package. I've documented in this entry everything I've learned, hoping it will be helpful to others who want to build and submit OpenSolaris packages. Specifically, I'll describe how I wrote the spec file for Ploticus (my favorite open source plotting/graphing utility) and how I submitted the package to OpenSolaris.

I used SourceJuicer to submit my package because it is the easiest way for a community member to contribute. Before getting into details, a few words about the overall submission process. Packages are first submitted to the pending repository, which is basically a holding area for packages on their way to the contrib repository, the primary repository for community-contributed packages. Once a package has been validated and successfully built, it can then be moved into /contrib. I'll cover all of this below.

On to the details.

To submit a package to SourceJuicer, you need to supply two files: a text file containing copyright information and a spec file. The spec file contains the information SourceJuicer needs to create a final binary package starting from source code. Ideally, the OpenSolaris package will be buildable from the standard, community-released source code without modification; if changes are needed to build the code for OpenSolaris, the best approach is to ask the community to adopt them upstream. In practice this is often unnecessary, since many packages are designed to build on several Unix versions. In cases where changes must be made and those changes have not been accepted by the community, it is possible to specify patches that should be applied to the community source code during the build process. Though not desirable, it is sometimes necessary to do this. I'll supply pointers to information on how to do this below.

Spec files are not an OpenSolaris invention--they have been used for a long time to build RPM packages. This is good news because there are several excellent web resources that document spec files in detail. I recommend Maximum RPM by Edward Bailey as a detailed reference. One complication: It seems that OpenSolaris spec files are not exactly the same as RPM spec files. However, for the purposes of this exercise, don't worry about this -- the Ploticus example below should give you enough information to create a valid OpenSolaris spec file in most cases. However, if you insist on worrying, you can read the information I found here and here. If anyone knows of a better explanation of the differences, let me know and I will include a pointer here.

Okay, let's get to it. I started with a spec file template and created the following file for Ploticus. My commentary includes all of the tips and other information I discovered during the process of writing the spec file for this particular open source package. While I've attempted to give pointers to additional information throughout, this is not meant to be the definitive guide to the full capabilities of spec files. There should, however, be enough information here to allow typical open source apps to be packaged and contributed to OpenSolaris. Consult Maximum RPM for additional details.

The spec file contents are shown below, interleaved with my commentary.
#
# spec file for package: ploticus
#
# This file and all modifications and additions to the pristine
# package are under the same license as the package itself.
#
# include module(s): ploticus
#
This is all boilerplate commentary. Insert the name of your package twice.

%include Solaris.inc

Required for all OpenSolaris packages. For the curious, the source is here.
Name: ploticus

Once you specify the name of your package, you can use the macro %{name} to refer to it later in the spec file. As you will see below, there are other predefined macros available that you will use to write your spec file. You can also define your own macros using the syntax:

%define macro_name macro definition

Summary: ploticus -- creates plots, charts, and graphics from data

Summary is a one-line description of the package that will be displayed by the OpenSolaris Package Manager.
Version: 2.41
The version number can be referenced as %{version} later in the spec file, which can often be used to generalize file and directory names. In the case of Ploticus the version number string (e.g. "2.41") happens not to be used as part of its filenames (e.g. ploticus241src) so I do not use %{version} in this example, except in one instance of boilerplate.
License: GPLv2
Free text field describing code's open source license. I've seen all of these used: GPL, GPLv2, GPLv3, BSD, LGPLv2.1, New BSD License. If GPL, be explicit if you can: GPLv2 or GPLv3. The "or later" licenses might be appropriate as well, e.g. GPLv2-or-later, GPLv3-or-later, etc. There is a nice discussion here about the pros and cons of "or later" licenses.
Source: http://voxel.dl.sourceforge.net/sourceforge/ploticus/pl241src.tar.gz

The source tag specifies the location of the source-code tarball (possibly gzip'ed) that should be downloaded to build the package. Because Ploticus is hosted on sourceforge I had to specify a manual download URL rather than that of the automated download site (downloads.sourceforge.net.)

Note that the source location can also be specified as an ftp:// address.

URL: http://ploticus.sourceforge.net

The open-source community's web address.
Group:  Applications/Graphics and Imaging
The group tag describes the kind of software in the package and will be used by the OpenSolaris Package Manager to categorize the package hierarchically. I chose a group name based on the package classifications listed here.
Distribution:	OpenSolaris
Vendor: OpenSolaris Community

%include default-depend.inc

Boilerplate.
BuildRequires: SUNWxorg-headers, SUNWzlib, SUNWgcc

These are other OpenSolaris packages that must be available on the build system in order to correctly create the binary package. In this case, I am building Ploticus with X-Windows capabilities, so I need to ensure the X client header files are available. I am also enabling a Ploticus compression option so zlib is needed as well. And, to be safe, I've specified which compiler is required. I could have used Sun Studio, but I know for sure that Ploticus compiles with gcc so I've used that.

You can find these package names by searching in the Package Manager on your local OpenSolaris system.

Requires: SUNWzlib
This section lists packages that must be installed on the end-user system for the software to work correctly. In this case, Ploticus will be dynamically-linked against zlib so I need to make sure the Package Manager knows about this dependency. When a user asks for Ploticus from the repository, the Package Manager will know it also needs to download and install the SUNWzlib package.
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
SUNW_Basedir:   %{_basedir}

This is boilerplate. The intent of BuildRoot is to define a user- and application-specific path that can be used as the root of an area in which your package will be installed on the build server, allowing the build server to support simultaneous builds of multiple packages by multiple users without interference. Note, however, that I do not use BuildRoot in this spec file because this conversation indicates that $RPM_BUILD_ROOT is the officially supported way to refer to the top of a package install area. I don't know if this is true in the OpenSolaris world as well, but most spec files I've seen for OpenSolaris use $RPM_BUILD_ROOT so I have opted to use that as well.

Note that while $RPM_BUILD_ROOT (and BuildRoot) refers to the root of the installation area on the build server, the top of the build area itself -- the location where your package will actually be untar'ed and built -- is referred to as %{_builddir}.

I do not know how SUNW_Basedir is used.

SUNW_Copyright: %{name}.copyright
This is the name of the copyright file you will upload to SourceJuicer along with this spec file. It must be named as shown (ploticus.copyright in my case.) You will typically find this copyright file on the community's website and/or included within the community's source tarball. In the case of Ploticus, the tarball contains a file in src called Copyright, which I have copied, renamed to ploticus.copyright and then edited to remove html markup. This is the file I will then upload to SourceJuicer. The original src/Copyright file is ignored by SourceJuicer. Update: The preceding was actually not sufficient for my package to be validated. I was asked to append the file GPL.txt, which was also in the tarball's src directory, to ploticus.copyright so that the actual text of the GPL v2 copyright was in the file. The original version of the copyright file (src/Copyright) only refers to the GPL copyleft, it does not include the copyright itself.
Meta(info.upstream): Steve Grubb <ploticus@yahoogroups.com>
Meta(info.maintainer):  Josh Simons <josh.simons@sun.com>

These fields are specific to OpenSolaris's packaging system. The upstream field contains the name and address of the individual or group that creates and supports the open-source software. The maintainer field contains the name and email address of the individual responsible for the OpenSolaris packaging of the open-source project. The preferred format is as shown in these examples.

Additional info fields that can be included are documented here.

%description
A free, GPL, non-interactive software package for producing plots, 
charts, and graphics from data. It was developed in a Unix/C 
environment and runs on various Unix, Linux, and win32 systems. 
ploticus is good for automated or just-in-time graph generation, 
handles date and time data nicely, and has basic statistical capabilities. 
It allows significant user control over colors, styles, options and details. 
Ploticus is a mature package, available since 1999, and version 2.40 has 
more than 12,000 downloads to date.

A more detailed description of the open source software. This description was taken from the Ploticus web page.
%prep
%setup -q -n pl241src

Now we begin specifying what actions are required to build the software. The %setup macro cd's into the build directory, removes any cruft left over from earlier builds, unzips the source tarball (which will have been downloaded at this point), and then untars the sources into the build directory. It then cd's into the package's top-level directory. All of this is done with %{_builddir} as the root directory as described earlier.

Note that %setup assumes the top-level directory specified in the tarball is named %{name}-%{version}. If this is not true for your package, use the -n option to specify the correct name. For Ploticus, all files in the tarball are in the pl241src directory, so I've used the -n option to specify this.

See this page for more details about the %setup macro. The %patch macro, which can also be used in the %prep phase, can be used to apply patches prior to building the binaries if the standard community source code needs to be modified in some way to build successfully on OpenSolaris. See the same page for %patch information. Note that you should try to have your OpenSolaris changes accepted by the community to avoid having to apply these patches.

The -q option simply runs %setup in quiet mode, suppressing the verbose file listing normally produced while the tarball is unpacked.

%build

cd src
make NOX11= XLIBS='-L/usr/openwin/lib -lX11' XOBJ='x11.o interact.o'  \
     XINCLUDEDIR=-I/usr/openwin/include WALL= ZLIB=-lz ZFLAG=-DWZ \
     PREFABS_DIR=/usr/lib/ploticus/prefabs pl

The %build section contains the commands needed to build the package binaries. At the end of the %prep phase we were left sitting in the top-level directory of the source tarball. Since the Ploticus makefile and sources are one level down from this (pl241src/src), I cd into src before invoking the correct make command for OpenSolaris.

Assuming the make ran correctly, we exit this phase with the binaries and other files all built on the build server in a sub-directory under %{_builddir}.

%install

mkdir -p $RPM_BUILD_ROOT%{_mandir}/man1
cp man/man1/pl.1 $RPM_BUILD_ROOT%{_mandir}/man1/pl.1
mkdir -p $RPM_BUILD_ROOT%{_bindir}
cp src/pl $RPM_BUILD_ROOT%{_bindir}
mkdir -p $RPM_BUILD_ROOT%{_libdir}/%{name}
cp -r prefabs $RPM_BUILD_ROOT%{_libdir}/%{name}


In the install phase, we execute a "make install" or equivalent, moving all files that will be included in the binary package to their final installed locations, but relative to $RPM_BUILD_ROOT rather than to "/" to avoid collisions on the build server. Because the Ploticus "make install" action doesn't do exactly what I need, I instead manually move each required file to its final location. For many projects, something similar to "make DESTDIR=$RPM_BUILD_ROOT install" would be appropriate in this phase.

If you are moving files manually, do not assume directories exist -- make them before you use them. And use the predefined directory macros (e.g. %{_mandir} ) to reference standard installation locations. Others are documented here.

%clean
rm -rf $RPM_BUILD_ROOT

This is boilerplate clean-up code. Insert other commands as necessary.
%files

%defattr(-,root,bin)
%attr(0755, root, bin) %dir  %{_bindir}
%attr(0755, root, bin) %dir  %{_mandir}
%attr(0755, root, bin) %dir  %{_mandir}/man1
%attr(0755, root, bin) %dir  %{_libdir}
%attr(0755, root, bin) %dir  %{_libdir}/%{name}
%attr(0755, root, bin) %dir  %{_libdir}/%{name}/prefabs
%{_bindir}/*
%{_libdir}/%{name}/prefabs/*
%{_mandir}/*/*

This can be a complicated section so I suggest reading the Max RPM %files section.

The %files section specifies the locations and attributes of all files that will be placed onto the end-user's system when the binary package is installed. The %attr directive is used to specify permissions and ownership for files and directories. The %dir directive identifies directories. Multiple directives can be applied to objects by including them on the same line.

The first line specifies default mode, default user ID and default group ID for all files created during the build process. The dash ("-") means that a default is not set explicitly for that field. Note that failure to include this line in your spec file will cause an obscure error to be generated when an end-user tries to install your package. That would be very bad.

The next four lines specify the directories in which Ploticus-related files will reside. The last three ensure that the Ploticus binary, all of the Ploticus prefabs config files, and the man page will be included in the binary package. Note again the use of macros to specify standard installation directories.

%changelog
* Tue Apr 28 2009 - Josh Simons <josh.simons@sun.com>
- initial version

Add any changelog information you desire here.

Once you've created your spec file, it is time to feed it to SourceJuicer for syntax and other checking and then iterate as necessary until your spec file is correct and has passed validation. The basic flow is shown in the diagram below.

The first step is to submit the spec file to SourceJuicer along with the project's copyright file. To do so, go to the SourceJuicer Submit page (login required.) Assign a descriptive name to your upload (I used 'ploticus') and then specify your spec file. Use 'add another file' to add your copyright file. Add whatever other files you may need (see 'more help' on the Submit page.) Click Submit and you will see a page like this:


The summary page includes an indication that my spec file successfully passed a syntax check. If an error occurs at this point, make the necessary corrections and use the ReSubmit tab (not shown) at the bottom of this page to upload new versions of your copyright and spec files.

Looking under Reviews, I can see my package has not yet been validated, which means my submission hasn't yet been checked by someone to ensure my copyright file is appropriate, that someone else has not already packaged this program for OpenSolaris, etc.

The next day I receive two email messages with comments from reviewers. When I log back into SourceJuicer and look at the Review tab, I see the two comments that were submitted. The fact that the package is still marked as not validated means I have issues to address:

Clicking on the "[review]" link takes me to the page with detailed information about the Ploticus review. I can also view this page by visiting the MyJuicer tab and then clicking on the appropriate link under My Submissions. This second method is better since it can be difficult to find your review on the main Review page. In any case, the page looks like this:

As you can see from Amanda and Christian's comments, I did not use the correct naming convention for the copyright file I uploaded to SourceJuicer. Rather than "Copyright", the file should have been named "ploticus.copyright" (more generally, %{name}.copyright). Also, Amanda hopes I can remove the html that is for some reason embedded in the standard Ploticus copyright file.

Using this same review page, I submit a clarifying question back to the reviewers to ensure I address their issues. I am not clear on the relationship between the copyright file that is submitted manually to SourceJuicer and the copyright file in the source tarball that is described with the "SUNW_Copyright" tag in the spec file.

Now that I understand the copyright issue and have adjusted my spec file and copyright file appropriately (and also updated the spec file and annotations in this blog entry--meaning you never saw that I had initially called my copyright file "Copyright"), I use the same Review page to Resubmit the spec file and copyright file. Use the tab at the bottom of the Review page to do this:

As of this writing, there is no way to remove a file that has been submitted to SourceJuicer so all three files (Copyright, ploticus.copyright, and ploticus.spec) are associated with the project even though Copyright is now extraneous. Until removal is possible, just ignore the extra files. [UPDATE: As of SJ 1.2.0, files can be removed by visiting the MyJuicer review page for the appropriate package.]

I resubmitted the files, the package was subsequently validated, and then it was automatically scheduled to be built on the build server. I did not receive a notification when the build attempt occurred, so you need to check status periodically (use the MyJuicer tab). When I checked, I saw my build had completed successfully on the first attempt:

Had the build not succeeded, I would have followed the Log link to view the build log, found the problem, fixed the spec file, and then Resubmitted. The package would then be rescheduled for another build automatically with no need for re-validation.

With the Ploticus build successfully completed, it is now very important to verify that the package installs correctly and that the software actually works. Though I don't cover it here, my first Ploticus package did not work correctly on my test system. I had to make changes to my spec file, rebuild the package, and reinstall it. Therefore, please do install and test your software!

To do the test installation, I first added the pending repository as a package authority on my 2008.11 system. Note carefully the location of this repository; I had expected it to be http://pkg.opensolaris.org/pending, but that is not correct:

% pfexec pkg set-authority -O  http://jucr.opensolaris.org/pending pending

I then started the Package Manager, selected the Pending repository and did a search for Ploticus. Voila! The package is available:

After selecting the package and clicking on Install/Update, the installation proceeds smoothly. I then start a terminal window and verify that Ploticus does, in fact, work correctly:
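
If you prefer the command line, the equivalent steps look roughly like this (a sketch; the package name assumes the pending authority added above, and the Ploticus invocation is just a quick sanity check):

% pfexec pkg refresh
% pfexec pkg install ploticus     # pulls the package from the pending repository (qualify with the authority name if needed)
% pl -version                     # confirm the binary runs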

Once you are sure your package installs and runs correctly, send an email to sw-porters-discuss@opensolaris.org requesting that the package be promoted from the pending repository to the contrib repository. Note that you'll need to subscribe to this mailing list before you can post to it. To subscribe, go here.

Once the package is available in contrib, users will be able to install your package on their systems.

FIN!

[See my later blog entry for additional information about SourceJuicer and OpenSolaris improvements that make package contributions even easier.]


Wednesday Apr 15, 2009

Tickless Clock for OpenSolaris

I've been talking a lot to people about the convergence we see happening between Enterprise and HPC IT requirements and how developments in each area can bring real benefits to the other. I should probably do an entire blog entry on specific aspects of this convergence, but for now I'd like to talk about the Tickless Clock OpenSolaris project.

Tickless kernel architectures will be familiar to HPC experts as one method for reducing application jitter on large clusters. For those not familiar with the issue, "jitter" refers to variability in the running time of application code due to underlying kernel activity, daemons, and other stray workloads. Since MPI programs typically run in alternating compute and communication phases and develop a natural synchronization as they do so, applications can be slowed down significantly when some nodes arrive late at these synchronization points. The larger the MPI job, the more likely this type of noise is to cause a problem. Measurements have shown surprisingly large slowdowns associated with jitter.

Jitter can be lessened by reducing the number of daemons running on a system, by turning off all non-essential kernel services, etc. Even with these changes, however, there are other sources of jitter. One notable source is the clock interrupt used in virtually all current operating systems, which fires 100 times per second to perform periodic housekeeping chores required by the OS and is a known contributor to jitter. It is for this reason that IBM has implemented a tickless kernel on their Blue Gene systems to reduce application jitter.

Sun is starting a Tickless Clock project in OpenSolaris to completely remove the clock interrupt and switch to an event-based architecture for OpenSolaris. While I expect this will be very useful for HPC users of OpenSolaris, HPC is not the primary motivator of this project.

As you'll hear in the video interview with Eric Saxe, Senior Staff Engineer in Sun's Kernel Engineering group, the primary reasons he is looking at Tickless Clock are power management and virtualization. For power management, it is important that when the system is idle, it really IS idle and not waking up 100 times per second to do nothing since this wastes power and will prevent the system from entering deeper power saving states. For virtualization, since multiple OS instances may share the same physical server resources, it is important that guest OSes that are idle really do stay idle. Again, waking up 100 times per second to do nothing will steal cycles from active guest OS instances, thereby reducing performance in a virtualized environment.
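
On OpenSolaris you can watch this effect directly with PowerTOP, which reports how often a nominally idle system wakes up and what is responsible. A sketch (the package name is an assumption; search the Package Manager if it differs):

% pfexec pkg install SUNWpowertop     # assumed package name for the OpenSolaris PowerTOP port
% pfexec powertop                     # shows wakeups-per-second and their top causes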

While I would argue that both power management and virtualization will become increasingly important to HPC users (more of that convergence thing), it is interesting to see these traditional enterprise issues stimulating new projects that will benefit both enterprise and HPC customers in the future.

Interested in getting involved with implementing a tickless architecture for OpenSolaris? The project page is here.


Friday Apr 03, 2009

HPC in Second Life (and Second Life in HPC)



We held an HPC panel session yesterday in Second Life for Sun employees interested in learning more about HPC. Our speakers were Cheryl Martin, Director of HPC Marketing; Peter Bojanic, Director for Lustre; Mike Vildibill, Director of Sun's Strategic Engagement Team (SET); and myself. We covered several aspects of HPC: what it is, why it is important, and how Sun views it from a business perspective. We also talked about some of the hardware and software technologies and products that are key enablers for HPC: Constellation, Lustre, MPI, etc.

As we were all in-world at the time, I thought it would be interesting to ponder whether Second Life itself could be described as "HPC" and whether we were in fact holding the HPC meeting within an HPC application. Having viewed this excellent SL Architecture talk given by Ian (Wilkes) Linden, VP of Systems Engineering at Linden Lab, I conclude that SL is definitely an HPC application. Consider the following information taken from Ian's presentation.


As you can see, the geography of SL has been exploding in size over the last 5-6 years. As of Dec 2008 that geography is simulated using more than 15K instances of the SL simulator process which, in addition to computing the physics of SL, also run an average of 30 million simultaneous server-side scripts to create additional aspects of the SL user experience. And look at the size of their dataset: 100TB is very respectable from an HPC perspective. And a billion files! Many HPC sites are worrying about what will happen when they get to that level of scale, while Linden Lab is already dealing with it. I was surprised they aren't using Lustre, since I assume their storage needs are exploding as well. But I digress.


The SL simulator described above would be familiar to any HPC programmer. It's a big C++ code. The problem space (the geography of SL) has been decomposed into 256m x 256m chunks that are each assigned to one instance of the simulator. Each simulator process runs on its own CPU core and "adjacent" simulator instances exchange edge data to ensure consistency across sub-domain boundaries. And it's a high-level physics simulation. Smells like HPC to me.


Wednesday Dec 10, 2008

A Quantum of Solaris


We emitted our latest wad of Solaris goodness today with the official release of OpenSolaris 2008.11. Lest you think engineering used a partially undenary nomenclature for the release name, rest assured the bits were in fact done and ready to go in November. The official announcement was delayed slightly due to other proximate product announcements.

I've been running 2008.11 for several weeks, having taken part in the internal testing cycles at Sun. I found and reported several mostly minor problems, but have generally found the 2008.11 experience to be quite good. The Live CD boot and install to disk all worked smoothly within VirtualBox, our free desktop virtualization product, on my MacBook Pro. With VirtualBox extensions installed, I can use 2008.11 in fullscreen mode and with mouse integration enabled.

While my primary interest in OpenSolaris is as a substrate on which we are building a full, integrated HPC software stack, I can't help but note a few generally cool things about this release.

First is Time Slider. Yes, okay, Apple did it first with Time Machine. But try THIS with Time Machine: I turned on Time Slider and then immediately deleted a file from my Desktop without first doing any kind of back up. I then recovered the file using the TS slider on a File Browser window. This works because Time Slider is built on top of ZFS, which uses copy-on-write for safety and which also provides an essentially instantaneous snapshot facility. I was able to recover my file because when it was deleted (meaning "when the metadata representing the directory in which the file was located was changed"), the metadata was copied, modified and then written. But with snapshots enabled by Time Slider, the old metadata is retained as well, making it possible to slide back in time and recover deleted or altered files by revisiting the state of the file system at any earlier time. Nifty.
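
Under the covers this is ordinary ZFS snapshot machinery, which you can also drive by hand. A rough sketch (the dataset and file names are illustrative):

% zfs snapshot rpool/export/home/josh@before-cleanup            # snapshots are essentially instantaneous
% rm ~/Desktop/important-file                                   # oops
% ls /export/home/josh/.zfs/snapshot/before-cleanup/Desktop     # the deleted file is still visible here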

My second pick is perhaps somewhat esoteric, but I thought it was cool: managing boot environments with OpenSolaris. I think much of this was available in 2008.05, but it is new to me, so I've included it. In any case, managing multiple boot environments has been completely demystified as you can see in this article. Yet another admin burden removed through use of ZFS. For full documentation on boot environments, go here.
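
As a taste, the beadm(1M) command reduces boot environment management to one-liners. A sketch (the boot environment name is made up):

% beadm list                   # show the boot environments on this system
% beadm create pre-upgrade     # clone the current BE before making risky changes
% beadm activate pre-upgrade   # activate it later to roll back if something goes wrong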

We've also made significant progress supporting Suspend/Resume, which is frankly an absolute requirement for any bare-metal OS one might run on a laptop. For me it isn't so important because I run OpenSolaris as a guest OS in VirtualBox. For those doing bare metal installations, this page details the requirements and limitations of the current Suspend/Resume support in 2008.11.

Putting my HPC hat back on for this last item, I note that a prototype release of the Automated Installer (AI) Project has been included in 2008.11. AI is basically the Jumpstart replacement for OpenSolaris--the mechanism that will be used to install OpenSolaris onto servers, including large numbers of servers, hence my interest from an HPC perspective. For more information on AI, check out the design documents or, better, install the SUNWinstalladm-tools package using the Package Manager and then read the installadm man page. Full installation details are here. AI is still a work in progress so feel free to pitch in if this area interests you: all of the action happens on the Caiman mailing list, which you can subscribe to here.


Thursday Nov 13, 2008

Big News for HPC Developers: More Free Stuff

'Tis the Season. Supercomputing season, that is. Every November the HPC community--users, researchers, and vendors--attend the world's biggest conference on HPC: Supercomputing. This year SC08 is being held in Austin, Texas, to which I'll be flying in a few short hours.

As part of the seasonal rituals vendors often announce new products, showcase new technologies and generally strut their stuff at the show and even before the show in some cases. Sun is no exception as you will see if you visit our booth at the show and if you take note of two announcements we made today that should be seen as a Big Deal to HPC developers. The first concerns MPI and the second our Sun Studio developer tools.

The first announcement extends Sun's support of Open MPI to Linux with the release of ClusterTools 8.1. This is huge news for anyone looking for a pre-built and extensively tested version of Open MPI for RHEL 4 or 5, SLES 9 or 10, OpenSolaris, or Solaris 10. Support contracts are available for a fee if you need one, but you can download the CT 8.1 bits here for free and use them to your heart's content, no strings attached.

Here are some of the major features supported in ClusterTools 8.1:

  • Support for Linux (RHEL 4&5, SLES 9&10), Solaris 10, OpenSolaris
  • Support for Sun Studio compilers on Solaris and Linux, plus the GNU/gcc toolchain on Linux
  • MPI profiling support with Sun Studio Analyzer (see SSX 11.2008), plus support for VampirTrace and MPI PERUSE
  • InfiniBand multi-rail support
  • Mellanox ConnectX InfiniBand support
  • DTrace provider support on Solaris
  • Enhanced performance and scalability, including processor affinity support
  • Support for InfiniBand, GbE, 10GbE, and Myrinet interconnects
  • Plug-ins for Sun Grid Engine (SGE) and Portable Batch System (PBS)
  • Full MPI-2 standard compliance, including MPI I/O and one sided communication

The second event was the release of Sun Studio Express 11/08, which among other enhancements adds complete support for the new OpenMP 3.0 specification, including tasking. If you are questing for ways to extract parallelism from your code to take advantage of multicore processors, you should be looking seriously at OpenMP. And you should do it with the Sun Studio suite, our free compilers and tools which really kick butt on OpenMP performance. You can download everything--the compilers, the debugger, the performance analyzer (including new MPI performance analysis support) and other tools for free from here. Solaris 10, OpenSolaris, and Linux (RHEL 5/SuSE 10/Ubuntu 8.04/CentOS 5.1) are all supported. That includes an extremely high-quality (and free) Fortran compiler among other goodies. (Is it sad that us HPC types still get a little giddy about Fortran? What can I say...)
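
If you want to kick the tires, building and running an OpenMP program with Sun Studio is straightforward. A sketch (the source file name is made up; check the documentation for the full set of flags):

% cc -xopenmp -fast -o mandel mandel.c     # compile an OpenMP program with the Sun Studio C compiler
% OMP_NUM_THREADS=8 ./mandel               # run it with eight threads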

The capabilities in this Express release are too numerous to list here, so check out this feature list or visit the wiki.


Thursday Oct 09, 2008

LISA '08: What Every Admin Needs to Know About Solaris

Admins, fasten your seatbelts: The 22nd Large Installation System Administration (LISA '08) Conference promises to be as jammed with useful and interesting technical content as ever and at least as much fun. Come to San Diego from Nov 9-14 to find out!

For those of you looking to dig deeper into Solaris or for those looking to understand what the fuss is all about, there is a ton of Solaris and OpenSolaris related content scheduled at LISA thanks to a lot of hard work by people both inside and outside of Sun. Here are some of the highlights.

Jim Mauro is doing a full-day POD training session. That's Performance, Observability, and Debugging. If you only make it to one Solaris session, pick this one. Jim is a very knowledgeable and engaging speaker and the material is excellent. I enjoyed his much-compressed presentation of this material at a recent NEOSUG meeting. You will definitely emerge 1) exhausted, and 2) with a much better understanding of how to use a variety of Solaris tools to solve performance problems and to better understand your systems' workloads. Jim will lead you on a foray into the depths of the various Solaris tools that let you look at all aspects of system performance, including DTrace. Whether you are a seasoned UNIX admin who is new to Solaris or just wondering what all the DTrace fuss is about, you will find this taste-o-DTrace pretty exciting. And if you really want to know a lot more about DTrace, Jim is also doing an all-day DTrace training session at the conference.

Peter Galvin, long-time Solaris expert and trainer and also chair of NEOSUG, and Marc Staveley will be giving a two-day Solaris workshop that has been broken into four half-day sessions. The sessions are Administration, Virtualization, File Systems, and Security. These are all hands-on sessions so Peter and Marc recommend you bring a laptop. Solaris installation not required--the instructors will supply a Solaris machine for remote access.

For something higher level and more strategic, Jim Hughes (Chief Technologist for Solaris) will give an invited talk on OpenSolaris and the Direction of Future Operating Systems. And Janice Gelb will also deliver an invited talk provocatively titled, WTFM: Documentation and the System Administrator.

There will be two Solaris-focused Guru sessions at LISA as well. Scott Davenport and Louis Tsien will cover Solaris Fault Management, while Richard Elling will speak about ZFS. These both promise to be interesting sessions with technical people who really know their stuff.

Solaris Containers are an innovative virtualization technology built right into Solaris, and Jeff Victor will be leading a full-day workshop to take attendees on a detailed tour of this capability. Check out Resource Management with Solaris Containers.

There will also be a full-day deep dive workshop on ZFS offered by Richard Elling. Many people have heard about this new file system, but you won't really understand exactly why it is getting so much attention until you experience how it changes the administrative experience around file systems.

Sun will also be hosting a vendor BOF to talk about BigAdmin, the mega-hub for metric tons of useful and very detailed information for administrators. If you aren't familiar with BigAdmin, check out the BOF or at the very least pop over to the website for a peek. Cool stuff.

Sun will also have a booth in the exhibit area. Booth 52, I believe. Stop by for some good conversation and maybe some giveaways.


Wednesday Sep 03, 2008

New England OpenSolaris User Group Meeting: Wednesday, September 10th!

The fifth meeting of NEOSUG (New England OpenSolaris User Group) will be held next Wednesday, September 10th at Sun's Burlington, Massachusetts site. The featured speaker will be Jim Mauro, who will talk about Solaris 10 and OpenSolaris Performance, Observability, and Debugging. Full details below.

The New England Open Solaris User Group (NEOSUG) Meeting

Topic for this meeting:

Solaris 10 and OpenSolaris Performance, Observability and Debugging (The Abridged Version)

Who should attend?: UNIX developers, Solaris users, system managers, and system administrators.

AGENDA:

New England OpenSolaris User Group Meeting (NEOSUG)
Sept 10, 2008 6:30-9:30 pm (registration opens @5:30)
Sun Microsystems
One Network Drive
Burlington, MA

5:30-6:30: Registration, Refreshments
6:30-6:40: Introductions, Peter Galvin
6:40-8:30: Solaris 10 and OpenSolaris Performance, Jim Mauro, Sun Microsystems
8:30-9:00: Questions and Discussion

Please RSVP at : https://www.suneventreg.com//cgi-bin/register.pl?EventID=2341

TALK DESCRIPTION:

Solaris 10 and OpenSolaris Performance, Observability and Debugging (The Abridged Version)

The observability toolbox in Solaris 10 and OpenSolaris is loaded with powerful tools and utilities for analyzing applications and the underlying system. Solaris Dynamic Tracing (DTrace) allows you to connect the dots between the process- and thread-centric tools and the system utilization tools, and get a complete picture of what your applications are doing, how they are interacting with the kernel, and to what extent they are consuming hardware resources (CPU, memory, etc.).

This two-hour talk walks through the tools, utilities and methods for analyzing workloads on your Solaris systems.
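
If you have never used DTrace, a single one-liner gives a taste of the kind of observability the talk covers, for example counting system calls by process name across the whole system:

% pfexec dtrace -n 'syscall:::entry { @[execname] = count(); }'    # press Ctrl-C to print the counts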

NEOSUG BIOs:

Peter Galvin : Chief Technologist, Corporate Technologies Inc.
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a systems integrator and VAR, and was the Systems Manager for Brown University’s Computer Science Department. He has written articles for Byte and other magazines. He wrote the Pete’s Wicked World and Pete’s Super Systems columns at SunWorld Magazine. He is currently a contributing editor for SysAdmin Magazine, where he managed the Solaris Corner. Peter is co-author of the Operating Systems Concepts and Applied Operating Systems Concepts textbooks. Blog: http://pbgalvin.wordpress.com

Jim Mauro: Principal Engineer in the Systems Group, Sun Microsystems, Inc.
Jim Mauro works on improving delivered application performance on Sun hardware and Solaris. Jim's recent project work includes Solaris performance as a guest operating system on Xen and VMware virtual machines, Solaris large memory page performance, and Solaris performance on large SPARC systems. Jim co-authored Solaris Internals (1st Ed, Oct 2000), Solaris Internals (2nd Ed, June 2006) and Solaris Performance and Tools (1st Ed, June 2006).

ug-neosug mailing list: ug-neosug@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ug-neosug


Thursday Jun 05, 2008

Compilers for OpenSolaris 2008.05 [UPDATED]

[Thanks to Michal Bielicki for pointing out an error in the original post. The correct name of the Sun Studio package is "ss-dev". I have fixed the text and graphics below to reflect this.]

If you want to install either the Sun compilers and developer tools or the GNU developer tools onto OpenSolaris 2008.05, I summarize the process below. Currently, the procedure is somewhat less than obvious, hence this blog entry.

First, start the package manager: System -> Administration -> Package Manager. You will see the following:


Looking for the compilers, you might select Developer Tools. However, doing so will show the following:

Unfortunately, the compiler packages were not categorized correctly and therefore they do not show up under this package category. We will fix this, but the good news is that the packages are available, if you know where to look.

With the 'All' category selected, enter 'gcc' into the search field. You do not need to press return. The interface is slow, but it will eventually update to show the following:

To install the GNU tools, select the gcc-dev package and click on Install/Update in the toolbar. Once the download (about 120MB) and the installation complete, you can open a new terminal window and type 'gcc' to verify the software has been installed correctly. The installation script has created links from /usr/bin/gcc to /usr/sfw/bin/gcc as a convenience.

To find the Sun compilers and tools, type "ss-dev" ("ss" stands for Sun Studio, which is what we call our compiler and tools suite) and you will see the following:

Select the "ss-dev" package and then click on Install/Update on the toolbar. Once the download (over 600MB) and the installation complete, you can open a terminal window and try the 'cc' command. Don't panic when it fails. As you can see below, the compilers have been installed in /opt. You will need to either modify your startup files to include this directory on your execution path, or create the appropriate links from /usr/bin into this directory. I've been told we will fix this inconvenience soon. A wonderful benefit of network-based package management is that we can fix this relatively quickly and then subsequent downloaders of the package will see the new behavior automatically.
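
For reference, the command-line equivalent of the steps above looks roughly like this (a sketch; the Sun Studio install directory under /opt may differ on your system, so check what the package actually delivered):

% pfexec pkg install gcc-dev                     # GNU toolchain
% pfexec pkg install ss-dev                      # Sun Studio compilers and tools
% export PATH=/opt/SunStudioExpress/bin:$PATH    # assumed install location under /opt; adjust as needed
% cc -V                                          # verify the Sun compiler is now on your path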


Monday May 05, 2008

OpenSolaris binary distro now available

Today Sun announced the availability of our first OpenSolaris binary distro, OpenSolaris 2008.05, which is built from the OpenSolaris Project's open-source code base. You can download the LiveCD bits and read more at http://www.opensolaris.com. With LiveCD you can boot the distro without writing anything to disk and decide later whether you want to install it or not.

I've filed this under my HPC category. Why? Because future versions of this distro will form the substrate on which we intend to build a full HPC distro that will include both the OS and layered products and will address both application development and deployment. Components will include compilers, MPI library, developer tools, distributed resource management, as well as provisioning, management, and monitoring capabilities.

The definition and creation of this HPC distro will be run as an OpenSolaris project, which we will be starting in earnest soon. We invite any interested party to join the HPC Developer Community to get involved.


Saturday Apr 26, 2008

HPC User Forum: Operating System Panel

As I mentioned in an earlier entry, I participated in the HPC interconnect panel discussion at IDC's HPC User Forum meeting in Norfolk, Virginia last week. I also sat on the Operating System panel, the subject of this blog entry.

Because the organizers opted against panel member presentations, the following slides were not actually shown at the conference, though they will be included in the conference proceedings. I use them here to highlight some of the main points I made during the panel discussion.

[os panel slide 1]

My fellow panelists during this session were Kenneth Rozendal from IBM, John Hesterberg from SGI, Benoit Marchand from eXludus, Ron Brightwell from Sandia National Laboratory, John Vert from Microsoft, Ramesh Joginpaulli from AMD, and Richard Walsh from IDC.

The framing topic areas suggested by Richard Walsh prior to the conference were used to guide the discussion:

  • Ensuring scheduling efficiency on fat nodes
  • Managing cache and bandwidth resources on multi-core chips
  • Linux and Windows: strengths, weaknesses, and alternatives
  • OS scalability and resiliency requirements of petascale systems

As you'll see, we covered a wider array of topics during the course of the panel session.

[os panel slide 2]

Beowulf clusters have been popular with the HPC community since about 1998. The idea arose in part as a reaction against expensive, "fat" SMP systems and proprietary, expensive software. Typical Beowulf clusters were built of "thin" nodes (typically one or two single-CPU sockets), commodity ethernet, and an open source software stack customized for HPC environments.

With multi-core and multi-threaded processors now becoming the norm, nodes are maintaining their svelte one or two rack unit form factors, but are becoming much beefier internally. As an extreme example, consider Sun's new SPARC Enterprise T5140 server, which crams 128 hardware threads, 16 FPUs, 64 GB of memory, and close to 600 GB of storage into a single rack unit (1.75") form factor. Or the two rack-unit version (the T5240) that doubles the memory to 128 GB and supports up to almost 2.4 TB of local disk storage in the chassis. I call nodes like these Sparta nodes because they are slim and trim...and very powerful. Intel and AMD's embracing of multicore ensures that future systems will generally become more Spartan over time.

Clusters need an interconnect. While traditional Beowulf clusters have used commodity Ethernet, they have often done so at the expense of performance for distributed applications that have significant bandwidth and/or latency requirements. As was discussed in the interconnect panel session at the HPC User Forum, InfiniBand (IB) is now making significant inroads into HPC at attractive price points and will continue to do so. Ethernet will also continue to play a role, but commodity 1 GbE is not at all in the same league with IB with respect to either bandwidth or latency. And InfiniBand currently enjoys a significant price advantage over 10 GbE, which does offer (at least currently) comparable bandwidths to IB, though without a latency solution. The use of IB in Sparta clusters allows the nodes to be more tightly coupled in the sense that a broader range of distributed applications will perform well on these systems due to the increased bandwidth and much lower latencies achievable with InfiniBand OS bypass capabilities.

This trend towards beefier nodes will have profound effects on HPC operating system requirements. Or said in a different way, this trend (and others discussed below) will alter the view of the HPC community towards operating systems. The traditional HPC view of an OS is one of "software that gets in the way of my application." In this new world, while we must still pay attention to OS overhead and deliver good application performance, the role of the OS will expand and deliver significant value for HPC.

[os panel slide 3]

The above is a photo I shot of the T5240 I described earlier. This is the 2RU server that recently set a new two-socket SPEComp record as well as a SPECcpu record. Details on the benchmarks are here. If you'd like a quick walkthrough of this system's physical layout, check out my annotated version of the above photo here.

[os panel slide 4]

The industry shift towards multicore processors has created concern within the HPC community and more broadly as well. There are several challenges to be addressed if the value and power of these processors are to be realized.

The increased number of CPUs and hardware threads within these systems will require careful attention be paid to operating system scalability to ensure that application performance does not suffer due to inefficiencies in the underlying OS. Vendors like Sun, IBM, SGI, etc., who have worked on OS scaling issues for many years have experience in this area, but there will doubtless be continuing scalability and performance challenges as these more closely coupled hardware complexes become available with ever larger memory configurations and ever faster IO subsystems.

There was some disagreement within the panel session over the ramifications to application architectures of these beefier nodes when they are used as part of an HPC cluster. Will users continue to run one MPI process per CPU or thread, or will fewer MPI processes be used per node, with each process then consuming additional on-node parallelism via OpenMP or some other threading model? I am of the opinion that the mixed/hybrid style (combined MPI and threads) will be necessary for scaling to very large-size clusters because at some point scaling MPI will become problematic. In addition, regardless of the cluster size under consideration, using MPI within a node is not very efficient. MPI libraries can be optimized in how they use shared memory segments for transferring message data between MPI processes, but any data transfers are much less efficient than using a threading model that takes full advantage of the fact that all of the memory on a node is immediately accessible to all of the threads within one address space.

The tradeoff is that this shift from pure MPI programming to hybrid programming does require application changes and the mixed model can be more difficult since it requires thinking about two levels of parallelism. If this shift to multi-core and multi-threaded processors were not such a fundamental sea change, I would agree that recoding would not be worthwhile. However, I do view this as profound a shift as that which caused the HPC community to move to distributed programming models with PVM and MPI and to recode their applications at that time.
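
In launch terms, the hybrid model looks something like the following with Open MPI (a sketch; the process and thread counts are illustrative and option names may vary with your MPI version):

% export OMP_NUM_THREADS=8                   # threads per MPI process, consuming on-node parallelism
% mpirun -np 16 -npernode 2 ./hybrid_app     # far fewer MPI processes per node than hardware threads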

Another challenge is that of efficient use of available memory bandwidth, both between sockets and between sockets and memory. As more computational power is crammed into a socket, it becomes more important for 1) processor and system designers to increase available memory bandwidth, 2) operating system designers to provide efficient and effective capabilities to allow applications to make effective use of bandwidth, and 3) tool vendors to provide visibility into application memory utilization to help application programmers optimize their use of the memory subsystem. In many cases, memory performance will become the gating factor on performance rather than CPU.

As the compute, memory, and IO capacities of these beefier nodes continue to grow, the resiliency of these nodes will become a more important factor. With more applications and more state within a node, downtime will be less acceptable within the HPC community. This will be especially true in commercial HPC environments where many ISV applications are commonly used and where these applications may often be able to run within a single beefy node. In such circumstances, OS capabilities like proactive fault management, which identifies nascent problems and takes action to avoid system interruption, become much more important to HPC customers. An interesting development, since capabilities like fault management have traditionally been developed for the enterprise computing market.

The last item--interconnect pressures--is fairly obvious. As nodes get beefier and perform more work, they put a larger demand on the cluster interconnect, both for compute-related communication and for storage data transfers. InfiniBand, with its aggressive bandwidth roadmap, and an ability to construct multi-rail fabrics, will play an important role in maintaining system balance. Well-crafted low level software (OS, IB stack, MPI) will be needed to handle the larger loads at scale.

[os panel slide 5]

Beyond the challenges of multi-core and multi-threading, there are opportunities. For all but the highest-end customers, beefier nodes will allow node counts, and therefore datacenter footprints, to grow more slowly, decreasing the rate at which complexity and scaling issues grow.

Much more important, however, is that with more hardware resources per node it will now be possible to dedicate a small to moderate amount of processing power to handling OS tasks while minimizing the impact of that processing on application performance. The ability to fence off application workloads from OS processing, using mechanisms like processor sets, processor bindings, and the ability to turn off device interrupt processing on selected CPUs, should allow applications to run with reduced jitter while still supporting a full, standard OS environment on each node, without having to resort to microkernel or other approaches to deliver low jitter. Using standard OSes (possibly stripped down by removing or disabling unneeded functions) is very important for several reasons.
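
As a rough sketch of what that fencing can look like (my own simplified example, assuming the Solaris processor set interfaces in sys/pset.h and appropriate privileges; the psrset command offers similar controls administratively, and the CPU ids below are placeholders):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/procset.h>
    #include <sys/processor.h>
    #include <sys/pset.h>

    int main(void)
    {
        psetid_t pset;

        /* Carve CPUs 2 and 3 out of the default set (ids are illustrative). */
        if (pset_create(&pset) != 0 ||
            pset_assign(pset, 2, NULL) != 0 ||
            pset_assign(pset, 3, NULL) != 0) {
            perror("processor set setup");
            return 1;
        }

        /* Bind this process to the new set; the OS no longer schedules
           other work on those CPUs, so the application sees less jitter. */
        if (pset_bind(pset, P_PID, getpid(), NULL) != 0) {
            perror("pset_bind");
            return 1;
        }

        /* ... launch the jitter-sensitive computation here ... */
        return 0;
    }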

As mentioned earlier, OS capabilities are becoming more important, not less. I've mentioned fault management and scalability. In a few slides we'll talk about power management, virtualization and other capabilities that will be needed for successful HPC installations in the future. Attempt to build all of that into a microkernel and you end up with a kernel. You might as well start with all of the learning and innovation that has accrued to standard OSes and minimize and improve where necessary rather than building one-off or few-off custom software environments.

I worry about how well-served the very high end of the HPC market will be in the future. It isn't a large market or one that is growing like other segments of HPC. While it is a segment that solves incredibly important problems, it is also quite a difficult market for vendors to satisfy. The systems are huge, the software scaling issues tremendously difficult, and, frankly, both the volume and the margins are low. This segment has argued for many years that meeting their scaling and other requirements guarantees that a vendor will be well-positioned to satisfy any other HPC customer's requirements. It is essentially the "scale down" argument. But that argument is in jeopardy to the extent that the high-end community embraces a different approach than is needed for the bulk of the HPC market. Commercial HPC customers want and need a full instance of Solaris or Linux on their compute nodes because they have both throughput and capability problems to run and because they run lots of ISV applications. They don't want a microkernel or some other funky software environment.

I absolutely understand and respect the software work being done at our national labs and elsewhere to take advantage of the large-scale systems they have deployed. But this does not stop me from worrying about the ramifications of the high-end delaminating from the rest of the HPC market.

[os panel slide 6]

The HPC community is accustomed to being on the leading/bleeding edge, creating new technologies that eventually flow into the larger IT markets. InfiniBand is one such example that is still in process. Parallel computing techniques may be another as the need for parallelization begins to be felt far beyond HPC due to the tailing off of clock speed increases and the emergence of multi-core CPUs.

Virtualization is an example of the opposite. This is a trend that is taking the enterprise computing markets by storm. The HPC community to date has not been interested in a technology that, in its view, simply adds another layer of "stuff" between the hardware and their applications, reducing performance. I would argue that virtualization is coming and will be ubiquitous, and that the HPC community needs to engage actively to 1) influence virtualization technology to align it with HPC needs, and 2) find ways in which virtualization can be used to advantage within the HPC community rather than simply being victimized by it. It is coming, so let's embrace it.

The two immediate challenges are both performance related: base performance of applications in a virtualized environment and virtualizing the InfiniBand (or Ethernet) interconnect while still being able to deliver high performance on distributed applications, including both compute and storage. The first issue may not be a large one since most HPC codes are compute intensive and such code should run at nearly full speed in a virtualized environment. And early research on virtualized IB, for example by DK Panda's Network-based Computing Laboratory at OSU, has shown promising results. In addition, the PCI-IOV (IO Virtualization) standard will add hardware support for PCI virtualization that should help achieve high performance.

What about the potential benefits of virtualization for HPC? I can think of several possibilities:

  • Coupling live migration [PDF] with fault management to dynamically shift a running guest OS instance off of a failing node, thereby avoiding application interruptions.
  • Using the clean interface between hypervisor and Guest OS instances to perform checkpointing of a guest OS instance (or many instances in the case of an MPI job) rather than attempting to checkpoint individual processes within an OS instance. The HPC community has tried for many years to create the latter capability, but there are always limitations. Perhaps we can do a more complete job by working at this lower level.
  • Virtualization can enable higher utilization of available resources (those beefy nodes again) while maintaining a security and failure barrier between applications and users. This is ideal in academic or other service environments in which multi-tenancy is an issue, for example in cases where academic users and industry partners with privacy or security concerns share the same compute resources.
  • Virtualization can also be used to decrease the administrative burden on system administration staff and allow it to be more responsive to the needs of its user population. For example, a Solaris or Linux-based HPC installation could easily allow virtualized Windows-based ISV applications to be run dynamically on its systems without having to permanently maintain Windows systems in the environment.

The key point is that virtualization is coming and we as a community should find the best ways of using the technology to our advantage. The above are some ideas--I'd like to hear others.

[os panel slide 7]

The power and cooling challenges are straightforward; the solutions are not. We must deliver effective power management capabilities for compute, storage, and interconnect that support high performance, but also deliver significant improvements over current practice. To do this effectively requires work across the entire hardware and software stack. Processor and system design. Operating system design. System management framework design. And it will require a very comprehensive review of coding practices at all levels of the stack. Polling is not eco-efficient, for example.
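
To illustrate that last point with a small example of my own (not from the slides), compare a thread that busy-polls for work with one that blocks until work arrives. The polling version keeps a core at full power while doing nothing useful; the blocking version lets the idle CPU be power-managed until there is actually work to do.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
    static volatile bool work_ready = false;   /* simplified shared flag */

    /* Not eco-efficient: spins at 100% CPU (and full power) while waiting. */
    void wait_by_polling(void)
    {
        while (!work_ready)
            ;   /* burn cycles, burn watts */
    }

    /* Better: the thread sleeps until signalled, so the CPU can idle. */
    void wait_by_blocking(void)
    {
        pthread_mutex_lock(&lock);
        while (!work_ready)
            pthread_cond_wait(&cv, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Producer side: set the flag and wake the blocked waiter. */
    void post_work(void)
    {
        pthread_mutex_lock(&lock);
        work_ready = true;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
    }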

Power management issues are yet another reason why operating systems become more important to HPC as capabilities developed primarily for the much larger enterprise computing markets gain relevance for HPC customers.

[os panel slide 8]

Here I sketch a little of Sun's approach with respect to OSes for HPC. First, we will offer both Linux- and Solaris-based HPC solutions, including a full stack of HPC software on top of the base operating systems. We recognize quite clearly the position currently held by Linux in HPC and see no reason why we should not be a preferred provider of such systems. At the same time, we believe there is a strong value proposition for Solaris in HPC and that we can deliver performance along with an array of increasingly relevant enterprise-derived capabilities that will benefit the HPC community. We also realize it is incumbent upon us to prove this point to you, and we intend to do so.

I will finish by commenting on one bullet on this final slide; for the other products and technologies, I will defer to future blog posts. The item I want to end with is Project Indiana and OpenSolaris, due to their relevance to HPC customers.

In 2005, Sun created an open source and open development effort based on the source code for the Solaris Operating System. Called OpenSolaris, the community now numbers well over 75,000 members and it continues to grow.

Project Indiana is an OpenSolaris project whose goal is to produce OpenSolaris binary distributions that will be made freely available to anyone, with optional for-fee support available from Sun. An important part of this project is a modernization effort that moves Solaris to a network-based package management system, updates many of the open-source utilities included in the distro, and adds open-source programs and utilities that are commonly expected to be present. To those coming from Linux, the OpenSolaris user experience should feel much more familiar as these changes roll out. In my view, this was a necessary and long-overdue step towards lowering the barrier for Linux (and other) users, enabling them to step more easily into the Solaris environment and benefit from the many innovations we've introduced (see the slide for some examples).

In addition to OpenSolaris binary distros, you will see other derivative distros appearing. In particular, we are working to define an OpenSolaris-based distro that will include a full HPC software stack and will address both developers and deployers. This effort has been running within Sun for a while and will soon transition to an OpenSolaris project so we can more easily solicit community involvement. This Solaris HPC distro is meant to complement similar work being done by a Linux-focused engineering team within our expanded Lustre group, which is also doing its work in the open and also encourages community involvement.

There was some grumbling at the HPC User Forum about the general Linux community and its lack of focus or interest in HPC. While clearly there have been some successes (for example, some of the Linux scaling work done by SGI), there is frustration. One specific example mentioned was the difficulty in getting InfiniBand support into the Linux kernel. My comment on that? We don't need to ask Linus' permission to put HPC-enabling features into Solaris. In fact, with our CEO making it very clear that HPC is one of the top three strategic focus areas for Sun [PDF, 2MB], we welcome HPC community involvement in OpenSolaris. It's free, it's open, and we want Solaris to be your operating system of choice for HPC.
