Monday May 03, 2010

Low Latency Scheduling with Oracle Coherence

I have been too caught up with my transition to Oracle, so I have not been able to spend much time writing my blog. Now that I'm part of Oracle, I have been spending some of my free time getting familiar with Oracle products, in particular Oracle Coherence.

Coherence provides a distributed in-memory data grid, something like a shared memory space spanning multiple systems. Similar competing products include GigaSpaces, GemFire and Terracotta. Coherence is extremely versatile and can be used for different purposes, e.g. with application servers to scale your web application, in financial trading for low-latency transaction processing, and for high-performance data-intensive computing.

Coherence provides a way to do low-latency scheduling of small independent tasks. Traditional HPC job schedulers like Grid Engine schedule jobs at a fixed interval (usually several seconds), which is fine for jobs that run for a few hours but not for many small tasks that need to be processed quickly. One way Coherence addresses this is through the WorkManager interface.
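Coherence's WorkManager follows the CommonJ style of dispatching work immediately rather than on a polling cycle. Since running the real thing needs a Coherence cluster, here is a plain-Java sketch of the pattern only; the class and method names are illustrative and are not the actual Coherence API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of the CommonJ-style WorkManager pattern: small,
// independent Work units are scheduled immediately (no scheduler polling
// interval) and the caller blocks until all results are available.
public class WorkManagerSketch {

    // A small, independent unit of work that returns a result.
    interface Work<T> extends Callable<T> {}

    private final ExecutorService pool;

    public WorkManagerSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Dispatch the task at once, analogous to scheduling Work items.
    public <T> Future<T> schedule(Work<T> work) {
        return pool.submit(work);
    }

    // Block until every scheduled item has completed, analogous to
    // waiting on a collection of work items.
    public <T> List<T> waitForAll(List<Future<T>> items) throws Exception {
        List<T> results = new ArrayList<>();
        for (Future<T> f : items) {
            results.add(f.get());
        }
        return results;
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        WorkManagerSketch wm = new WorkManagerSketch(4);
        List<Future<Integer>> items = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int n = i;
            items.add(wm.schedule(() -> n * n)); // four tiny independent tasks
        }
        System.out.println(wm.waitForAll(items)); // [1, 4, 9, 16]
        wm.shutdown();
    }
}
```

The point of the pattern is that each task is dispatched the moment it is scheduled, which is why it suits workloads where per-task latency matters more than batch throughput.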

Tuesday Jul 28, 2009

A New GUI monitoring tool for SGE

For many years I've been begging for a web/GUI-based monitoring tool for SGE besides Qmon. While xml-qstat by the BioTeam is a great tool, SGE needed something that is officially supported for our customers. Well, the latest SGE 6.2 update 3 release includes a new monitoring tool called Inspect, which is developed purely in Java and uses JMX, so JMX has to be enabled in your SGE installation.

Here are some screenshots of the Inspect GUI provided by Chris Dag. Besides SGE, Inspect can monitor SDM as well. Now, the last thing that is lacking is a GUI-based job submission interface...

Tuesday Mar 24, 2009

New SGE screencasts

I just scanned through the SGE user community mailing list and saw postings of two new SGE-related screencasts that I thought I'd share in this blog entry. The first is a video by Lubomir Petrik
demonstrating the new GUI installer from the latest SGE 6.2u2 version. You can see the screencast at his blog.

The other screencast is by James Coomer, Sun HPC architect, showing the use of Sun Secure Global Desktop working with SGE and integrating xml-qstat for monitoring. The neat demo can be viewed on YouTube here.

Wednesday Jan 28, 2009

Eli Lilly uses Cloud Computing

Anyone who thinks that cloud computing is not real and that no one uses it seriously should think again. Here is an article on InformationWeek reporting that Eli Lilly "uses Amazon Web Services and other cloud services to provide high-performance computing, as needed, to hundreds of its scientists". The advantage of using Amazon WS is that "a new server can be up and running in three minutes... and a 64-node Linux cluster can be online in five minutes (compared with three months internally)". I'd advise everyone to take what is said with a pinch of salt, but it's still cool to learn about commercial uses of AWS and what Eli Lilly is planning next.

Read the full article here.

Monday Jan 19, 2009

SGE GUI Installation

One of the nice new features in the upcoming update, SGE 6.2u2, is a GUI installer. The beta version of the update is available now, and I have been playing around with the GUI installer. Those who have done automated installations of SGE on a large cluster will know about the installation config file; with the GUI installer, mass installation is becoming more straightforward. However, there is still the usual preparation that needs to be done before installation:

1. Create an SGE admin user - not mandatory, but recommended.
2. Copy/install the SGE binaries to the same directory on all the hosts if you're not using a shared directory.
3. Set up passwordless SSH. The GUI installer uses SSH to invoke $SGE_ROOT/util/arch on the remote hosts to determine the host type. If a host is listed as "unreachable", it means the installer was unable to execute that command. Try running the SSH command manually to see what's wrong.
4. If the SGE directory is not on a shared directory, then after installing the qmaster, you still have to manually copy the contents of the SGE cell's common directory (e.g. if your cell's name is default, it will be $SGE_ROOT/default/common) to all the hosts that you are going to install as execution hosts.
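The SSH check in step 3 is easy to script. Below is a hypothetical Java helper (the host name, the $SGE_ROOT path, and the BatchMode option are my own choices for a non-interactive check, not details taken from the installer) that builds and runs the same kind of command the installer effectively uses to probe each host.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical helper mirroring what the GUI installer does per host:
// run $SGE_ROOT/util/arch over passwordless SSH and read back the
// architecture string. A host failing this check is what the installer
// reports as "unreachable".
public class SgeArchCheck {

    // Build the ssh command; BatchMode=yes makes ssh fail fast instead
    // of prompting for a password, which is what we want when testing
    // that passwordless SSH really works.
    public static String[] archCommand(String sgeRoot, String host) {
        return new String[] {
            "ssh", "-o", "BatchMode=yes", host, sgeRoot + "/util/arch"
        };
    }

    // Run the check; returns the remote arch string (e.g. "lx24-amd64")
    // or null if ssh exited with an error.
    public static String remoteArch(String sgeRoot, String host) throws Exception {
        Process p = new ProcessBuilder(archCommand(sgeRoot, host)).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            return (p.waitFor() == 0) ? line : null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Print the command so it can be run by hand for debugging.
        System.out.println(String.join(" ", archCommand("/opt/sge", "node01")));
    }
}
```

Running the printed command by hand against each execution host is a quick way to find out why a host shows up as "unreachable".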

Some screenshots of the GUI installer:

Monday Dec 22, 2008

Solution Factory and HPC

I'm a little late in posting this, so this may be old news to some of you. Two quarters ago, I moved to a new team put together to develop new solutions that are easily repeatable, pre-configured, pre-tested and quickly deployable. What this means for customers is less risk, reduced cost and higher efficiency. We have announced three new solutions, branded as the Sun Rapid Solutions, targeted at specific customer network infrastructure requirements around global web buildout, datacenter efficiency and high performance computing.

I'm part of the team that works on the HPC solution called Sun Compute Cluster. The solution is designed to provide customers with a simple, flexible, and scalable approach to addressing their compute-intensive environments, thereby enabling customers to quickly deploy a tested, reliable, and efficient architecture into production. It uses a Linux software stack comprising Sun software like Sun Grid Engine, xVM Ops Center, HPC ClusterTools and Sun Studio. Customers can also choose to use the new Sun HPC Software, Linux Edition, or any other software stack they wish by engaging Sun Professional Services or partners for customization.

Thursday Nov 27, 2008

Gigaspaces integrates with Sun Grid Engine

I was at Supercomputing last week and got to talk to a rep from GigaSpaces who was at our Sun booth. I found out from him that they have integrated GigaSpaces XAP with Sun Grid Engine. One of the things missing in SGE is the capability to do low-latency scheduling, where many small transactions need to be dispatched and the results returned very quickly. An example is trading in finance, where performance is measured by the number of transactions per second. GigaSpaces provides a scalable platform that fills this gap. Using our Sun SPARC T5240 server, GigaSpaces was able to achieve very impressive benchmark results.

The integration allows SGE to manage GigaSpaces XAP instances and dynamically provision new instances to satisfy the SLA if the load gets too high. Here is a short video presentation demonstrating SGE automatically provisioning GigaSpaces XAP.

Tuesday Oct 28, 2008

IDC: Cloud Services a $42B Market by 2012

Tuesday Aug 05, 2008

The hype of the cloud

I just read a candid article in Linux Magazine about the hype of cloud computing and I find it quite funny. A nice read for those who are also caught up in the fad, like me. :)

EDIT: Dell is trying to trademark the term Cloud Computing.

Friday Aug 01, 2008

Sun Shared Visualization System

Recently, one of the local IT architects asked about the Shared Visualization System and whether Sun Rays can be used to view 3D graphics. And the answer is YES! The general idea of the Shared Visualization solution is that you can have a central pool of graphics resources, which can potentially be a grid of multiple, different systems with accelerated graphics capabilities. Users then have the ability to remotely access and share 3D visualization applications from a variety of client platforms.

The main software used to enable this is VirtualGL, an open source program that redirects the 3D rendering commands from Unix and Linux OpenGL applications to 3D accelerator hardware in a dedicated server, and displays the rendered output interactively on a thin client located elsewhere on the network. The thin client can therefore be a Sun Ray, though I believe a plug-in needs to be installed on the Sun Ray server. Sun Grid Engine is also part of the software stack, managing access to the graphics resources.

See also the slides by my Sun colleague Torben, who presented the solution at a Grid conference.

Friday Jun 27, 2008

IDC: Software is #1 roadblock

The recent IDC presentation at the ISC2008 conference states that software is now the biggest roadblock for HPC users... I couldn't agree more. As clusters grow larger and more complex, better management tools are required. Managing and monitoring an HPC cluster typically involves bits and pieces of many different tools, so setting up and operating the cluster becomes very difficult. Sun has recognized this and is taking a very serious look at it, which is why we have the Sun HPC Software, Linux Edition. It is currently based on CentOS and still needs more work to become the complete, easy-to-use management solution. We have also started the HPC-Stack project under the OpenSolaris community for the OpenSolaris edition of the software stack.

Management software is only one piece of the puzzle. The other piece is the development tools and parallel libraries. As processors gain more cores and the number of cores per cluster grows, many applications will need to be redesigned, and new programming paradigms are needed to ease development and improve efficiency.

Wednesday May 28, 2008

Project Hydrazine

During the recent JavaOne, Sun announced a new project called Project Hydrazine that allows for the rapid creation and deployment of hosted services across multiple device types. Looking at this diagram suggests that this new service will be deployed on Sun's existing compute infrastructure, and I guess that it will be used by Sun to offer hosting services, something akin to the Sun Grid Compute Utility except that it'll not just be for compute.

Some people (here and here) have referred to it as Sun's new Cloud Computing platform. While the definition of Cloud Computing is subject to much debate, I'm still very excited about it. However, I'm very curious how Project Caroline fits into all this, assuming there is any relation at all.

Read more about: "Sun's do-it-all cloud" from eWeek.

Video introduction to Project Hydrazine:

Monday May 05, 2008

Apache Hadoop

Apache Hadoop is gaining a lot of attention in the web community, especially with support from Yahoo. It has a distributed filesystem and supports data-intensive distributed applications using the MapReduce computational model. It is viewed as an important piece of the Cloud computing puzzle, but can also be very useful for data-mining types of applications. I think it won't be long before it catches attention in HPC, if it hasn't already. With its high scalability and fault-tolerant nature, I think it has a lot of uses in HPC. Given its data-intensive nature, I wonder whether there would be any value in using Hadoop with Lustre. If anyone has any insight into the I/O characteristics, I'll be glad to hear about it.
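For those unfamiliar with the MapReduce model Hadoop implements, here is an in-memory sketch of the data flow, using the classic word-count example. Hadoop distributes these phases across a cluster and stores data in HDFS; this single-JVM version (my own illustration, not Hadoop's API) only shows the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of MapReduce: a map phase emits (key, value) pairs,
// a shuffle groups the values by key, and a reduce phase folds each
// group into a single result.
public class MapReduceSketch {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map: each line emits (word, 1) pairs; Shuffle: group by word.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: sum the grouped counts for each word.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat", "the hat");
        System.out.println(wordCount(lines)); // {cat=1, hat=1, the=2}
    }
}
```

The reason this model scales so well is that the map and reduce phases are embarrassingly parallel; only the shuffle requires moving data between nodes, which is exactly where the I/O characteristics of the underlying filesystem matter.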

Tuesday Apr 22, 2008

Sun's Cloud Computing for SaaS?

Recently there was much discussion about a new research project from Sun Labs. It's called Project Caroline, a new horizontally scalable platform for SaaS that sounds a lot like cloud computing. Just what can Project Caroline do? Quoting from the article "Platform as a Service":

A hosting platform like Project Caroline enables SaaS providers to:

1. Access a wide range of open source tools and resources through high-level abstractions (language-level Virtual Machines, networks, and network-accessible file systems and databases) to increase developer productivity while insulating code from infrastructure changes
2. Launch the service across performance-tuned, load-balanced infrastructure
3. Programmatically allocate, monitor, and control virtualized compute, storage, and networking resources
4. Automate service updates and platform usage dynamically—without human intervention
5. Draw on single-system view of a horizontally scaled pool of resources in order to meet the allocation requests of multiple applications

Project Caroline is open-sourced under the GPLv2 license, and the source code can be downloaded from the project website. It is still very much a research project, but I really hope that it will eventually be developed into a real Sun product.

Wednesday Nov 07, 2007

Virtualization and Grid Computing

Today, Solaris 10 already comes with virtualization technologies like Containers and LDoms. Recently, build 75 of Solaris Express Community Edition was enabled with the xVM hypervisor. I'm quite excited about xVM, as I've been wanting to test it out for a long time. Virtualization has generated a lot of attention, but not as much in HPC. There is performance overhead when running applications in a hypervisor, and benefits like server consolidation to improve utilization do not really apply to HPC. However, virtualization can still be very useful, especially in a heterogeneous environment. Different HPC applications may support different OSes, so it's easier to manage the OSes in virtual machines (VMs). We can also limit the resources of a virtual machine, giving us greater control and flexibility in managing the workload. But to me the best thing is the ability to migrate or move a VM around. A faster machine becomes available? Just migrate the VM over. A more important job needs to run? Save the state of the VM, suspend it, and start another VM to run the higher-priority job. This paper gives a very good justification for using virtualization in Grid computing.

There have already been several research efforts in Grid Computing that use virtualization. Globus Virtual Workspace allows a Grid user to dynamically create a virtualized execution environment on a Grid resource. The idea is that you can create your own workspace on someone else's machine, do whatever you want with it, and destroy it once you're done. GridHypervisor is another project, building a VM management and provisioning layer on a Grid infrastructure. The project aims to enable the large-scale, reliable, efficient and dynamic deployment and re-allocation of virtual machines between different administrative domains, each with its own security policy and local virtualization management technology.


Melvin Koh

