Oracle Grid Engine on AWS Cluster Compute Instances

Amazon Web Services (AWS) today announced a big step forward for customers who want to run HPC applications in the cloud: its new Cluster Compute Instances. No surprise, Oracle Grid Engine fans like BioTeam didn't take long to notice and try them out. Let's dig a little deeper into the new AWS Cluster Compute Instance and see what folks are so excited about, and why Oracle Grid Engine is almost a must-have for customers wanting to take advantage of Cluster Compute Instances.

To put things in perspective, the new Cluster Compute Instances should be compared to other AWS instance types. According to Amazon, a standard AWS EC2 compute unit is normalized to "the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor". The new Cluster Compute Instance is equivalent to 33.5 EC2 compute units. On the surface, that isn't much more powerful than the previous 26 EC2 compute unit High-Memory Quadruple Extra Large Instance (although the name is certainly simpler). What is different is the Cluster Compute Instance architecture. You can cluster up to 8 Cluster Compute Instances for a total of 64 cores or 268 EC2 compute units. With the Cluster Compute Instance, Amazon provides additional details on the physical implementation, calling out "2 x Intel Xeon X5570, quad-core Nehalem architecture" as the processors in each instance, which works out to 8 cores per instance. Perhaps more importantly, while other AWS instance types only specify IO capability as "moderate" or "high", the Cluster Compute Instance comes with "full bisection 10 Gbps bandwidth between instances". While there is a certain value in consistently advertising compute capacity as standard EC2 compute units and IO bandwidth as moderate or high, I applaud Amazon for their increased transparency in calling out both the specific Intel X5570 CPU and the specific 10GbE IO bandwidth of the new Cluster Compute Instances.
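For readers who want to check the numbers, here is a minimal Python sketch of the arithmetic behind the figures quoted above. The constants come straight from this post; the function and variable names are my own, chosen only for illustration.

```python
# Figures quoted in this post; names are illustrative only.
ECU_PER_INSTANCE = 33.5        # EC2 compute units per Cluster Compute Instance
SOCKETS_PER_INSTANCE = 2       # "2 x Intel Xeon X5570"
CORES_PER_SOCKET = 4           # quad-core Nehalem
MAX_CLUSTERED_INSTANCES = 8    # cluster size limit mentioned in the post
PREVIOUS_LARGEST_ECU = 26      # High-Memory Quadruple Extra Large Instance


def cluster_totals(instances=MAX_CLUSTERED_INSTANCES):
    """Return (total cores, total EC2 compute units) for a cluster."""
    cores = instances * SOCKETS_PER_INSTANCE * CORES_PER_SOCKET
    ecus = instances * ECU_PER_INSTANCE
    return cores, ecus


# How much larger one new instance is than the previous largest type.
single_instance_ratio = ECU_PER_INSTANCE / PREVIOUS_LARGEST_ECU

cores, ecus = cluster_totals()
print(f"{cores} cores, {ecus} EC2 compute units")
print(f"single-instance ratio: {single_instance_ratio:.2f}x")
```

A single instance is only about 1.3 times the previous largest type, which is why the interesting part is the clustering and the interconnect rather than the per-instance rating.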

So what about Oracle Grid Engine makes it so useful for the new Cluster Compute Instances? AWS already offers customers a broad range of Oracle software on EC2, ranging from Oracle Enterprise Linux to Oracle Database and Oracle WebLogic Server, and you can download pre-built AWS instances directly from Oracle. Don't take my word for it; read about what joint Oracle/AWS customers like Harvard Medical School are doing with Oracle software on AWS. But back to Oracle Grid Engine. Oracle Grid Engine software is a distributed resource management (DRM) system that manages the distribution of users' workloads to available compute resources. Some of the world's largest supercomputers, like the Sun Constellation System at the Texas Advanced Computing Center, use Oracle Grid Engine to schedule jobs across more than 60,000 processing cores. You can now use the same software to schedule jobs across a 64-core cluster of AWS Cluster Compute Instances.
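To make "scheduling jobs across 64 cores" a little more concrete, here is a rough Python sketch that assembles a Grid Engine qsub command line for a parallel job. The -N, -cwd and -pe options are standard qsub options, but the parallel environment name "mpi", the job name and the script name are assumptions: parallel environments are defined by each site's administrator, so substitute whatever your cluster actually uses.

```python
from typing import List


def build_qsub_command(script: str,
                       slots: int,
                       pe_name: str = "mpi",        # assumed PE name; site-specific
                       job_name: str = "hpc_job"    # hypothetical job name
                       ) -> List[str]:
    """Assemble the argument list for a parallel Grid Engine submission."""
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    return [
        "qsub",
        "-N", job_name,             # name shown in the queue listing
        "-cwd",                     # run the job from the current directory
        "-pe", pe_name, str(slots), # request `slots` slots in the parallel environment
        script,
    ]


# Request all 64 cores of an 8-instance cluster for a hypothetical script.
command = build_qsub_command("run_sim.sh", 64)
print(" ".join(command))
```

On a machine with Grid Engine installed, the resulting list could be handed to subprocess.run to actually submit the job. It is only printed here so the sketch runs anywhere.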

Of course, many customers won't use only AWS or only their own compute cluster. A natural evolution from grid to cloud computing is the so-called hybrid cloud, which combines resources across public and private clouds. Oracle Grid Engine already handles that too, enabling you to automatically provision additional resources from the Amazon EC2 service to process peak application workloads, reducing the need to provision datacenter capacity according to peak demand. This so-called cloud bursting feature of Oracle Grid Engine is not new; it's just that you can now cloud burst onto much more powerful AWS Cluster Compute Instances.
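To illustrate the idea behind cloud bursting, here is a toy Python sketch of a sizing rule: given how many job slots are waiting and how many are free in your own datacenter, how many Cluster Compute Instances would you need to rent? This is not how Oracle Grid Engine makes the decision internally. It is a hypothetical policy that assumes one job slot per core, 8 cores per instance, and the 8-instance cluster limit mentioned earlier.

```python
import math


def instances_to_burst(pending_slots: int,
                       free_local_slots: int,
                       cores_per_instance: int = 8,   # 2 x quad-core X5570
                       max_instances: int = 8) -> int:
    """Hypothetical sizing rule: cloud instances needed to absorb overflow.

    Assumes one job slot per core. Returns 0 when the local cluster
    can handle the pending work by itself.
    """
    overflow = max(0, pending_slots - free_local_slots)
    needed = math.ceil(overflow / cores_per_instance)
    return min(needed, max_instances)


# 100 slots requested, 60 free locally: 40 overflow slots need 5 instances.
print(instances_to_burst(100, 60))
# Local capacity is sufficient, so nothing is provisioned.
print(instances_to_burst(10, 20))
```

A real policy would also weigh cost, job runtime and data transfer, but the basic shape is the same: provision for the overflow, not for the peak.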

One of Oracle's partners that has been doing a lot of work with Oracle Grid Engine in the cloud is Univa UD. I had the opportunity to speak to Univa's new CEO, Gary Tyreman, today about how they are helping customers build private and hybrid clouds using Oracle Grid Engine running on top of Oracle VM and Oracle Enterprise Linux. Gary told me Univa has been beta testing the AWS Cluster Compute Instance for several months and that it has worked flawlessly with Oracle Grid Engine and Oracle Enterprise Linux. Gary also noted that they are working with a number of Electronic Design Automation (EDA) customers that need even more powerful virtual servers than the ones available on AWS today. We have several joint customers that are evaluating the new Sun Fire X4800 running Oracle VM as supernodes for running EDA applications in private clouds. To put it in perspective, a single X4800 running Oracle VM can support up to 64 cores and 1 TB of memory. That is as many cores as, and many times the memory of, a full 8-node cluster of AWS Cluster Compute Instances, all in a single 5RU server! Now that is a powerful cloud computing platform.

If you want to hear more from Gary about what Univa is doing with some of their cloud computing customers, download his Executive Roundtable video. I'd love to hear from some additional customers who are using Oracle Grid Engine on the new AWS Cluster Compute Instances. Who knows, maybe in the future Amazon will even offer a Super Duper Quadruple Extra Large Cluster Compute Instance based on a single 64-core, 1 TB server like the Sun Fire X4800. Meanwhile, you can easily take advantage of both Cluster Compute Instances and the X4800 by building your own hybrid cloud with Oracle Grid Engine.

