Thursday Jul 15, 2010

Oracle Powers High Tech Archeological Research

While archeology often conjures up images of dusty treks around the Dead Sea searching for artifacts, the InscriptiFact system from USC makes use of advanced Oracle technology, including Oracle Database and Oracle WebLogic products, along with high-tech Reflectance Transformation Imaging (RTI), to bring images not only of the Dead Sea Scrolls but of inscriptions from thousands of other archeological artifacts to students and researchers worldwide. I spent yesterday afternoon at USC visiting with Professor Bruce Zuckerman, the driving force behind InscriptiFact. While getting to see some of USC's archeological collections in person was amazing in itself, viewing the same artifacts in the InscriptiFact system was even more impressive.

While InscriptiFact includes artifact images dating to the early 1900s (the artifacts themselves are often thousands of years old), some of the most striking images are relatively new RTI images. The best way to understand how RTI images are created is to look at the Melzian Dome used to capture them.

The dome has 32 computer-controlled LED lights; multiple exposures of the same artifact are taken using different lighting combinations and then merged into a single image file. Using the InscriptiFact viewer, a Java application that can run on any PC or laptop, a user can dynamically change the lighting on the image being viewed. Seeing is believing, so let's take a look at an example.
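RTI images are commonly stored as polynomial texture maps (PTM), where each pixel's brightness is a low-order polynomial of the light direction, so relighting is just re-evaluating the polynomial per pixel. Here is a minimal NumPy sketch of the idea (the coefficients are invented for illustration; this is not InscriptiFact's actual file format):

```python
import numpy as np

def relight_ptm(coeffs, lu, lv):
    """Evaluate per-pixel biquadratic PTM luminance for a light
    direction (lu, lv), the projection of the unit light vector
    onto the image plane. coeffs has shape (H, W, 6), one set of
    coefficients a0..a5 per pixel, following the classic PTM form:
    L = a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5."""
    a0, a1, a2, a3, a4, a5 = np.moveaxis(coeffs, -1, 0)
    lum = a0*lu*lu + a1*lv*lv + a2*lu*lv + a3*lu + a4*lv + a5
    return np.clip(lum, 0.0, 1.0)

# Toy 2x2 "image": most pixels have only an ambient term, while
# one pixel has a positive lu coefficient, so it brightens as the
# virtual light rakes in from the side -- exactly the effect that
# makes shallow inscriptions pop out.
coeffs = np.zeros((2, 2, 6))
coeffs[..., 5] = 0.4          # ambient base level
coeffs[0, 0, 3] = 0.5         # this pixel catches side light
print(relight_ptm(coeffs, lu=0.8, lv=0.0)[0, 0])  # 0.4 + 0.5*0.8 = 0.8
```

Moving the virtual light is then just calling `relight_ptm` again with a new `(lu, lv)`, which is why the viewer can relight interactively without re-photographing anything.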

InscriptiFact provides the ability to compare conventional images alongside RTI images. Illustrated above is an Aramaic tablet from Persepolis, in ancient Persia, with a seal impression. The images on the left are visible-light and infrared images taken with a high-end digital scanning back. The images on the right are versions of an RTI image, one showing the natural color of the object, the other using specular enhancement. Even to the untrained eye, the power of RTI to bring out better-than-lifelike detail in ancient artifacts is clear.

While the RTI images are visually the most powerful aspect of InscriptiFact, the real value of the system goes much further, thanks to the power of the InscriptiFact user interface and the underlying Oracle Database. Take, for instance, the spatial search feature, which allows researchers to drag a box on a reference image and retrieve all images that intersect it.
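Under the covers, that is a classic rectangle-intersection query. Here is a minimal Python sketch of the core test, using a hypothetical image catalog rather than InscriptiFact's real schema:

```python
def boxes_intersect(a, b):
    """Axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2
    and y1 < y2. Two boxes intersect unless one lies entirely to
    the left/right of, or above/below, the other."""
    return not (a[2] <= b[0] or b[2] <= a[0] or
                a[3] <= b[1] or b[3] <= a[1])

# Hypothetical catalog mapping image IDs to their footprint in
# the reference image's coordinate system.
catalog = {
    "P-001_visible": (0, 0, 400, 300),
    "P-001_ir":      (350, 250, 700, 600),
    "P-002_rti":     (900, 900, 1200, 1100),
}

def spatial_search(query_box, catalog):
    """Return IDs of all images whose footprint intersects the
    box the researcher dragged on the reference image."""
    return [img for img, box in catalog.items()
            if boxes_intersect(query_box, box)]

print(spatial_search((300, 200, 500, 400), catalog))
# -> ['P-001_visible', 'P-001_ir']
```

A production system would push the same predicate into the database (Oracle offers native spatial operators for exactly this) rather than scanning a Python dictionary, but the intersection logic is the same.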

InscriptiFact is designed to incorporate, integrate, and index all existing image data in a quick and intuitive fashion, regardless of what repository or collection the artifact (or fragments thereof) exists in. In the example below, the original tablet on which an ancient myth was written was broken, and pieces ended up in two different museums. Using InscriptiFact, a researcher can easily retrieve all the images of the fragments for viewing on a single screen.

Not only is InscriptiFact a powerful tool in its own right for anyone from post-grad archeologists to grade school students, it's a wonderful example of what is possible through the integration of advanced imaging, advanced database and Java technology, and the Internet to span both space and time. Visit the InscriptiFact web site to learn more.

Tuesday Jul 13, 2010

Oracle Grid Engine on AWS Cluster Compute Instances

Amazon Web Services (AWS) today announced a big step forward for customers who want to run HPC applications in the cloud with its new Cluster Compute Instances. No surprise, Oracle Grid Engine fans like BioTeam didn't take long to notice and try them out. Let's dig a little deeper into the new AWS Cluster Compute Instances and see what folks are so excited about, and why Oracle Grid Engine is almost a must-have for customers wanting to take advantage of them.

To put things in perspective, the new Cluster Compute Instances should be compared to other AWS instance types. According to Amazon, a standard EC2 compute unit is normalized to "the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor". The new Cluster Compute Instance is rated at 33.5 EC2 compute units. On the surface, that isn't much more powerful than the previous 26-compute-unit High-Memory Quadruple Extra Large Instance (although the name is certainly simpler). What is different is the Cluster Compute Instance architecture. You can cluster up to 8 Cluster Compute Instances for 64 cores, or 268 EC2 compute units. With the Cluster Compute Instance, Amazon also provides additional details on the physical implementation, calling out "2 x Intel Xeon X5570, quad-core Nehalem architecture" processors per instance. Perhaps more importantly, while other AWS instance types only specify IO capability as "moderate" or "high", the Cluster Compute Instance comes with "full bisection 10 Gbps bandwidth between instances". While there is a certain value in the consistency of advertising compute capacity in standard EC2 compute units and IO bandwidth as moderate or high, I applaud Amazon's increased transparency in calling out both the specific Intel Xeon X5570 CPUs and the specific 10 GbE IO bandwidth of the new Cluster Compute Instances.
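The arithmetic behind those numbers is easy to sanity-check:

```python
# Figures from Amazon's Cluster Compute Instance announcement.
ecu_per_instance = 33.5      # EC2 compute units per instance
cores_per_instance = 2 * 4   # 2 x quad-core Xeon X5570
max_instances = 8            # maximum cluster size at launch

print(max_instances * cores_per_instance)  # 64 cores
print(max_instances * ecu_per_instance)    # 268.0 EC2 compute units
```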

So what is it about Oracle Grid Engine that makes it so useful for the new Cluster Compute Instances? AWS already offers customers a broad range of Oracle software on EC2, ranging from Oracle Enterprise Linux to Oracle Database and Oracle WebLogic Server, and you can download pre-built AWS instances directly from Oracle. Don't take my word for it; read about what joint Oracle/AWS customers like Harvard Medical School are doing with Oracle software on AWS. But back to Oracle Grid Engine. Oracle Grid Engine is distributed resource management (DRM) software that manages the distribution of users' workloads to available compute resources. Some of the world's largest supercomputers, like the Sun Constellation System at the Texas Advanced Computing Center, use Oracle Grid Engine to schedule jobs across more than 60,000 processing cores. You can now use the same software to schedule jobs across a 64-core cluster of AWS Cluster Compute Instances.
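To get a feel for what a DRM does, here is a toy Python sketch of the core idea: dispatch each submitted job to the node with the most free slots. This is an illustration only, not Grid Engine's actual scheduling policy (which also weighs queues, priorities, and per-job resource requests):

```python
import heapq

def schedule(jobs, nodes):
    """Greedy least-loaded dispatch: assign each job (name, slots)
    to the node currently offering the most free slots. Returns a
    {job: node} placement. Illustrative only -- a real DRM like
    Grid Engine considers far more than free slot counts."""
    # Max-heap on free slots (negated for Python's min-heap).
    heap = [(-free, node) for node, free in nodes.items()]
    heapq.heapify(heap)
    placement = {}
    for job, slots in jobs:
        neg_free, node = heapq.heappop(heap)
        free = -neg_free
        if slots > free:
            raise RuntimeError(f"no node has {slots} free slots for {job}")
        placement[job] = node
        heapq.heappush(heap, (-(free - slots), node))
    return placement

# Two 8-core nodes, three 4-slot jobs: the third job lands back
# on the first node, which still has 4 slots free.
nodes = {"cc1": 8, "cc2": 8}
jobs = [("blast-1", 4), ("blast-2", 4), ("blast-3", 4)]
print(schedule(jobs, nodes))
```

In real Grid Engine usage, the equivalent of `jobs` arrives via `qsub` submissions and the slot counts come from the execution hosts the scheduler manages.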

Of course, many customers won't use only AWS or only their own compute cluster. A natural evolution from grid to cloud computing is the so-called hybrid cloud, which combines resources across public and private clouds. Oracle Grid Engine already handles that too, enabling you to automatically provision additional resources from the Amazon EC2 service to process peak application workloads, reducing the need to provision datacenter capacity for peak demand. This so-called cloud bursting feature of Oracle Grid Engine is not new; it's just that you can now burst onto much more powerful AWS Cluster Compute Instances.
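The decision at the heart of cloud bursting can be sketched in a few lines: when pending work exceeds free local capacity, work out how many cloud instances would cover the overflow. This is an illustrative sketch with a made-up function name, not the actual Grid Engine EC2 adapter API:

```python
import math

def instances_to_burst(pending_slots, local_free_slots,
                       slots_per_cloud_instance=8):
    """How many cloud instances to provision so that all pending
    job slots fit. The default of 8 slots matches an 8-core AWS
    Cluster Compute Instance. Returns 0 when local capacity is
    already sufficient."""
    overflow = pending_slots - local_free_slots
    if overflow <= 0:
        return 0
    return math.ceil(overflow / slots_per_cloud_instance)

# 100 pending slots, 40 free locally: 60 slots of overflow needs
# ceil(60 / 8) = 8 cloud instances.
print(instances_to_burst(pending_slots=100, local_free_slots=40))  # 8
```

The provisioned instances would then register with the scheduler as ordinary execution hosts and be released when the backlog clears.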

One of Oracle's partners that has been doing a lot of work with Oracle Grid Engine in the cloud is Univa UD. I had the opportunity to speak with Univa's new CEO, Gary Tyreman, today about how they are helping customers build private and hybrid clouds using Oracle Grid Engine running on top of Oracle VM and Oracle Enterprise Linux. Gary told me Univa has been beta testing the AWS Cluster Compute Instances for several months and that they have worked flawlessly with Oracle Grid Engine and Oracle Enterprise Linux. Gary also noted that they are working with a number of Electronic Design Automation (EDA) customers that need even more powerful virtual servers than the ones available on AWS today. We have several joint customers that are evaluating the new Sun Fire x4800 running Oracle VM as supernodes for running EDA applications in private clouds. To put it in perspective, a single x4800 running Oracle VM can support up to 64 cores and 1 TB of memory. That is as much CPU power as, and many times the memory of, a full 8-node cluster of AWS Cluster Compute Instances, in a single 5RU server! Now that is a powerful cloud computing platform.

If you want to hear more from Gary about what Univa is doing with some of their cloud computing customers, download his Executive Roundtable video. I'd love to hear from additional customers who are using Oracle Grid Engine on the new AWS Cluster Compute Instances. Who knows, maybe in the future Amazon will even offer a Super Duper Quadruple Extra Large Cluster Compute Instance based on a single 64-core, 1 TB server like the Sun Fire x4800. Meanwhile, you can easily take advantage of both Cluster Compute Instances and the x4800 by building your own hybrid cloud with Oracle Grid Engine.

Monday Jul 05, 2010

Oracle PASIG Meeting

I had the pleasure of spending the day at Oracle's Preservation and Archiving Special Interest Group (PASIG) meeting today in beautiful Madrid, in advance of this week's Open Repositories conference. Any mental image of a classic librarian should be cast aside, as practitioners from many of the world's leading digital libraries came together to discuss preservation and archiving. For more information on the PASIG, visit the main PASIG web site. Below are some of my notes from today's meeting.

Tom Cramer, Chief Technology Strategist and Associate Director, Digital Library Systems and Services, Stanford University, started off the morning. One of the interesting points Tom made was how Stanford seamlessly pulls data from five digital systems in the process of archiving student thesis papers. Starting with student and professor information from Stanford's Oracle PeopleSoft campus information system, archive metadata is automatically populated and combined with the thesis PDFs, a new library catalog record is automatically created, and finally, the PDFs and associated metadata are automatically crawled and published to the world via Google Books.

Next, Oxford's Neil Jefferies took the discussion a bit deeper and talked about the changing nature of intellectual discourse. While Oxford's collection holds over 250 km of shelved paper books, the library is increasingly working to archive more ephemeral university data sources, including websites, social media, and linked data. A consistent theme from Neil and many of the other speakers was the increasing focus on providing not only archiving and preservation but also access to data.

Moving on to the continent, Laurent DuPlouy and Olivier Rouchon from the French National Library presented on the SPAR project and the CINES collaboration. They were brave enough to show a live demo of their system, including the use of a StorageTek SL8500 Modular Library System.

Back to the UK, Brian Hole from The British Library presented on the LIFE3 project, which aims to model the long-term preservation lifecycle costs of digitized data. Brian is taking suggestions for improvements for LIFE4, and I suggested he include in his model the Oracle Secure Backup Cloud module, which can securely back up databases to Amazon S3 cloud storage.

After a wonderful Spanish lunch, the first panel session of the day started with discussions on community and tool-set collaborations.

DuraSpace CEO Sandy Payette presented on DuraCloud, the organization's Platform as a Service (PaaS) offering.

Richard Jones presented on the SWORD project, which focuses on repository interoperability. Read and comment on the SWORD whitepaper.

Jan Reichelt, founder and director of Mendeley, presented on its reference management software, used to organize, share, and discover academic research papers. Mendeley tracks over 28 million research papers, including information on the most-read papers and authors.

David Tarrant of EPrints discussed how EPrints software is used to create and manage repositories.

Finally, Bram van der Werf of Open Planets Foundation described the Open Planets suite of tools for managing digital data.

After the panel presentations, we heard from a series of Oracle speakers. The Oracle Enterprise Content Management Suite 11g is broadly applicable to preservation and archiving, capable of archiving over 179 million documents a day, as shown in a recent benchmark. Of course, many PASIG customers already use Sun Storage Archive Manager software along with StorageTek modular library systems, and there were updates from Oracle speakers on all of these products and more.

The final session included short presentations from a number of Oracle software partners in the archive and preservation space. I definitely learned a lot today about what some of the world's leading digital libraries are doing on the preservation and archive front, and hopefully it was a day well spent for all who attended. If you are not already a PASIG member, be sure to sign up now for this growing Oracle community.



