Tuesday Sep 29, 2009

OpenDS in the cloud on Amazon EC2



Why not run your authentication service in the cloud? This is the first step to having a proper cloud IT. There are numerous efforts under way to ease deploying your infrastructure in the cloud, from Sun and others, from OpenSSO to GlassFish, from SugarCRM to Domino, and on goes the list. Here is my humble contribution for OpenDS.

Bird's Eye View

Tonight I created my EC2 account and got OpenDS going on the Amazon infrastructure in about half an hour. I will retrace my steps here and point out some of the gotchas.

The Meat

Obviously, some steps must be taken prior to installing software.

First, you need an AWS (Amazon Web Services) account with access to EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). I will say this about EC2: it is so jovially intoxicating that I would not be surprised to be surprised by my first bill when it comes... but that's good, right? At least for Amazon it is, yes.

Then you need to create a key pair, which is trivial as well. Everything is explained in the email you receive upon subscription.

Once that's done, you can cut to the chase and log on to the AWS management console right away to get used to the concepts and terms used in Amazon's infrastructure. The two main things are an instance and a volume. The names are rather self-explanatory: the instance is a running image of an operating system of your choice. The caveat is that if you shut it down, the next time you start this image, you will be back to the vanilla image. Think of it as a LiveCD: whatever you write to it won't survive a power cycle.

To persist data between cycles, we'll have to rely on volumes for now. Volumes are just what they seem to be, only virtual. You can create and delete volumes at will, of whatever size you wish. Once a volume is created and becomes available, you need to attach it to your running instance in order to be able to mount it in the host operating system. CAUTION: look carefully at the "availability zone" where your instance is running, the volume must be created in the same zone or you won't be able to attach it.
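The zone rule above is easy to check before you even try to attach anything. Here is a minimal sketch of that check in plain Python. It does not call any AWS API: the `Instance` and `Volume` classes, the identifiers, and the zone names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical stand-in for a running EC2 instance.
    instance_id: str
    zone: str


@dataclass(frozen=True)
class Volume:
    # Hypothetical stand-in for a volume.
    volume_id: str
    zone: str
    size_gb: int


def can_attach(volume, instance):
    """A volume can only be attached to an instance in the same availability zone."""
    return volume.zone == instance.zone


def pick_attachable(volumes, instance):
    """Keep only the volumes that live in the instance's zone."""
    return [v for v in volumes if can_attach(v, instance)]


# Illustrative data only: one volume in the right zone, one in the wrong one.
instance = Instance("i-example", "us-east-1a")
volumes = [
    Volume("vol-a", "us-east-1a", 10),
    Volume("vol-b", "us-east-1b", 10),
]
attachable = pick_attachable(volumes, instance)
print([v.volume_id for v in attachable])
```

In practice you would read the zone of the instance from the console and pick the same one in the volume creation dialog; the sketch only spells out the rule.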

Here's a quick overview of the AWS management console with two instances of OpenSolaris 2009.06 running. The reason I have two instances here is that one runs OpenDS 2.0.0 and the other runs DSEE 6.3 :) -the fun never ends-. I'll use the second one later on to put load on OpenDS.

My main point of interest was to see OpenDS perform under this wildly virtualized environment. As I described in my previous article on OpenDS on Acer Aspire One, virtualization brings an interesting trend in the market that is rather orthogonal to the traditional perception of the evolution of performance through mere hardware improvements...

In one corner, the heavyweight telco/financial/pharmaceutical company weighing in at many millions of dollars for a large server farm dedicated to high performance authentication/authorization services. Opposite these folks, the ultra small company curled up in the other corner, looking at every way to minimize cost in order to simply run the house while still being able to grow the supporting infrastructure as business ramps up.

Used to be quite the headache, that. I mean it's pretty easy to throw indecent amounts of hardware at meeting crazy SLAs. Architecting a small, nimble deployment that is still able to grow later? Not so much. If you've been in this business for some time, you know that every iteration of sizing requires going back to capacity planning and benchmarking, which is too long and too costly most of the time. That's where the elastic approaches can help. The "cloud" (basically, hyped up managed hosting) is one of them.

Our team also has its own LDAP-specific approach to elasticity. I will talk about that in another article; let's focus on our "cloud" for now.

Once your instance is running, follow these simple steps to mount your volume and we can start talking about why EC2 is a great idea that needs to be developed further for our performance-savvy crowd.

In this first snapshot, I am running a stock OpenDS 2.0.0 server with 5,000 standard MakeLDIF entries. This is to keep it comparable to the database I used on the netbook. Same searchrate, sub scope, return the whole entry, across all 5,000.

If this doesn't ring a bell, check out the Acer article. Your basic EC2 instance has about as much juice as a netbook. Now the beauty of it all is that all it takes on my part to improve the performance of that same OpenDS server is to stop my "small" EC2 instance and start a medium one.


I've got 2.5 times the initial performance. I did not change ONE thing on OpenDS and this took 3 minutes to do: I simply restarted the instance with more CPU. I already hear you cry out that it's a shame we can't do this live -it is virtualization after all- but I'm sure it'll come in due course. It is worth noting that while I could use 80+% of CPU on the small instance of OpenDS, in this case I was only using about 60%, so the benefit would likely be greater, but I would need more client instances to show it. This imperfect example still proves the point on the ease of use and the elasticity aspect.
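To put a rough number on "the benefit would likely be greater", here is a back-of-the-envelope sketch. It assumes throughput scales linearly with CPU utilization, which is my simplification and certainly optimistic; I did not measure this, and the function name is made up for illustration.

```python
def extrapolate_speedup(measured_speedup, cpu_used, cpu_reference):
    """Scale a measured speedup from cpu_used up to cpu_reference utilization.

    Assumes throughput grows linearly with CPU utilization (an assumption,
    not a measurement). Utilizations are fractions between 0 and 1.
    """
    if not 0 < cpu_used <= 1 or not 0 < cpu_reference <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return measured_speedup * cpu_reference / cpu_used


# 2.5x measured at about 60% CPU on the medium instance,
# against about 80% CPU on the small one.
estimate = extrapolate_speedup(2.5, 0.60, 0.80)
print(f"{estimate:.2f}x")
```

Under that assumption, the medium instance driven as hard as the small one would land a bit above 3 times the initial performance. Treat it as a hint that more client instances are worth trying, nothing more.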

The other thing that you can see coming is an image of OpenDS for EC2. I'm thinking it should be rather easy to script 2 things:

1) self-discovery of an OpenDS topology and automatic hook up in the multi-master mesh and

2) snapshot -> copy -> restore the db, with almost no catch up to do data wise. If you need more power, just spawn a number of new instances: no setup, no config, no tuning. How about that?

Although we could do more with additional features from the virtualization infrastructure, there are already a number of unexplored options with what is there today. So let's roll up our sleeves and have a serious look. Below is a snapshot of OpenDS modrate on the same medium instance as before with about 25% CPU utilization. As I said before, this thing has had NO fine tuning whatsoever so these figures are with the default, out-of-the-box settings.

  I would like to warmly thank Sam Falkner for his help and advice and most importantly for teasing me into trying EC2 with his FROSUG lightning talk! That stuff is awesome! Try it yourself.


Directory Services Tutorials, Utilities, Tips and Tricks

