Monday Dec 21, 2009

Twice As Fast: Import Tips And Tricks

Rationale

We understand. Really, we do.

You want to import your data as fast as possible. Not only does nobody like to wait, but there are time constraints on every one of us. In particular, we're constrained by maintenance windows.

So, what gives you the best return on investment? Higher CPU clock speed? More CPUs? More memory? We're here to find out together.

Whatever the literature on the topic says, you must make sure that the import will not have to be done in multiple passes, or you'll get killed on index merging time. To do so, shoot high and dial down gradually to find the right cache size for your data set.

Bird's Eye View

In essence, what I did was start an import on a vanilla system and Directory Server with a customer's actual data, 7.9M entries. The system I am testing with isn't a server, but the system doesn't matter: it is a constant. The variables are the amount of import cache, the number of active CPUs (1-8) and the CPU clock speed (from 3.3 to 3.7GHz). In short, memory matters most.

The Meat

The Setup

The instance I am doing this with is an actual telco setup, with 7.9M entries in the LDIF file. The LDIF weighs in at 12GiB. There are 10 custom indexes configured for equality only. The 8 system indexes are there as well.

On the physical side of things, the machine is my desktop, an Intel Core i7 975 Extreme Edition @ 3.3GHz. It has a 64GB Intel X25 SSD and a 10,000 rpm SATA drive. The disk layout is described in more detail here.

Sensitivity To Import Cache

Despite what the documentation says, there are huge gains to be reaped from increasing the import cache size, and depending on your data set, this may make a world of difference.

This is the first thing I tweaked during the first phase of this test, and bumping the import cache from 2GiB to 4GiB cut import time in half. Basically, if your import has to occur in more than a single pass, then your import cache isn't big enough; increase it if your system can take it.
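For what it's worth, here is roughly what that looks like on my setup; the entry DN and the nsslapd-import-cachesize attribute come from the Directory Server configuration lineage I'm used to, so treat them as assumptions and double-check the names against your release before copying this:

# hedged sketch: raise the import cache to 4GiB before launching the import
ldapmodify -h localhost -p 1389 -D "cn=Directory Manager" -w "$PASSWORD" <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-import-cachesize
nsslapd-import-cachesize: 4294967296
EOF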

Sensitivity To Clock Speed

Ever wondered whether a system with CPUs twice as fast would buy you time on import? Not really. Why? Well, if the CPUs are waiting on the disks or on locks, then a higher clock speed isn't going to move the needle at all. That's what's going on here. Check this out...

The reason the 3.7GHz import isn't as fast as the 3.3GHz one is that my overclocking might have thrown off the balance between the core clock and the bus clock, so the CPU is spinning its wheels, waiting to access memory and IO...

I officially declare this one moot. I'll try again later with underclocking.

Sensitivity To Number Of CPUs

Scalability is an interesting challenge. Ideally, you'd want half the import time given twice the resources. In reality, import is very lock intensive to avoid collisions/corruptions so it isn't quite that linear. Here's what I got on my system, all other things being equal.

So even though the scalability isn't linear, the good thing is the more CPUs the better your performance is going to be.


Saturday Dec 19, 2009

DPS Coherence Plug-In: New Features


Rationale

After having conversations about this plug-in both internally and with a handful of you out there, I sat down and added what I felt were the main missing features in my current implementation. This mainly covers security, but also some flexibility.

Bird's Eye View

On the security front, you now have the ability to filter out attributes so they don't make it into the cache (e.g. userPassword). What's more, you can filter out the entries you don't want in the cache, to keep it from being polluted by the occasional non-production (e.g. administration) hits.

On the flexibility front, you can set a time to live for entries that do make it into the cache. This lets you control whether you want to retain a value forever in the cache (the default) or make sure it gets evicted after a certain time. You can also provide a list of regular expressions for the Bind DNs (the identities) you grant access to the cache. And of course, you can decide to allow (the default) or deny unauthenticated clients access to the cache as well.

The Meat 

  • attributeNotToCache: userPassword
    This attribute in the plug-in's configuration entry can be multivalued. It is not a regular expression but a plain string, and case, as always in LDAP, matters not. Any attribute whose name matches one of the provided values will be stripped from the entry before it is stored in Coherence.
  • dnNotToCache: .*,ou=secret,o=o
    This attribute can be multivalued. It is a regular expression: DNs matching it will not be stored in Coherence.
  • cacheForBindDN: cn=[2-3],o=o
    This attribute can be multivalued. It is a regular expression. An authenticated client's Bind DN must match one of the provided regular expressions to be granted access to the contents stored in Coherence.
  • cacheForAnonymous: false
    This attribute is single valued. It is a boolean, either true or false. When false, unauthenticated clients will not be granted access to the contents stored in Coherence and will therefore always hit the back-end.
  • cacheExpirationInMS: 30000
    This attribute is single valued. It is a long and represents the length of time, in milliseconds, that an entry is kept in the cache after the last time it was accessed.

So, in the end, here is an example configuration entry:

dn: cn=CoherencePlugin,cn=Plugins,cn=config
objectClass: top
objectClass: configEntry
objectClass: plugin
objectClass: extensibleObject
cn: CoherencePlugin
description: Oracle Coherence Cache Plugin
enabled: true
pluginClassName: com.sun.directory.proxy.extensions.CoherencePlugin
pluginType: preoperation search
pluginType: search result entry
pluginType: postoperation delete
pluginType: postoperation modify
pluginType: postoperation modify dn
cacheName: LDAPCache
attributeNotToCache: userpassword
attributeNotToCache: aci
dnNotToCache: .*,ou=secret,o=o
dnNotToCache: .*,cn=config
cacheForBindDN: cn=[2-3],o=o
cacheForBindDN: uid=user.[0-9]+,ou=People,o=o
cacheForAnonymous: false
cacheExpirationInMS: 30000

That's it for my Friday night. Let me know if there is more that DPS+Coherence can do for you!

As always, if you want to try this DPS plug-in, ping me: arnaud -at- sun -dot- com

Friday Dec 18, 2009

More ZFS Goodness: The OpenSolaris Build Machine


Apart from my usual LDAP'ing, I also -try to- help the opensolaris team with anything I can.

Lately, I've helped build their new x64 build rig, for which I carefully selected the best components out there while trying to keep the overall box budget on a leash. It came out at about $5k. Not on the cheap side, but cheaper than most machines in most data centers.

The components:

  • 2 Intel Xeon E5220 HyperThreaded QuadCores@2.27GHz. 16 cpus in solaris
  • 2 32GB Intel X25 SSD
  • 2 2TB WD drives
  • 24GB ECC DDR2

I felt compelled to follow up on my previous post about making the most out of your SSD because some people commented that non-mirrored pools were evil. Well, here's how this is set up this time: in order to avoid using either of the relatively small SSDs for the system, I have partitioned the big 2TB drives with exactly the same layout, one 100GB partition for the system, with the rest of each disk holding our data. This leaves the SSDs available for the ZIL and the L2ARC. But thinking about it, the ZIL is never going to take up an entire 32GB SSD, so I partitioned one of the SSDs with a 3GB slice for the ZIL and the rest for L2ARC.

The result is a system with 24GB of RAM for the Level 1 ZFS cache (ARC) and 57GB for L2ARC in combination with a 3GB ZIL. So we know it will be fast. But the icing on the cache ... the cake sorry, is that the rpool is mirrored. And so is the data pool.
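For reference, turning an existing single-disk rpool into that mirror is essentially a one-liner plus boot blocks; the device names below are the ones from this box, so adjust them to yours (and note that attaching to the root pool normally wants a properly labeled slice):

# hedged sketch: attach the second disk to the root pool, then put GRUB on it
pfexec zpool attach rpool c5d1s0 c6d1p1
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d1s0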

Here's how it looks: 

admin@factory:~$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c5d1p2    ONLINE       0     0     0
      c6d1p2    ONLINE       0     0     0
    logs
      c6d0p1    ONLINE       0     0     0
    cache
      c5d0p1    ONLINE       0     0     0
      c6d0p2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1s0  ONLINE       0     0     0
        c6d1p1  ONLINE       0     0     0

errors: No known data errors
admin@factory:~$

 This is a really good example of how to set up a real-life machine designed to be robust and fast without compromise. This rig achieves performance on par with $40k+ servers. And THAT is why ZFS is so compelling.

Wednesday Dec 16, 2009

Oracle Coherence Support Right In DPS!!!

Rationale

Why do caching in DPS ?

The Directory Server back-ends are not able to "absorb" as many updates when they're stressed with a large proportion of searches. There's already caching on the back-end Directory Server itself, of course. It helps performance a lot: reads return to the client faster, which relieves some of the stress and frees up resources to take care of the writes that lock resources for longer atomic stretches of time. But as long as searches hit the back-end, even with a cache, there's some heavy lifting to be done: open the connection, parse the request, put the request on the work queue, look up entries in the cache, return the entry, close the connection...

That's why caching right in DPS started to look appealing to me.

Why Coherence ?

One may think I made this choice because Sun is about to be gobbled up by Oracle, but the answer is no. Coherence is simply a compelling choice; these guys seem to have made all the technical choices I would have ... and then some. For one, you download the bits, just start it, and it works. It may sound like a marketing pitch, but see for yourself. Excellent 0 to 60 in my book. Once you have it working, you get acquainted with it: the protocol is dead simple, the API is clean, robust and pretty lean. After that, you check the docs out (that's right, I read the docs after the fact) and start to realize how powerful a platform it is, how expandable it is, and how far you can push deployments to accommodate growing performance or reliability needs.

Bird's Eye View

The integration with Coherence is done by way of a DPS (7+) plug-in that will -asynchronously- populate a Coherence cache with the entries being returned by your regular traffic. When requests come in, a lookup is done to check whether the entry is present in the cache; it is returned immediately if so, otherwise the request is routed to the back-end as usual.

Note that I'm not making any claims about the performance of this caching approach from the client's perspective, because our Directory Server back-end is already pretty darn fast. It certainly relieves the back-end of some "frequent" traffic and will benefit the overall performance of the topology. That relief will most likely show up as improved write response times, but nothing here speaks to the performance of the Coherence cache lookup itself. I just haven't collected enough data so far.

The Meat

nitty gritty anyone ?

Suppose we have a setup like this ... 

 the first read would be processed as described below:


but the second one would be processed without hitting the back-end.

Can you see the tremendous impact this will have on your back-end?

It frees it up to process writes or some heavyweight searches...

How to get the plug-in? Ask nicely.

Tuesday Dec 15, 2009

DSEE 7.0 On Windows Gotcha


Rationale

When you just want it to work quick and easy, and since the zip distribution lends itself particularly well to that (extracting the archive is about all that stands between you and a productive Directory Server), you only skim through the install guide. Well, at least if you're as impatient as I am. Make sure to read this.

Bird's Eye View

After the quick and painless install, you add the single bin folder to your PATH environment variable and get crackin'. Directory Server instances can be created. As can Directory Proxy Server instances. But the usual command-line tools from the DSRK, the indispensable ldapsearch among them, don't work. What? You missed one tiny component that still needs to be "installed" on your machine.

The Meat 

If, when you try ldapsearch from DSEE 7.0 on your vanilla Windows install, you get the message below:

 I'll make a quick text note here so it's searchable...

The application was unable to start correctly (0xc0150002)

Then it only means that you forgot to install the Visual C++ Redistributable Package, as stated in a tiny Note block in the install guide which had completely escaped me during my quick skimming. Note that the doc points you to downloading the bits from Microsoft, but since we're nice guys, the installer is included in the DSEE ZIP archive, at the root of the zip file, and is called vcredist_x86.exe. Be sure to install that if you need the ldap* utilities.
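Once vcredist_x86.exe is installed, a quick root DSE search against any running instance is an easy way to confirm the tools now load; the host, port and flags below are just standard ldapsearch usage, so adjust them to your instance:

ldapsearch -h localhost -p 389 -b "" -s base "(objectClass=*)"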

happy LDAP'ing

Monday Dec 14, 2009

Import: Intel's HyperThreading Performance Gain


Rationale

 As I experiment with more and more of the new-ish Intel Nehalem-based servers, a question kept bugging me: how much am I getting out of these "hyper" threads that appear as CPUs to the operating system? I took some time to compare a Directory Server import on my Core i7 workstation with an actual customer's data set of 7.9 million entries.

Bird's Eye View

Don't bet the house on HyperThreading.

The Meat

To conduct this test in a simple fashion, all I did was import these entries, first with all the CPUs enabled and then with all the odd-numbered CPUs disabled. It doesn't get any simpler, and all other things can be considered equal.
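On Solaris, taking the odd-numbered virtual CPUs offline and back online is a one-liner with psradm; whether the odd IDs really are the HyperThread siblings on your box is an assumption worth checking with psrinfo -vp first:

# hedged sketch: offline the odd-numbered CPUs, run the import, then bring them back
pfexec psradm -f 1 3 5 7
pfexec psradm -n 1 3 5 7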

The result is quite stunning really.



 Configuration                      Import Time
 8 CPUs (with HyperThreading)       30:56.62
 4 CPUs (without HyperThreading)    35:04.60

Which makes a difference of about 13%...

Just thought someone out there would find this as puzzling as I do. Quite honestly, I believe the blame is shared between the Directory Server not scaling too well, the I/O actually being a point of contention, and HyperThreading being somewhat ineffective. More on that later.

Make The Most Of Your SSD With ZFS


Rationale

 If you're anything like me, you're lucky if you have a single SSD in your workstation or server. I had a dilemma in the past: I couldn't quite find a way to make the most of my one flash drive, since I had to choose between making it a ZIL or using it as L2ARC. I dreaded having to make a definitive choice for one or the other. When I installed my workstation with OpenSolaris 2009.06, I had an idea in mind, so I installed the system on the SSD in a small partition (10GB) and left the rest of the drive unallocated, if you catch my drift...

Bird's Eye View

Simple! Just partition the SSD to be able to use it as both L2ARC and ZIL, in whatever proportions you think are going to suit your needs. Note however that the IOs are shared between your partitions on the same drive. From my testing though, I can tell you that with this setup you're still coming out on top in most situations.

The Meat

It's all pretty simple really: when you install Solaris, you have a choice of installing on the "whole disk" or using the tool to make a smaller custom partition. I cut out a 36GB partition, which allows ample room for the system and swap. The rest of my 64GB SSD is left unallocated at install time; we'll take care of everything later.

The second disk in my system is a 300GB 10,000 rpm SATA drive which, being fast but small, I wanted to leave whole for my data pool (keep in mind that the rpool is a little different from your regular pool, so make sure to treat it accordingly). That is why I decided to compromise and use some of the SSD space for the system. You don't have to, though: you could partition your spindle and have the system on there.

Now that you have OpenSolaris up and running, install GParted to be able to edit your disk partitions. You can either use the OpenSolaris package manager or run:

pfexec pkg install SUNWGParted

It's all downhill from here. Open GParted. If you just installed it, you will need to log out and back in to see it in the GNOME menu. It will be in Applications -> System Tools -> GParted Partition Editor.

Select your flash drive and carve out a 2GB partition for your ZIL and assign the remaining space for L2ARC. Apply the changes and keep the window open.


Note the two device paths in /dev/dsk, because that's what we'll use to add these two SSD partitions as performance-enhancing tools in our existing pool.

arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool add data log /dev/dsk/c9d0p2 cache /dev/dsk/c9d0p3

Let's check how our pool looks now...

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0
    logs
      c9d0p2    ONLINE       0     0     0
    cache
      c9d0p3    ONLINE       0     0     0

errors: No known data errors

Et voila!
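If you want to watch the new devices actually doing work under load, zpool iostat breaks the I/O down per vdev, log and cache included; the 5 is just a sampling interval in seconds:

zpool iostat -v data 5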

You've got the best of both worlds, making the absolute most of whatever little hardware you had at your disposal!

Enjoy!

Add And Remove ZILs Live!


Rationale

Ever been playing with separate logs (separate ZIL, logzilla, etc...) and had to rebuild the pool every time you wanted to yank the slog off?

Not so anymore! The ZIL can now be added and removed as you like! Truly fantastic for tinkering with it and observing its actual impact on performance. Here is a quick walk through one of the most painless migrations of all time.

Bird's Eye View

As ZILs get more and more exposure in production environments, technical staff are experimenting more and more to be able to make recommendations, and the new removal feature adds a lot more flexibility to try creative combinations.

The Meat

Let's suppose you have a pool with a separate log:

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors

If you try to remove the log by removing the actual separate log device, you'll get the following error:

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
ramdiskadm: couldn't delete ramdisk "zil-drive": Device busy

If you now try to use the zpool remove command, you will also hit a wall:


arnaud@ioexception:/data/dsee7.0/instances$ zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
cannot remove /dev/ramdisk/zil-drive: pool must be upgrade to support log removal

So let's just follow up on the suggestion and upgrade the pool: 



arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool upgrade data
This system is currently running ZFS pool version 22.

Successfully upgraded 'data' from version 14 to version 22

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors

Wow, that's quick, easy and smooth. Gotta love migrations of that sort. Let's now try to remove our separate log:


arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0

errors: No known data errors

So even though it barfed an error message, my separate log has been removed from my pool and I will be able to decommission my device.

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
arnaud@ioexception:/data/dsee7.0/instances$

And I can now create a new device and do some more testing; a quick sketch of how I recreate the ramdisk-backed slog is below. Props to the ZFS team for the ever-improving level of service this file system brings to the table!
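For the record, the toy setup above can be recreated in two commands; the 1GB size and the device name are arbitrary, and a ramdisk-backed ZIL is obviously for experimentation only, never for real data:

# hedged sketch: create a ramdisk and add it to the pool as a separate log
pfexec ramdiskadm -a zil-drive 1g
pfexec zpool add data log /dev/ramdisk/zil-drive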

Tuesday Dec 08, 2009

Using Wireshark To Dissect LDAP Traffic


Rationale

Every once in a while, an elusive issue may drive you bonkers because you can't quite identify what is going awry, which of course is crucial to solving said issue. In such cases, it's good to go back to basics and actually analyze what is flowing on the wire. To do so, snoop is the tool. But more important than capturing the data, which is fairly well documented on the web, properly analyzing what the capture contains is what may put you on track to solving the issue.

Bird's Eye View

In this entry, I'll introduce how Wireshark can be used to help investigate LDAP issues, or even simply to double-check that your LDAP product is truthful in its performance metrics... A word of advice: never trust a single metric at face value, check for yourself. The LDAP server (or web or app server for that matter) may report a response time of X, but that really is no good to your client applications if the network card or the TCP layer has trouble getting the message through the wire. Do use snoop to collect data and Wireshark to check that everything is fine, or to dig into the snoop files.

The Meat

First things first, installing Wireshark.

On OpenSolaris:

1- get pkgutil

2- install wireshark

Here's how:

pfexec pkgadd -d http://blastwave.network.com/csw/pkgutil_`uname  -p`.pkg
yes|pfexec /opt/csw/bin/pkgutil --install CSWwireshark

3- start wireshark; it is installed as /opt/csw/bin/wireshark, so for example do:

nohup /opt/csw/bin/wireshark 2>/dev/null &

 4- Now that Wireshark is started up, you can open up your snoop file, and what you get is basic Ethernet+IP+TCP decoding, like so:

5- So we will quite simply have to tell Wireshark to decode the TCP packet contents as LDAP (wherever applicable). Go to Analyze->Decode As. A window pops up to allow you to select which decoder to use. Select the "Transport" tab, and then click on LDAP in the list. Parsing through the whole snoop file again may take a while, but once it's done, it will be worth the wait.

6- Once data is properly interpreted as LDAP, we can see that those TCP packets with an LDAP payload will now be advertised as LDAP (highlighted in green) right in the wireshark interface:

Now that you have followed these few simple steps, you can dig in the snoop data and graph statistics.

For example, you can very easily graph the ratio of LDAP operations to LDAP operations taking more than one second with the statistics graphing tool:

 Of course, there are dozens of other invaluable tools in Wireshark, not only of the greatest quality but immensely useful for spotting problems in your traffic that many higher-level tools won't be able to help with. What comes to mind is stuff like:

  • issues at lower levels, like IP problems (e.g. CRC errors) or TCP retransmits
  • hitting the maximum bandwidth on the interface (use the Statistics->IO Graphs tool and set the Y axis to Bytes/Tick)
  • LDAP-level SLAs not being met: you can check whether a particular LDAP response time target was met over your whole snoop file, as shown above
  • checking for particular LDAP return codes (see the tshark sketch below)
  • checking for particular LDAP filters
  • filtering your results on anything contained in an LDAP packet
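If you'd rather stay on the command line, the same decode-as trick works with tshark, Wireshark's CLI sibling; the port number and the display filter below are assumptions to adapt to your capture, and older builds use -R instead of -Y for the display filter:

# hedged sketch: decode port 389 traffic from a snoop capture as LDAP and keep
# only operations that did not return success
tshark -r capture.snoop -d tcp.port==389,ldap -Y "ldap.resultCode != 0"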

 As I said earlier, this tool is truly fantastic.

Hope this ends up helping someone. Enjoy! 

Monday Dec 07, 2009

DSIO: A Storage Sizing Tool For LDAP


Rationale

 Ever had one of those seemingly insurmountable walls of incomprehension between two teams, where it was pretty obvious communication would never quite work? Last year I had an interaction with a customer where one team manages all the storage infrastructure for the company and another manages the LDAP infrastructure. The LDAP team had to "purchase" storage from the storage team, and pricing was based on guaranteed IOPS. The LDAP folks provide authentication to the business; they are given performance requirements to meet and need to architect or expand to meet demand. The gap is that the LDAP team had no real tool to translate their LDAP performance numbers into IOPS in order to purchase the right storage package.

This morning, I was on a call with a prospect who wants to run everything on VMware ESX and swears their NAS can serve the load. Our LDAP experts told them that there was just no way to meet their very stringent performance requirements. The customer shot back with the usual "show me the IOPS". So I thought I'd dust off the old DSIO DTrace script and share it here, since it could be reused in other situations that I do not know about.

Bird's Eye View

The DTrace script hooks up to your slapd process and intercepts calls to the processing functions to count LDAP operations. It also intercepts lower-level operating system I/O calls in order to associate each LDAP operation with its subsequent I/O calls. Simple, isn't it?

The Meat

Pre-requisites

  • DTrace (obviously)
  • KSH (at least 88 but that's not much to ask) 
  • the dsio script

0 to 60 in no time

First, please note that you need to have administrative rights to run DTrace.

Second, by default dsio will look for the first slapd process it finds running, so only use this default behavior when you know for a fact that you have a single instance of Directory Server running.

# ./dsio

On Solaris:

$ pfexec ./dsio

The Rest Of The Way

-P: specify the PID of the running directory server to trace (in case you have more than one running)

-l: print detailed information about the LDAP traffic

-r: print detailed information about the read I/O activity

-w: print detailed information about the write I/O activity
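Put together, a typical invocation against one specific instance might look like this; the PID is obviously illustrative:

# hedged sketch: trace the slapd with PID 12345 and print LDAP plus write I/O detail
pfexec ./dsio -P 12345 -l -w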

As you can see, in this case I only applied a modify load to the instance, so as to make the point more explicit. Printing out the details is very useful for comparing the actual operation counts and the breakdown, the times, etc...

Note that in this case the IOPS are about half the LDAP throughput. How is that possible? By exploiting one of ZFS's best features, the ZIL. I initially thought that my script was wrong and went to double-check things with the ZFS folks. But that story is for another article.

Special thanks to Brendan Gregg for his invaluable scripts and tutorials on DTrace which helped me tremendously. The content this dude puts out turns DTrace black art into nothing more than shell scripting. Simply mind boggling.

Enhancements

  • Allow looping indefinitely
  • Implement a CSV output

Support

Best effort again as I'm not dedicated to this, you can send your questions and support requests to arnaud -- at -- sun -- dot -- com, or post your rants as comments here and I'll do my best to take care of you.

Enjoy!

Wednesday Dec 02, 2009

A Script To Rule Them All ... clients and backends that is


Rationale

With Directory Proxy Server, regardless of the version, investigating traffic can get:

  • a) real tricky
  • b) time consuming
  • c) confusing
  • d) all of the above

and the answer is ... drum roll ... d)!

So here is a script that you can feed your DPS access log to. It will output CSV files that you can then load into your favorite spreadsheet software, or just graph with tools like gnuplot and the like... it just will make your life easy...er.

Bird's Eye View

Disclaimer: it's nowhere near as clever as logconv.pl for Directory Server; it only munches the data so that YOU can more easily spot issues or identify patterns. So what does this script produce in the end?

It will take your DPS 6.x/7.0 access log and output three CSV files: one with the transaction volumes (suffixed "tx"), one with the average response times (suffixed "avg") and finally one with the maximum response time over a minute (suffixed "max"). Why not all in one file? I tried that initially, but in a CSV it turned out to really not be practical. This way, at least, when you open up one of these files you know what you're looking at.

The Meat

Pre-requisites

Since I really started this simply to be able to "grep" a file on a Windows system, I had no plan and no idea it would end up in a tool like this. All that to say that I wrote it in Python instead of our customary Java tools. At least it has the merit of existing, so you don't have to start from scratch. You'll need Python, at least 2.4. If you're on Solaris or Linux, you're covered. If on Windows, simply download your favorite Python; I installed the 2.6.4 Windows version from here.

You will also need to download the script. You may as well get the netbeans project if you'd like to change it to adapt it to your specific needs or help improve on it.

How Does It Work

0 To 60 In No Time

python dps-log-cruncher.py access 

The Rest Of The Way

-c      : break up statistics per client
-s      : break up statistics per back-end server
-f hh:mm: start parsing at a given point in time
-t hh:mm: stop parsing after a given point in time
-h      : print this help message
-v      : print tool version

Some examples:

split the output per client for all clients:

python dps-log-cruncher.py -c \* access 

 split the output per back-end server for client 192.168.0.17:

python dps-log-cruncher.py -c 192.168.0.17 -s \* access 

 split the output for all clients, all servers:

python dps-log-cruncher.py -c \* -s \* access 

 only output results from 6pm (18:00) to 6:10pm (18:10):

python dps-log-cruncher.py -f 18:00 -t 18:10 access 

 output results between 6:00pm (18:00) to 6:10pm (18:10) and split results for all clients and back-end servers:

python dps-log-cruncher.py -f 18:00 -t 18:10 -c \* -s \* access 

Enhancements

This is a list to manage expectations as much as it is one for me to remember to implement:

  1. Selectable time granularity. Currently, all data is aggregated per minute; in some cases it would be useful to see what happens per second
  2. Improve error handling for parameters on the CLI.
  3. Add a built-in graphing capability to avoid having to resort to using a spreadsheet. Spreadsheets do however give a lot of flexibility
  4. Add the ability to filter / split results per bind DN
  5. Output the response time distribution

Support

Best effort is how I will label it for now, you can send your questions and support requests to arnaud -- at -- sun -- dot -- com.

Enjoy!

Saturday Oct 03, 2009

A Dashboard Like No Other: The OpenDS Weather Station


Rationale

 Doing so many benchmarks, profiling and other various performance related activities, I had to find a way to "keep an eye" on things while fetching emails, chatting on IM and the like. Having some experience in past projects with microcontrollers, although on Windows, I figured I could put together a little gizmo to help me keep tabs on my Directory Server.

Bird's Eye View

This is basically a simple setup with a USB Bit Whacker controlled by a Python script, feeding it data crunched from various sources, mainly the Directory Server access log, the garbage collection log and kstats... the result is a useful dashboard where I can see things happen at a glance.

The Meat

Everything starts with the USB Bit Whacker. It's a long story but, to cut it short, a couple of years ago Kohsuke Kawaguchi put together an orb that could be used to monitor the status of a build / unit tests in Hudson. Such devices are also known as eXtreme Feedback Devices, or XFDs. Kohsuke chose to go with the USB Bit Whacker (UBW) because it is a USB-'aware' microcontroller that also draws power from the bus, and is therefore very versatile while remaining affordable ($25 soldered and tested from SparkFun, but you can easily assemble your own). A quick search will tell you that this is a widely popular platform for hobbyists.

 On the software side, going all Java would have been quite easy except for the part where you need platform-specific libraries for the serial communication. Sun's javacomm library and rxtx both have pros and cons, but in my case the cons were just too much of a hindrance. What's more, I am not one to inflict pain on myself unless it is absolutely necessary. For that reason, I chose to go with Python. While apparently not as good as Java on cross-platformness, installing the Python libraries for serial communication with the UBW is trivial and has worked for me right off the bat on every platform I have tried, namely Mac OS, Linux and Solaris. For example, on OpenSolaris all there is to it is:

 $ pfexec easy_install-2.4 pySerial
Searching for pySerial
Reading http://pypi.python.org/simple/pySerial/
Reading http://pyserial.sourceforge.net/
Best match: pyserial 2.4
Downloading http://pypi.python.org/packages/source/p/pyserial/pyserial-2.4.tar.gz#md5=eec19df59fd75ba5a136992897f8e468
Processing pyserial-2.4.tar.gz
Running pyserial-2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Y8iJv9/pyserial-2.4/egg-dist-tmp-WYKpjg
setuptools
zip_safe flag not set; analyzing archive contents...
Adding pyserial 2.4 to easy-install.pth file

Installed /usr/lib/python2.4/site-packages/pyserial-2.4-py2.4.egg
Processing dependencies for pySerial
Finished processing dependencies for pySerial

 That's it! Of course, having easy_install is a prerequisite. If you don't have it, simply install setuptools for your Python distro, which is a 400kB thing to install. You'll be glad you have it anyway.

Then, communicating with the UBW is mind-bogglingly easy. But let's not get ahead of ourselves; first things first:

Plugging The USB Bit Whacker Into OpenSolaris For The First Time

The controller will appear as a modem of the old days, and communicating with it equates to sending AT commands. For those of you who are used to accessing load balancers or other network equipment through the serial port, this is no big deal.

In the screenshot below, the first ls command output shows that nothing in /dev/term is an actual link; the second -which I issued after plugging the UBW into the USB port- shows a new '0' link has been created by the operating system.


Remember which link your ubw appeared as for our next step: talking to the board.

Your First Python Script To Talk To The UBW

I will show below how to send the UBW the 'V' command which instructs it to return the firmware version, and we'll see how to grab the return value and display it. Once you have that down, the sky is the limit. Here is how:

from serial import *

# the UBW shows up as a serial device; use the /dev/term link noted above
ubw = Serial("/dev/term/0")
ubw.open()
print "Requesting UBW Firmware Version"
# 'V' asks the firmware for its version string; commands are newline-terminated
ubw.write("V\n")
print "Result=[" + ubw.readline().strip() + "]"
ubw.close()

Below is the output for my board:

Voila!

That really is all there is to it, you are now one step away from your dream device. And it really is only a matter of imagination. Check out the documentation of current firmware to see what commands the board supports and you will realize all the neat things you can use it for: driving LEDs, Servos, LCD displays, acquiring data, ...

Concrete Example: The OpenDS Weather Station

As I said at the beginning of this post, my initial goal was to craft a monitoring device for OpenDS. Now you have a good idea of how I dealt with the hardware part, but an image is worth a thousand words so here is a snap...

On the software front, well, being a software engineer by trade, that was the easy part, so it's almost not fun and I won't go into as much detail, but here is a 10,000ft view:

  • data is collected in a matrix of hash tables.
  • each hash table represents a population of data points for a sampling period
  • a separate timer thread pushes a fresh list of hash tables into the matrix so as to reset the counters for a new sampling period

So for example, if we want to track CPU utilization, we only need to keep one metric: the hash table will only have one key/value pair. Easy. Slightly overkill, but easy. Now if you want to keep track of transaction response times, the hash table keeps the response time (in ms) as the key and the number of transactions processed with that particular response time as the associated value. Therefore, if within one sampling period you have 10,000 operations processed, with 6,000 in 0 ms, 3,999 in 1 ms and 1 in 15 ms, your hash table will only have 3 entries, as follows: [ 0 => 6000; 1 => 3999; 15 => 1 ]

This allows for a dramatic compression of the data compared to having a single line with an etime for each operation, which would result in 10,000 lines of about 100 bytes each.

What's more, this representation of the same information makes it easy to compute the average, extract the maximum value and calculate the standard deviation.

All that said, the weather station is only sent the last of the samples, so it always shows the current state of the server. And as it turns out, it is very useful, I like it very much just the way it worked out.

 Well, I'm glad to close down the shop, it's 7:30pm .... another busy Saturday

Friday Oct 02, 2009

Note To Self: Things To Do On A Vanilla System


I just thought I'd make a note of the common things I do and, funnily enough, I think this blog might be the closest thing I have to a sticky note / persistent backup ... so here goes:

 PATH=/usr/bin:/usr/sbin:/usr/gnu/bin:$PATH
# enable power management
echo "S3-support    enable" | pfexec tee -a /etc/power.conf
pfexec pmconfig
pfexec svcadm restart hal

# disable access time update on rpool to minimize disk writes
pfexec zfs set atime=off rpool

# get pkgutil to install community software
pfexec pkgadd -d http://blastwave.network.com/csw/pkgutil_`uname  -p`.pkg

# Add repositories to the package manager

pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev/ dev
pfexec pkg set-publisher -O http://pkg.opensolaris.org/contrib/ contrib
pfexec pkg set-publisher -O http://blastwave.network.com:10000/ blastwave
pfexec pkg set-publisher -O http://pkg.sunfreeware.com:9000/ sunfreeware


# download and install the flash plug-in for firefox
wget http://fpdownload.macromedia.com/get/flashplayer/current/flash_player_10_solaris_x86.tar.bz2 -O libfp.tar.bz2 --no-check-certificate
bunzip2 libfp.tar.bz2
tar xf libfp.tar
pfexec mv flash_player*/libflashplayer.so /usr/lib/firefox/plugins
rm libfp.tar
rmdir flash_player*

# get perfbar
wget http://blogs.sun.com/partnertech/resource/tools/perfbar.i386 -O perfbar --no-check-certificate
chmod 755 perfbar
nohup ./perfbar &

# configure coreadm
coreadm -g /var/cores/%t-%f -e global

Quad Monitor With Rotation: Where There Is A Will, There Is A Way


Talking with a friend recently, he told me about his miserable experience trying to get his workstation to work with four monitors.

Now, I was surprised at first because there are lots (ok, maybe not lots, but a sizeable number) of people with quad-head workstations out there, so obviously that seems rather doable. The trick in his case seemed to be heterogeneity: 2 different dual-head cards, and 4 different monitors of different brands and sizes. Additionally, he wanted one of his widescreens tilted in portrait mode for his coding. Nice for browsing as well, but he wanted to be able to have a tall IDE to see more code at once without the need to scroll.

It took me a while not just to get the equipment but to find some spare time for this as well. I ended up with the following:

  1. a desktop that would lend itself to the experiment
  2. 4 dual head videos cards to test combinations
    1. nVidia GTX 280
    2. nVidia Quadro FX 380
    3. nVidia GeForce 9600 GT
    4. nVidia GTS 250
  3. 4 monitors
    1. Sun 24.1"
    2. Dell 22"
    3. Acer 24.3"
    4. Dell 20"
  4. a free Saturday (that was actually the most difficult component to find)

To cut it short, the result is ... drum roll ... it _can_ work once you know what to do and what not to do. Here is the final result:

So how do we make that work? Well, the first thing is NOT to desperately cling to TwinView. You have to let go of that and fall back on good ol' Xinerama, which does a fine job anyway.

As I said in my previous post, rotating the monitor is only a matter of adding Option "Rotate" "left" in the relevant screen section.

For all the X options explained, I found this quite useful. Dig in there.

What you want to be careful about:

  • if at first both cards are not recognized, worry not. Go to a terminal and issue the following command:

pfexec nvidia-xconfig -a

This will force the nvidia config utility to look across all cards.

Note that if this still doesn't work, issue:

pfexec scanpci

and write down the PCI id for each card. It is the first number right after the pci bus 0x002. In this example, this would translate into

BusID "PCI:2:0:0"

in the device section in xorg.conf

  • look at your /var/log/Xorg.0.log for errors
    • you will see something like

(II) LoadModule: "xtsol"
(WW) Warning, couldn't open module xtsol
(II) UnloadModule: "xtsol"
(II) Failed to load module "xtsol" (module does not exist, 0)

 Don't worry, that's a Trusted Solaris extension that X is hardcoded to load even when the OS running isn't Trusted Solaris; this has yet to be fixed.

  •  make sure to enable Composite
  • make sure to enable GLX with composite
  • make sure to enable RandRRotation
  • Check /var/adm/messages for IRQ collisions which could result in some funky discrepancies. If you find any, tweak your BIOS to force each PCI slot to a distinct IRQ. The message would look similar to:

unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels

All that said, here is an example xorg.conf with a single monitor tilted, and everything working pretty well considering that nothing is matched. It does work, but it doesn't come for free, as you can see. There is one drawback however: I have not been able to make Compiz work, apparently because the cards would need an SLI link between them, but I haven't confirmed that for sure. That's it for today folks!

OpenSolaris 2010.02 on EeePC 1000: Out-Of-The-Box!!!


As usual, I try to give as much away in my titles as I can. This one is no different: it just works....


With 2009.06, you needed to build your own drivers for ethernet and wifi. Pretty much a non-starter for 99% of users, understandably so: when it just works on Linux and Windows, why sweat it on OpenSolaris?

Now that dilemma is behind us: I installed an early access of 2010.02 (OpenSolaris b124) and when the installation was done, everything worked: a whole new  experience for me on OpenSolaris. I almost EXPECT to have to fiddle with a driver, a config file, an SMF service that doesn't start, ..., something.

In this case: nothing! Simultaneously gratifying and almost disappointing. I mean, even on my desktop OpenSolaris required some elbow grease to work the way I wanted, but in this case the coveted prize of a functional system was handed to me without even the hint of a fight? ... unusual, to say the least.

And that's good. I used to say that Solaris is certainly the best server OS and just as certainly the worst desktop OS, but this one has me wondering... maybe the Sun engineers have covered some of the ground that separates OpenSolaris from Linux. Granted, there's still a way to go! Yes, the embedded 1.3 megapixel webcam works, but the quality of the picture is perfectible and I don't think it is the hardware... to be fair, Sun has to write their own drivers for everything, so I'm surprised it worked at all; that's pretty good!

Now there is one rather big bummer though... it does suspend but doesn't resume. That's a pretty big issue for a laptop which is -because of its form factor- bound to be used on the go. If I can make it work, I will post here. If you have had success making resume work, drop me a line!



Lenovo W700ds dual monitor laptop: works! Another 2010.02 success


OpenSolaris 2010.02 early access build 124 is really faring pretty well so far. It isn't free of issues, granted, but at the same time, it has improved leaps and bounds on laptop support, especially for netbooks, thanks to a passionate and dedicated team writing up a bunch of device drivers for wifi and network cards found in these little laptops.

Today, I installed build 124 on a Lenovo W700ds.

You probably have never heard of that beast because they probably only sold half a dozen of them, one of which landed on my desk yesterday. The main reason for this success is probably that it weighs a ton (11 lbs or 5 kg!!!), due in part to its main 17" monitor, doubled by a netbook-like 10" monitor that slides out from behind the main one.... here are the specs. Notice they call it "portable power". Transportable would be more accurate. After using this laptop for about an hour now (I'm writing this post on it), I do have to say that it is quite fantastically comfortable, just about as much as a desktop would be... not really surprising if you consider it has a full-size keyboard + numeric keypad.

 So, OpenSolaris installs without a glitch; once again the installer just does its job without whining. If you run the device driver utility, it will notify you that two devices do not have a Solaris driver, one being the integrated Bluetooth card and the other the fingerprint reader. Not a big deal. Once OpenSolaris is installed, it will boot into GNOME just as on any other machine, but what you really want is the second monitor to work... and there's a trick to that.

 First, the second monitor won't be recognized if you don't pull it all the way out at boot time. Took me a while to figure this one out. To save some mW, the Lenovo folks don't power it unless it's out and that makes it undetectable at first.

Second, once recognized by X, it will actually display sideways. This "companion" display is actually a 16:9 10" netbook display tilted right so that its width resolution (1280x768) almost matches the height resolution of the main display (1920x1200). So all we have to do is tilt it "left" to compensate for the hardware arrangement. To do so, simply enable the rotate and resize option on the graphics card and then tell X to rotate the appropriate screen left. Here's how:

Section "Monitor"

    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "SlideOut"
    VendorName     "Lenovo"
    ModelName      "LEN 2nd Display"
    HorizSync       30.0 - 75.0
    VertRefresh     60.0
    Option         "DPMS"
    Option         "Rotate" "left"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Quadro FX 2700M"
    BusID          "PCI:1:0:0"
    Screen          1
    Option       "RandRRotation"    "on"
EndSection

 Note that TwinView must be disabled, because TwinView aggregates both displays into a single block; rotation with TwinView on would rotate both displays. So you need to make them two X displays and enable Xinerama.

here is the final xorg.conf in case you're interested...

Additional notes:

Suspend/Resume works great with this laptop -most of the time- however, it seems that sometimes, you will lose the second display upon resume, I'm not sure why.

Tuesday Sep 29, 2009

OpenDS in the cloud on Amazon EC2


Rationale

Why not run your authentication service in the cloud? This is the first step to having a proper cloud IT. There are numerous efforts going on to ease deploying your infrastructure in the cloud, from Sun and others, from OpenSSO to GlassFish, from SugarCRM to Domino, and the list goes on. Here is my humble contribution for OpenDS.

Bird's Eye View

 Tonight I created my EC2 account and got OpenDS going on the Amazon infrastructure in about half an hour. I will retrace my steps here and point out some of the gotchas.

The Meat

Obviously, some steps must be taken prior to installing software.

First, you need an AWS (Amazon Web Services) account with access to EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). I will say this about EC2: it is so jovially intoxicating that I would not be surprised to be surprised by my first bill when it comes... but that's good, right? At least for Amazon it is, yes.

Then you need to create a key pair, trivial as well. Everything is explained in the email you receive upon subscription.

Once that's done, you can cut to the chase and log on to the AWS management console right away to get used to the concepts and terms used in Amazon's infrastructure. The two main things are an instance and a volume. The names are rather self-explanatory; the instance is a running image of an operating system of your choice. The caveat is that if you shut it down, the next time you start this image you will be back to the vanilla image. Think of it as a LiveCD: you can't write persistent data to it, and if you do, it won't survive a power cycle.

To persist data between cycles, we'll have to rely on volumes for now. Volumes are just what they seem to be, only virtual. You can create and delete volumes at will, of whatever size you wish. Once a volume is created and becomes available, you need to attach it to your running instance in order to be able to mount it in the host operating system. CAUTION: look carefully at the "availability zone" where your instance is running, the volume must be created in the same zone or you won't be able to attach it.

 Here's a quick overview of the AWS management console with two instances of OpenSolaris 2009.06 running. The reason I have two instances here is that one runs OpenDS 2.0.0 and the other runs DSEE 6.3 :) -the fun never ends-. I'll use it later on to load OpenDS.

My main point of interest was to see OpenDS perform under this wildly virtualized environment. As I described in my previous article on OpenDS on Acer Aspire One, virtualization brings an interesting trend in the market that is rather orthogonal to the traditional perception of the evolution of performance through mere hardware improvements...

In one corner, the heavyweight telco/financial/pharmaceutical company, weighing in at many millions of dollars for a large server farm dedicated to high-performance authentication/authorization services. In the other corner, the ultra-small company, looking for every way to minimize cost in order to simply keep the lights on while still being able to grow the supporting infrastructure as business ramps up.

Used to be quite the headache, that. It's pretty easy to throw indecent amounts of hardware at crazy SLAs. Architecting a small, nimble deployment that can still grow later? Not so much. If you've been in this business for some time, you know that every sizing iteration requires going back to capacity planning and benchmarking, which is too long and too costly most of the time. That's where elastic approaches can help. The "cloud" (basically, hyped-up managed hosting) is one of them.

Our team also has its own, LDAP-specific approach to elasticity; I will talk about that in another article. Let's focus on our "cloud" for now.

 Once your instance is running, follow these simple steps to mount your volume, and we can start talking about why EC2 is a great idea that needs to be developed further for our performance-savvy crowd.

In this first snapshot, I am running a stock OpenDS 2.0.0 server with 5,000 standard MakeLDIF entries, to keep it comparable to the database I used on the netbook. Same searchrate load: sub scope, returning the whole entry, across all 5,000 entries.

If this doesn't ring a bell, check out the Acer article. Your basic EC2 instance has about as much juice as a netbook. Now the beauty of it all is that all it takes to improve the performance of that same OpenDS server is to stop my "small" EC2 instance and start a medium one.

Voila!

I've got 2.5 times the initial performance. I did not change ONE thing on OpenDS; this took 3 minutes, I simply restarted the instance with more CPU. I already hear you cry out that it's a shame we can't do this live (it is virtualization after all), but I'm sure it'll come in due course. It is worth noting that while OpenDS could use 80+% of the CPU on the small instance, in this case it was only using about 60%, so the benefit would likely be even greater, but I would need more client instances. This imperfect example still proves the point about ease of use and elasticity.

The other thing you can see coming is an OpenDS image for EC2. I'm thinking it should be rather easy to script two things:

1) self-discovery of an OpenDS topology and automatic hook-up into the multi-master mesh (see the sketch after this list), and

2) snapshot -> copy -> restore the database, with almost no catch-up to do data-wise. If you need more power, just spawn a number of new instances: no setup, no config, no tuning. How about that?
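For item 1, the building block is already there in the form of the dsreplication command. A start-up script baked into the image could, once it has discovered an existing replica, do something along these lines. This is a sketch only: host names, ports and passwords are placeholders, and the exact set of options may differ slightly between OpenDS versions.

dsreplication enable --host1 ds1.example.com --port1 4444 --bindDN1 "cn=Directory Manager" --bindPassword1 password --replicationPort1 8989 --host2 ds2.example.com --port2 4444 --bindDN2 "cn=Directory Manager" --bindPassword2 password --replicationPort2 8989 --baseDN "dc=example,dc=com" -X -n

dsreplication initialize --baseDN "dc=example,dc=com" --hostSource ds1.example.com --portSource 4444 --hostDestination ds2.example.com --portDestination 4444 -X -n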

Although we could do more with additional features from the virtualization infrastructure, there are already a number of unexplored options with what is there today. So let's roll up our sleeves and have a serious look. Below is a snapshot of OpenDS modrate on the same medium instance as before, at about 25% CPU utilization. As I said before, this server has had NO fine-tuning whatsoever, so these figures are with the default, out-of-the-box settings.

  I would like to warmly thank Sam Falkner for his help and advice and most importantly for teasing me into trying EC2 with his FROSUG lightning talk! That stuff is awesome! Try it yourself.

Tracking Down All Outliers From Your LDAP Servers ...


Rationale

 I was recently faced with the challenge of tracking down and eliminating outliers from a customer's traffic, and I had to come up with some sort of tool to help diagnose where these long-response-time transactions originated. Not really rocket science (hardly anything IS rocket science, even rocket science isn't all that complicated, but I digress), yet nothing I had in the toolbox would quite serve the purpose. So I sat down and wrote a tool that would allow me to visually correlate events in real time. At least that was the idea.

Bird's Eye View

This little tool is only meant for investigations; we are working on delivering something better and more polished (code name Gualicho, shhhhhhh) for production monitoring. The tool described in this article simply correlates server throughput, peak etime, I/O, CPU, network and garbage collection activity (for OpenDS). Each metric is presented as a sliding line graph, stacked on top of the others, making visual identification and correlation easy. Later on I will adapt the tool to work with DPS, since it is the other product I like to fine-tune for my customers.

The Meat

When pointed at the access log and the GC log, here is the text output you get. One line is displayed per second, with the aggregated information collected from the access log and the garbage collection log, as well as kstats for network, I/O and CPU.


If you look at it closely, you'll see I represented garbage collection as a percentage, which is somewhat unusual; after debating how to expose this metric, I decided that all I was interested in was a relative measure of the time spent in stop-the-world GC operations versus the time the application itself is running. As I will show in the snapshot below, this is quite effective for spotting correlations with high etimes in most cases. To generate this output in the GC log, all you have to do is add the following to the start-ds.java-args set of arguments in /path/to/OpenDS/config/java.properties:

 -Xloggc:/data/OpenDS/logs/gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
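With these flags, the JVM writes lines of the form "Application time: N seconds" and "Total time for which application threads were stopped: N seconds" into the GC log. If you just want the overall ratio without my GUI, a quick-and-dirty pass over the log, assuming that stock HotSpot output format, could be:

awk '/Total time for which application threads were stopped/ { stopped += $(NF-1) } /Application time:/ { running += $(NF-1) } END { printf("stopped in GC: %.2f%%\n", 100 * stopped / (stopped + running)) }' /data/OpenDS/logs/gc.log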

And then my GUI will show something like:


Don't hesitate to zoom in on this snapshot. The image is barely legible due to blog formatting constraints.

Excuse me for not waiting 7+ days to take the snapshot for this article, but I think this simple snap serves the purpose. You can see that most of the time we spend about 2% of the time blocked in GC, but sometimes there are spikes up to 8%, and when that happens, even though it has little impact on the overall throughput over one second, the peak etime suddenly jumps to 50ms. I will describe in another article what we can do to mitigate this issue; I simply wanted to share this simple tool here since I think it can serve some of our expert community.

OpenDS on Acer Aspire One: Smashing!


Rationale

As far-fetched as it may seem, with the growing use of virtualization and cloud computing, the average image instance that LDAP authentication systems have to run on looks more like your average netbook than a supercomputer. With that in mind, I set out to find a reasonable netbook to test OpenDS on. I ended up with an Acer Aspire One with 1GB of RAM. Pretty slim on memory. Let's see what we can get out of that thing!

Bird's Eye View

In this quick test, I loaded OpenDS (2.1b1) with 5,000 entries (the stock MakeLDIF template delivered with it), hooked the netbook up to a closed GigE network and drove load from a Core i7 machine with searchrate. Result: 1,300+ searches per second. Not bad for a machine that only draws around 15 watts!

The Meat 

As usual, some more details about the test, but first a quick disclaimer: this is not a proper test or benchmark of the Atom as a platform, it is merely a kick of the tires. I have not measured any metric other than throughput, and only for a search workload at that. It is only to get a "feel" for OpenDS on such a lightweight sub-notebook.

In short:

  • Netbook: Acer Aspire One ZG5 - Atom N270 @1.6GHz, 1GB RAM, 100GB HDD
  • OS: OpenSolaris 2009.06
  • FS: ZFS
  • OpenDS: all stock, I did not even touch the Java options, which I usually do
  • Java: 1.6 Update 13
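With that setup, the load generation side looked something like the command below. This is a sketch from memory rather than the exact invocation, with an illustrative host name and connection/thread counts; the filter pattern simply picks one of the 5,000 generated users at random for every search:

searchrate -h netbook -p 1389 -b "dc=example,dc=com" -c 4 -t 4 -g "rand(0,5000)" "(uid=user.%d)"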

The little guy in action: perfbar shows the CPU is maxed out, with little headroom left...


Monday Aug 24, 2009

Learn From Errors: Is Your Cloud About To Burst?


Rationale

A number of customers I talk to have a hugely diverse ecosystem of applications relying on the LDAP infrastructure for authentication, single sign-on and user-specific configuration storage. Very few have a strictly controlled environment with a reduced set of well-known clients.

One cause of trouble I have seen many times over stems from client applications that are not robust and handle the protocol poorly. There is an easy way to grow confidence in your infrastructure and ecosystem at the same time: after setting up the prototype and before going to production, during the QA stage, spend some time intentionally injecting errors into your traffic. You'll immediately see whether clients start blowing up left and right!

Bird's Eye View

To cut to the chase, this plug-in sits in DS as a pre-operation search plug-in. You can "create" any entry simply by adding a configuration parameter to the plug-in. For example, if you want DS to return "no such entry" (error 32) for cn=nosuch,dc=example,dc=com, as shown below:

all you would have to do (once the plug-in is properly set up) is:

dsconf set-plugin-prop arbitrary-response argument+:cn=nosuch,dc=example,dc=com#32#0
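From the client side, a simple search against that DN should then come back with the injected result code 32 (no such object); with the ldapsearch shipped with DSEE, the check could look like this, host and port being illustrative:

ldapsearch -h localhost -p 389 -b "cn=nosuch,dc=example,dc=com" -s base "(objectclass=*)"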

The Meat

I honestly have no idea why I have not shared this small tool earlier. I wrote this plug-in years ago for Directory Server 5.2 and later on recompiled it against DS 6.x on OpenSolaris. Currently it is built for Solaris 9/10/OpenSolaris x86/x64. If you want it on another platform, let me know and I'll spin it for you.

To install this plugin, simply unzip the file and then follow the instructions in the bundled README file. The sequence of commands will work for DS 6.x.

In its current version (1.1b), the plug-in can inject delays as well as errors into an arbitrary response. This means you can easily test how connection idle timeouts are handled by your client applications' connection pooling mechanism, if any.

Injecting delay is done through the third parameter of the plug-in. For example, to return a valid response with error code 0 after 15 seconds, you would have to add the following argument to the plug-in:

dsconf set-plugin-prop arbitrary-response argument+:cn=ok,dc=example,dc=com#0#15
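An easy way to verify the delay from the client side is simply to time the search; something along these lines, again with an illustrative host and port, should come back just over 15 seconds later:

time ldapsearch -h localhost -p 389 -b "cn=ok,dc=example,dc=com" -s base "(objectclass=*)"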

Useful things not in this version

  1. I will probably add a fourth parameter representing the probability that the error is returned; otherwise, the request is simply passed on to the DS core

  2. Ability to interpret regular expressions in the base DN part of the plug-in argument

That's it for today!

