Monday Dec 21, 2009

Twice As Fast: Import Tips And Tricks

Rationale

We understand. Really, we do.

You want to import your data as fast as possible. Not only does nobody like to wait, but there are time constraints on every one of us. In particular, we're constrained by maintenance windows.

So, what gives you the best return on investment? Higher CPU clock speed? More CPUs? More memory? We're here to find out together.

Whatever the literature on the topic says, you must make sure that the import will not have to be done in multiple passes, or you'll get killed on index merging time. To do so, shoot high and dial down gradually to find the right cache size for your data set.

Bird's Eye View

In essence, what I did was start an import on a vanilla system and Directory Server with a customer's actual data, 7.9M entries. The system I am testing with isn't a server, but that doesn't matter: it is a constant. The variables are the amount of import cache, the number of active CPUs (1-8) and the CPU clock speed (from 3.3 to 3.7GHz). In short, memory matters most.

The Meat

The Setup

The instance I am doing this with is an actual telco setup, with 7.9M entries in the LDIF file. The LDIF weighs in at 12GiB. There are 10 custom indexes configured for equality only. The 8 system indexes are there as well.

On the physical side of things, the machine is my desktop, an Intel Core i7 975EE @ 3.3GHz. It has a 64GB Intel X25 and a 10,000 rpm SATA drive. The disk layout is described in more detail here.

Sensitivity To Import Cache

Despite what the documentation says, there are huge gains to be reaped from increasing the import cache size; depending on your data set, this can make a world of difference.

This is the first thing that I tweaked during the first phase of this test, and bumping the import cache from 2GiB to 4GiB cut the import time in half. Basically, if your import has to occur in more than a single pass, then your import cache isn't big enough; increase it if your system can take it.
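For instance, on a DSEE 7.0 instance the import cache can be resized with dsconf. This is only a sketch: the port is an assumption, and the property name and value syntax should be double checked with dsconf help-properties for your version before running it:

# sketch: bump the import cache to 4G on the instance listening on port 1389
# (verify the exact property name with "dsconf help-properties")
dsconf set-server-prop -p 1389 import-cache-size:4G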

Sensitivity To Clock Speed

Ever wondered if a system with CPUs twice as fast would buy you time on import? Not really. Why? Well, if the CPUs are waiting on the disks or on locks, then a higher clock speed isn't going to move the needle at all. That's what's going on here. Check this out...

The reason the 3.7GHz import isn't as fast as the 3.3GHz one is that my overclocking might have thrown off the balance between the core clock and the bus clock, so the CPU is spinning its wheels, waiting to access memory and IO...

I officially declare this one moot. I'll try again later with underclocking.

Sensitivity To Number Of CPUs

Scalability is an interesting challenge. Ideally, you'd want half the import time given twice the resources. In reality, import is very lock intensive in order to avoid collisions and corruption, so it isn't quite that linear. Here's what I got on my system, all other things being equal.

So even though the scalability isn't linear, the good news is that the more CPUs you have, the better your performance is going to be.

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Saturday Dec 19, 2009

DPS Coherence Plug-In: New Features

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

After having conversations about this plug-in both internally and with a handful of you out there, I sat down and added what I felt were the main missing features in my current implementation. This mostly covers security, but also some flexibility.

Bird's Eye View

On the security front, you now have the ability to filter out attributes so they don't make it into the cache (e.g. userPassword). What's more, you can filter out the entries you don't want in the cache, to avoid polluting it with the occasional non-production (e.g. administration) hits.

On the flexibility front, you can set a time to live for entries that do make it into the cache. This lets you control whether a value is retained in the cache forever (the default) or gets evicted after a certain time. You can also provide a list of regular expressions for the Bind DNs (the identities) you grant access to the cache. And of course, you can decide to allow (default) or deny unauthenticated clients access to the cache as well.

The Meat 

  • attributeNotToCache: userPassword
This attribute in the plug-in's configuration entry can be multivalued. It is not a regular expression but a plain string, and case, as always in LDAP, matters not. Any attribute whose name matches one of the provided values is stripped from the entry before it is stored in Coherence.
  • dnNotToCache: .*,ou=secret,o=o
This attribute can be multivalued and prevents entries whose DN matches one of the regular expressions from being stored in Coherence.
  • cacheForBindDN: cn=[2-3],o=o
This attribute can be multivalued. It is a regular expression. An authenticated client's Bind DN must match one of the provided regular expressions to be granted access to the contents stored in Coherence.
  • cacheForAnonymous: false
This attribute is single valued. It is a boolean, either true or false. When false, unauthenticated clients are not granted access to the contents stored in Coherence and will therefore always hit the back-end.
  • cacheExpirationInMS: 30000
This attribute is single valued. It is a long and represents the length of time in milliseconds that an entry is kept in the cache after the last time it was accessed.

So, in the end, here is an example configuration entry:

dn: cn=CoherencePlugin,cn=Plugins,cn=config
objectClass: top
objectClass: configEntry
objectClass: plugin
objectClass: extensibleObject
cn: CoherencePlugin
description: Oracle Coherence Cache Plugin
enabled: true
pluginClassName: com.sun.directory.proxy.extensions.CoherencePlugin
pluginType: preoperation search
pluginType: search result entry
pluginType: postoperation delete
pluginType: postoperation modify
pluginType: postoperation modify dn
cacheName: LDAPCache
attributeNotToCache: userpassword
attributeNotToCache: aci
dnNotToCache: .*,ou=secret,o=o
dnNotToCache: .*,cn=config
cacheForBindDN: cn=[2-3],o=o
cacheForBindDN: uid=user.[0-9]+,ou=People,o=o
cacheForAnonymous: false
cacheExpirationInMS: 30000
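To sanity check that the proxy picked up these settings, you can read the entry back over LDAP. This is only a sketch: the port and the cn=Proxy Manager credentials are assumptions based on a default DPS setup, so adjust them to yours:

ldapsearch -h localhost -p 1389 -D "cn=Proxy Manager" -w password \
  -b "cn=CoherencePlugin,cn=Plugins,cn=config" -s base "(objectClass=*)"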

That's it for my Friday night. Let me know if there is more that DPS+Coherence can do for you!

As always, if you want to try this DPS plug-in, ping me: arnaud -at- sun -dot- com

Friday Dec 18, 2009

More ZFS Goodness: The OpenSolaris Build Machine

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Apart from my usual LDAP'ing, I also -try to- help the OpenSolaris team with anything I can.

Lately, I've helped build their new x64 build rig, for which I carefully selected the best components out there while trying to keep the overall box budget on a leash. It came out at about $5k. Not on the cheap side, but cheaper than most machines in most data centers.

The components:

  • 2 Intel Xeon E5220 HyperThreaded QuadCores @ 2.27GHz (16 CPUs in Solaris)
  • 2 32GB Intel X25 SSD
  • 2 2TB WD drives
  • 24GB ECC DDR2

I felt compelled to follow up on my previous post about making the most of your SSD because some people commented that non-mirrored pools were evil. Well, here's how this is set up this time: in order to avoid using either of the relatively small SSDs for the system, I partitioned the two big 2TB drives with exactly the same layout, one 100GB partition for the system and the rest of the disk to hold our data. This leaves the SSDs available for the ZIL and the L2ARC. But thinking about it, the ZIL is never going to take up an entire 32GB SSD, so I partitioned one of the SSDs with a 3GB slice for the ZIL and the rest for L2ARC.

The result is a system with 24GB of RAM for the level 1 ZFS cache (ARC) and 57GB of L2ARC in combination with a 3GB ZIL. So we know it will be fast. But the icing on the cache ... the cake, sorry, is that the rpool is mirrored. And so is the data pool.
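For reference, hooking the SSD partitions up to the data pool boils down to something like the two commands below; the device names are taken from the zpool status output that follows (c6d0p1 is the 3GB ZIL slice, c5d0p1 and c6d0p2 become the L2ARC):

pfexec zpool add data log c6d0p1
pfexec zpool add data cache c5d0p1 c6d0p2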

Here's how it looks: 

admin@factory:~$ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c5d1p2    ONLINE       0     0     0
      c6d1p2    ONLINE       0     0     0
    logs
      c6d0p1    ONLINE       0     0     0
    cache
      c5d0p1    ONLINE       0     0     0
      c6d0p2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c5d1s0  ONLINE       0     0     0
        c6d1p1  ONLINE       0     0     0

errors: No known data errors
admin@factory:~$

This is a really good example of how to set up a real-life machine designed to be robust and fast without compromise. This rig achieves performance on par with $40k+ servers. And THAT is why ZFS is so compelling.

Wednesday Dec 16, 2009

Oracle Coherence Support Right In DPS!!!

Rationale

Why cache in DPS?

The Directory Server back-ends cannot "absorb" as many updates when they're stressed with a large proportion of searches. There's already caching on the back-end Directory Server itself, of course. It helps performance a lot, since reads return to the client faster; it relieves some of the stress and frees up resources to take care of the writes, which lock resources for longer atomic stretches of time. But as long as searches hit the back-end, even with a cache, there's some heavy lifting to be done: open the connection, parse the request, put the request in the work queue, look up entries in the cache, return the entry, close the connection...

That's why caching right in DPS started to look appealing to me.

Why Coherence?

One might think I made this choice because Sun is about to be gobbled up by Oracle, but the answer is no. Coherence is simply a compelling choice; these guys seem to have made all the technical choices I would have ... and then some. For one, you download the bits, you just start it and it works. It may sound like a marketing pitch, but see for yourself. Excellent 0 to 60 in my book. Once you have it working, you get acquainted with it: the protocol is dead simple, the API is clean, robust and pretty lean. After that, you check the docs out (that's right, I read the docs after the fact) and start to realize how powerful a platform it is, how expandable it is, how far you can push deployments to accommodate growing performance or reliability needs.

Bird's Eye View

The integration with Coherence is done by way of a DPS (7+) plug-in that asynchronously populates a Coherence cache with the entries returned by your regular traffic. When a request comes in, a lookup is done to check whether the entry is present in the cache; if it is, it is returned immediately, otherwise the request is routed to the back-end as usual.

Note that I'm not making any claims about the performance of this caching approach from the client's perspective, because our Directory Server back-end is already pretty darn fast. It certainly relieves the back-end of a chunk of "frequent" traffic and will benefit the overall performance of the topology. The relief will most likely show up as improved write response times, but nothing here speaks to the performance of the Coherence cache lookup itself. I just haven't collected enough data so far.

The Meat

Nitty gritty, anyone?

Suppose we have a setup like this ... 

 the first read would be processed as described below:


but the second one would be processed without hitting the back-end.

See the tremendous impact this will have on your back-end?

It frees it up to process writes or some heavyweight searches...

How to get the plug-in? Ask nicely.

Tuesday Dec 15, 2009

DSEE 7.0 On Windows Gotcha

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

When you just want it to work quick and easy, you only skim through the install guide. The zip distribution lends itself particularly well to that, with extracting the archive the only thing standing between you and being productive with your Directory Server. Well, at least if you're as impatient as I am. Make sure to read this.

Bird's Eye View

After the quick and painless install, you add the single bin folder to your PATH environment variable and get crackin'. Directory Server instances can be created. As can Directory Proxy Server instances. But the usual command line tool from the DSRK, the indispensable ldapsearch, doesn't work. What? You missed one tiny component that still needs to be "installed" on your machine.

The Meat 

If, when you try ldapsearch from DSEE 7.0 on your vanilla Windows install, you get the message below:

 I'll make a quick text note here so it's searchable...

The application was unable to start correctly (0xc0150002)

Then it only means that you forgot to install the Visual C++ Redistributable Package, as stated in a tiny note block in the install guide which had completely escaped me during my quick skimming. Note that the doc points you to downloading the bits from Microsoft, but since we're nice guys, the installer is included in the DSEE ZIP archive, at the root of the zip file, and is called vcredist_x86.exe. Be sure to install it if you need the ldap* utilities.

happy LDAP'ing

Monday Dec 14, 2009

Import: Intel's HyperThreading Performance Gain

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

As I am experimenting with more and more of the new-ish Intel Nehalem-based servers, a question kept bugging me: how much am I getting out of these "hyper" threads that appear as CPUs in the operating system? I took some time to compare a Directory Server import on my Core i7 workstation with an actual customer's data set of 7.9 million entries.

Bird's Eye View

Don't bet the house on HyperThreading.

The Meat

To conduct this test in a simple fashion, all I did was run the import first with all the CPUs enabled, and then with all the odd-numbered CPUs disabled. It doesn't get any simpler. All other things can be considered equal.
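For the record, on Solaris the odd-numbered virtual CPUs can be taken offline and brought back live with psradm; on an 8-CPU box like this one, that would look like:

# take the odd virtual processors offline for the 4-CPU run
pfexec psradm -f 1 3 5 7
# bring them back online once the import is done
pfexec psradm -n 1 3 5 7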

The result is quite stunning really.



Configuration                      Import Time
8 CPUs (with HyperThreading)       30:56.62
4 CPUs (without HyperThreading)    35:04.60

Which makes a difference of about 13%...

Just thought someone out there would find this as puzzling as I do. Quite honestly, I believe the blame is shared between the Directory Server not scaling too well, the IO actually being a point of contention, and HyperThreading being somewhat ineffective. More on that later.

Make The Most Of Your SSD With ZFS

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

If you're anything like me, you're lucky if you have a single SSD in your workstation or server. I had a dilemma in the past: I couldn't quite find a way to make the most of my one flash drive, since I had to choose between making it a ZIL or using it as L2ARC. I dreaded having to make a definitive choice for one or the other. When I installed my workstation with OpenSolaris 2009.06, I had an idea in mind, so I installed the system on the SSD in a small partition (10GB) and left the rest of the drive unallocated, if you catch my drift...

Bird's Eye View

Simple! Just partition the SSD so it can be used as both L2ARC and ZIL, in whatever proportions you think are going to suit your needs. Note however that the IOs are shared between the partitions on the same drive. From my testing though, I can tell you that with this setup you still come out on top in most situations.

The Meat

It's all pretty simple really: when you install Solaris, you have a choice of installing on the "whole disk" or using the tool to make a smaller custom partition. I cut out a 36GB partition, which allows ample room for the system and swap. The rest of my 64GB SSD is left unallocated at install time; we'll take care of everything later.

The second disk in my system is a 300GB 10,000 rpm SATA drive which, being fast but small, I wanted to leave whole for my data pool (keep in mind that the rpool is a little different from your regular pool, so make sure to treat it accordingly). That is why I decided to compromise and use some of the SSD space for the system. You don't have to, though: you could partition your spindle and have the system on there.

Now that you have OpenSolaris up and running, install GParted to be able to edit your disk partitions. You can either use the OpenSolaris package manager or:

pfexec pkg install SUNWGParted

It's all downhill from here. Open GParted. If you just installed it, you will need to log out and back in to see it in the GNOME menu. It will be in Applications -> System Tools -> GParted Partition Editor.

Select your flash drive, carve out a 2GB partition for your ZIL and assign the remaining space to L2ARC. Apply the changes and keep the window open.


Note the two device paths in /dev/dsk, because that's what we'll use to add these two SSD partitions as performance enhancers to our existing pool.

arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool add data log /dev/dsk/c9d0p2 cache /dev/dsk/c9d0p3

Let's check how our pool looks now...

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0
    logs
      c9d0p2    ONLINE       0     0     0
    cache
      c9d0p3    ONLINE       0     0     0

errors: No known data errors

Et voila!

You've got the best of both worlds, making the absolute most of whatever little hardware you had at your disposal!

Enjoy!

Add And Remove ZILs Live!

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

Ever been playing with separate logs (separate ZIL, logzilla, etc...) and had to rebuild the pool every time you wanted to yank the slog off?

Not so anymore! The ZIL can now be added and removed as you like! Truly fantastic for tinkering with it and observing its actual impact on performance. Here's a quick walk through one of the most painless migrations of all time.

Bird's Eye View

As separate ZILs get more and more exposure in production environments, technical staff get to experiment more and more so they can make recommendations, and the new removal feature adds a lot of flexibility to try creative combinations.
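The walkthrough below uses a ramdisk-backed slog. For completeness, here is a sketch of how such a device could be created and attached, assuming a pool named data and a 1GB ramdisk:

pfexec ramdiskadm -a zil-drive 1g
pfexec zpool add data log /dev/ramdisk/zil-drive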

The Meat

Let's suppose you have a pool with a separate log:

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors

If you try to remove the log by removing the actual separate log device, you'll get the following error:

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
ramdiskadm: couldn't delete ramdisk "zil-drive": Device busy

If you now try to use the zpool remove command, you will also hit a wall:


arnaud@ioexception:/data/dsee7.0/instances$ zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
cannot remove /dev/ramdisk/zil-drive: pool must be upgrade to support log removal

So let's just follow up on the suggestion and upgrade the pool: 



arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool upgrade data
This system is currently running ZFS pool version 22.

Successfully upgraded 'data' from version 14 to version 22

arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME                      STATE     READ WRITE CKSUM
    data                      ONLINE       0     0     0
      c8d0                    ONLINE       0     0     0
    logs
      /dev/ramdisk/zil-drive  ONLINE       0     0     0

errors: No known data errors

Wow, that's quick, easy and smooth. Gotta love migrations of that sort. Let's now try to remove our separate log:


arnaud@ioexception:/data/dsee7.0/instances$ pfexec zpool remove data log /dev/ramdisk/zil-drive
cannot remove log: no such device in pool
arnaud@ioexception:/data/dsee7.0/instances$ zpool status data
  pool: data
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    data        ONLINE       0     0     0
      c8d0      ONLINE       0     0     0

errors: No known data errors

So even though it barfed an error message, my separate log has been removed from my pool and I will be able to decommission my device.

arnaud@ioexception:/data/dsee7.0/instances$ pfexec ramdiskadm -d zil-drive
arnaud@ioexception:/data/dsee7.0/instances$

And I can now create a new device and do some more testing. Props to the ZFS team for the ever improving level of service this file system brings to the table!

Tuesday Dec 08, 2009

Using Wireshark To Dissect LDAP Traffic

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

Every once in a while, an elusive issue may drive you bonkers because you can't quite identify what is going awry, which of course is crucial to solving said issue. In such cases, it's good to go back to basics and actually analyze what is flowing on the wire. To capture it, snoop is the tool. But more important than capturing the data, which is fairly well documented on the web, properly analyzing what the capture contains is what may put you on track to solving the issue.

Bird's Eye View

In this entry, I'll show how Wireshark can be used to help investigate LDAP issues, or even simply to double check that your LDAP product is truthful in its performance metrics... A word of advice: never trust a single metric at face value, check for yourself. The LDAP server (or web or app server for that matter) may report a response time of X, but that is no good to your client applications if the network card or the TCP layer has trouble getting the message across the wire. Do use snoop to collect data and Wireshark to check that everything is fine, or to dig into the snoop files.
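For the capture itself, something along these lines does the trick; the interface name and the port are assumptions, so adjust them to your environment:

# capture LDAP traffic on interface e1000g0 into a file wireshark can read
pfexec snoop -d e1000g0 -o ldap-traffic.snoop port 389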

The Meat

First things first, installing Wireshark.

On OpenSolaris:

1- get pkgutil

2- install wireshark

Here's how:

pfexec pkgadd -d http://blastwave.network.com/csw/pkgutil_`uname -p`.pkg
yes|pfexec /opt/csw/bin/pkgutil --install CSWwireshark

3- start wireshark; it is installed in /opt/csw/bin/wireshark, so for example do:

nohup /opt/csw/bin/wireshark 2>/dev/null &

4- Now that wireshark is started, you can open up your snoop file, and what you get is basic Ethernet+IP+TCP decoding, like so:

5- So we quite simply have to tell wireshark to decode the TCP packets' content as LDAP (wherever applicable). Go to Analyze -> Decode As. A window pops up to let you select which decoder to use. Select the "Transport" tab, then click on LDAP in the list. Parsing through the whole snoop file again may take a while, but once it's done, it will be worth the wait.

6- Once the data is properly interpreted as LDAP, those TCP packets with an LDAP payload are now advertised as LDAP (highlighted in green) right in the wireshark interface:
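If you prefer the command line, tshark (wireshark's terminal sibling) can apply the same decode-as logic straight to the snoop file. A sketch, assuming LDAP runs on the standard port 389 and the capture file from earlier; options may vary with your tshark version:

# decode TCP port 389 as LDAP and only display the LDAP packets
tshark -r ldap-traffic.snoop -d tcp.port==389,ldap -R ldap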

Now that you have followed these few simple steps, you can dig into the snoop data and graph statistics.

For example, you can very easily graph the ratio of all LDAP operations to LDAP operations taking more than one second with the statistics graphing tool:

Of course, there are dozens of other invaluable tools in wireshark, not only of the greatest quality but also immensely useful for spotting problems in your traffic that many higher level tools won't be able to help with. What comes to mind is stuff like:

  • issues at lower levels, like IP problems (e.g. CRC errors) or TCP retransmits
  • hitting the maximum bandwidth on the interface (use the Statistics -> IO Graphs tool and set the Y axis to Bytes/Tick)
  • LDAP-level SLAs not being met: you can check whether a particular LDAP response time was met or not across your snoop file, as shown above
  • checking for particular LDAP return codes
  • checking for particular LDAP filters
  • filtering your results on anything contained in an LDAP packet

As I said earlier, this tool is truly fantastic.

Hope this ends up helping someone. Enjoy! 

Monday Dec 07, 2009

DSIO: A Storage Sizing Tool For LDAP

<script type="text/javascript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/javascript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

Ever had this seemingly insurmountable wall of incomprehension between two teams, such that it was pretty obvious communication would never quite work? Last year I had an interaction with a customer where one team manages all the storage infrastructure for the company and another manages the LDAP infrastructure. The LDAP team had to "purchase" storage from the storage team, with pricing based on guaranteed IOPS. The LDAP folks provide authentication to the business; they are given performance requirements to meet and need to architect or expand to meet demand. The gap is that the LDAP team had no real tool to translate their LDAP performance numbers into IOPS in order to purchase the right storage package.

This morning, I was on a call with a prospect who wants to run everything on VMware ESX and swears their NAS can serve the load. Our LDAP experts told them that there was just no way to meet their very stringent performance requirements. The customer shot back the usual "show me the IOPS". So I thought I'd dust off the old DSIO DTrace script and share it here, since it could be reused in other situations that I don't know about.

Bird's Eye View

The DTrace script hooks up to your slapd process and intercepts calls to the processing functions in order to count LDAP operations. It also intercepts the lower level I/O system calls so it can associate each LDAP operation with its subsequent I/O calls. Simple, isn't it?
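To give an idea of the lower-level half of that, here is a minimal sketch (not the dsio script itself) that simply counts the read and write system calls issued by a given slapd process over ten seconds:

# count read/write syscalls made by the slapd process attached with -p
pfexec dtrace -p <slapd-pid> -n '
  syscall::read:entry, syscall::write:entry
  /pid == $target/ { @calls[probefunc] = count(); }
  tick-10s { exit(0); }'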

The Meat

Pre-requisites

  • DTrace (obviously)
  • KSH (at least 88 but that's not much to ask) 
  • the dsio script

0 to 60 in no time

First, please note that you need to have administrative rights to run DTrace.

Second, by default dsio will look for the first slapd process it finds running, so only use this default behavior when you know for a fact that you have a single instance of Directory Server running.

# ./dsio

On Solaris:

$ pfexec ./dsio

The Rest Of The Way

-P : specify the PID of the running directory server to trace (in case you have more than one running)
-l : print detailed information with respect to the LDAP traffic
-r : print detailed information with respect to the read I/O activity
-w : print detailed information with respect to the write I/O activity

As you can see, in this case I only apply a modify load to the instance, so as to make the point more explicit. Printing out the details is very useful to compare the actual operation counts, the breakdown, the times, etc...

Note that in this case the IOPS are about half the LDAP throughput. How is that possible? By exploiting one of ZFS's best features, the ZIL. I initially thought that my script was wrong and went to double check things with the ZFS folks. But that story is for another article.

Special thanks to Brendan Gregg for his invaluable scripts and tutorials on DTrace which helped me tremendously. The content this dude puts out turns DTrace black art into nothing more than shell scripting. Simply mind boggling.

Enhancements

  • Allow looping indefinitely
  • Implement a CSV output

Support

Best effort again as I'm not dedicated to this, you can send your questions and support requests to arnaud -- at -- sun -- dot -- com, or post your rants as comments here and I'll do my best to take care of you.

Enjoy!

Wednesday Dec 02, 2009

A Script To Rule Them All ... clients and backends that is

<script type="text/freezescript"> var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); </script> <script type="text/freezescript"> try { var pageTracker = _gat._getTracker("UA-12162483-1"); pageTracker._trackPageview(); } catch(err) {}</script>

Rationale

With Directory Proxy Server, regardless of the version, investigating traffic can get:

  • a) real tricky
  • b) time consuming
  • c) confusing
  • d) all of the above

and the answer is ... drum roll ... d)!

So here is a script that you can feed your DPS access log to. It will output CSV files that you can then load into your favorite spreadsheet software or just graph with tools like gnuplot and the like... it will just make your life easy...er.

Bird's Eye View

Disclaimer: it's nowhere near as clever as logconv.pl for Directory Server; it only munches the data so that YOU can more easily spot issues or identify patterns. So what does this script produce in the end?

It takes your DPS 6.x/7.0 access log and outputs three CSV files: one with the transaction volumes (suffixed "tx"), one with the average response times (suffixed "avg") and one with the maximum response time over a minute (suffixed "max"). Why not all in one file? I did that initially, but in a single CSV it turned out to really not be practical. At least this way, when you open up one of these files you know what you're looking at.

The Meat

Pre-requisites

Since I really started this just to be able to "grep" a file on a Windows system, I had no plan and no idea it would end up as a tool like this. All that to say that I wrote it in Python instead of our customary Java tools. At least it has the merit of existing, so you don't have to start from scratch. So you'll need Python, at least 2.4. If you're on Solaris or Linux, you're covered. If you're on Windows, simply download your favorite Python; I installed the 2.6.4 Windows version from here.

You will also need to download the script. You may as well get the netbeans project if you'd like to change it to adapt it to your specific needs or help improve on it.

How Does It Work

0 To 60 In No Time

python dps-log-cruncher.py access 

The Rest Of The Way

-c      : break up statistics per client
-s      : break up statistics per back-end server
-f hh:mm: start parsing at a given point in time
-t hh:mm: stop parsing after a given point in time
-h      : print this help message
-v      : print tool version

Some examples:

split the output per client for all clients:

python dps-log-cruncher.py -c \* access 

 split the output per back-end server for client 192.168.0.17:

python dps-log-cruncher.py -c 192.168.0.17 -s \* access 

 split the output for all clients, all servers:

python dps-log-cruncher.py -c \* -s \* access 

 only output results from 6pm (18:00) to 6:10pm (18:10):

python dps-log-cruncher.py -f 18:00 -t 18:10 access 

 output results between 6:00pm (18:00) to 6:10pm (18:10) and split results for all clients and back-end servers:

python dps-log-cruncher.py -f 18:00 -t 18:10 -c \* -s \* access 

Enhancements

This is a list to manage expectations as much as it is one for me to remember to implement:

  1. Selectable time granularity. Currently, all data is aggregated per minute. In some cases, it would be useful to be able to see what happens per second
  2. Improve error handling for parameters on the CLI.
  3. Add a built-in graphing capability to avoid having to resort to using a spreadsheet. Spreadsheets do however give a lot of flexibility
  4. Add the ability to filter / split results per bind DN
  5. Output the response time distribution

Support

Best effort is how I will label it for now, you can send your questions and support requests to arnaud -- at -- sun -- dot -- com.

Enjoy!

About

Directory Services Tutorials, Utilities, Tips and Tricks
