Monday Dec 21, 2009

Twice as Fast: Import Tips And Tricks

Rationale

We understand. Really, we do.

You want to import your data as fast as possible. Not only does nobody like to wait, but there are time constraints on every one of us. In particular, we're constrained by maintenance windows.

So, what gives you the best return on investment? Higher CPU clock speed? More CPUs? More memory? We're here to discover just that together.

Irrespective of the literature on the topic, you must make sure that the import will not have to be done in multiple passes, or you'll get killed on index merging time. To do so, shoot high and dial down gradually to find the right cache size for your data set.

Bird's Eye View

In essence, what I did was start an import on a vanilla system and Directory Server with a customer's actual data, 7.9M entries. The system I am testing with isn't a server, but that doesn't matter: it is a constant. The variables are the amount of import cache, the number of active CPUs (1-8) and the CPU clock speed (from 3.3 to 3.7GHz). In short, memory matters most.

The Meat

The Setup

The instance I am doing this with is an actual telco setup, with 7.9M entries in the LDIF file. The LDIF weighs in at 12GiB. There are 10 custom indexes configured for equality only. The 8 system indexes are there as well.

On the physical side of things, the machine is my desktop, an Intel Core i7 975EE @ 3.3GHz. It has a 64GB Intel X25 SSD and a 10,000 rpm SATA drive. The disk layout is described in more detail here.

Sensitivity To Import Cache

Despite what the documentation says, there are huge gains to be reaped from increasing the import cache size, and depending on your data set, this can make a world of difference.

This is the first thing I tweaked during the first phase of this test, and bumping the import cache from 2GiB to 4GiB cut the import time in half. Basically, if your import has to occur in more than a single pass, then your import cache isn't big enough; try to increase it if your system can take it.

Sensitivity To Clock Speed

Ever wondered if a system with CPUs twice as fast would buy you time on import? Not really. Why? Well, if the CPUs are waiting on the disks or on locks, then a higher clock speed isn't going to move the needle at all. That's what's going on here. Check this out...

The reason the 3.7GHz import isn't as fast as the 3.3GHz one is that my overclocking might have thrown off the balance between the core clock and the bus clock, so the CPU is spinning its wheels, waiting to access memory and IO...

I officially declare this one moot. I'll try again later with underclocking.

Sensitivity To Number Of CPUs

Scalability is an interesting challenge. Ideally, you'd want half the import time given twice the resources. In reality, import is very lock intensive to avoid collisions and corruption, so it isn't quite that linear. Here's what I got on my system, all other things being equal.

So even though the scalability isn't linear, the good news is that the more CPUs you have, the better your performance is going to be.


Tuesday Dec 15, 2009

DSEE 7.0 On Windows Gotcha


Rationale

When you just want it to work, quick and easy, you only skim through the install guide; the zip distribution lends itself particularly well to that, with only an archive extraction standing between you and a productive Directory Server. Well, at least if you're as impatient as I am. Make sure to read this.

Bird's Eye View

After the quick and painless install, you add the single bin folder to your PATH environment variable and get crackin'. Directory Server instances can be created. So can Directory Proxy Server instances. But the usual command-line tools from the DSRK, including the indispensable ldapsearch, won't run. What? You missed one tiny component that still needs to be "installed" on your machine.

The Meat 

If, when you try ldapsearch from DSEE 7.0 on your vanilla Windows install, you get the message below:

 I'll make a quick text note here so it's searchable...

The application was unable to start correctly (0xc0150002)

Then it only means that you forgot to install the Visual C++ Redistributable Package, as stated in a tiny Note block in the install guide which had completely escaped me during my quick skimming. Note that the doc points you to downloading the bits from Microsoft, but since we're nice guys, the installer is included in the DSEE ZIP archive, at the root of the zip file, and is called vcredist_x86.exe. Be sure to install it if you need the ldap* utilities.

Happy LDAP'ing!

Monday Dec 14, 2009

Import: Intel's HyperThreading Performance Gain


Rationale

As I am experimenting with more and more of the new-ish Intel Nehalem-based servers, a question kept bugging me: how much am I getting out of these "hyper" threads that appear as CPUs in the operating system? I took some time to compare a Directory Server import on my Core i7 workstation with an actual customer's data set of 7.9 million entries.

Bird's Eye View

Don't bet the house on HyperThreading.

The Meat

To conduct this test in a simple fashion, all I did was import these entries, first with all the CPUs enabled and then with all the odd-numbered CPUs disabled. It doesn't get any simpler. All other things can be considered equal.

The result is quite stunning really.



Configuration                     Import Time
8 CPUs (with HyperThreading)      30:56.62
4 CPUs (without HyperThreading)   35:04.60

Which makes a difference of about 13%...
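
For the record, here is the arithmetic behind that figure (a trivial Python sketch, using the times from the table above):

with_ht = 30 * 60 + 56.62       # 8 CPUs (with HyperThreading): 1856.62 seconds
without_ht = 35 * 60 + 4.60     # 4 CPUs (without HyperThreading): 2104.60 seconds
print("%.1f%% longer without HyperThreading" % (100 * (without_ht - with_ht) / with_ht))   # ~13.4%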

Just thought someone out there would find this as puzzling as I do. Quite honestly, I believe this is shared between the Directory Server not scaling too well, the I/O actually being a contention point, and HyperThreading being somewhat ineffective. More on that later.

Tuesday Dec 08, 2009

Using Wireshark To Dissect LDAP Traffic


Rationale

Every once in a while, an elusive issue may drive you bonkers because you can't quite identify what is going awry, which of course is crucial to solving said issue. In such cases, it's good to go back to basics and actually analyze what is flowing on the wire. To do so, snoop is the tool. But more important than capturing the data, which is fairly well documented on the web, is properly analyzing what the capture contains; that is what may put you on track to solving the issue.

Bird's Eye View

In this entry, I'll introduce how wireshark can be used to help investigate LDAP issues, or even simply to double check that your LDAP product is honest in its performance metrics... A word of advice: never trust a single metric at face value, check for yourself. The LDAP server (or web or app server, for that matter) may report a response time of X, but that really is no good to your client applications if the network card or the TCP layer has trouble getting the message across the wire. Do use snoop to collect the data and wireshark to check that everything is fine, or to dig into the snoop files.

The Meat

First things first, installing Wireshark.

On OpenSolaris:

1- get pkgutil

2- install wireshark

Here's how:

pfexec pkgadd -d http://blastwave.network.com/csw/pkgutil_`uname -p`.pkg
yes|pfexec /opt/csw/bin/pkgutil --install CSWwireshark

3- start wireshark; it is installed as /opt/csw/bin/wireshark, so for example do:

nohup /opt/csw/bin/wireshark 2>/dev/null &

4- Now that wireshark is started up, you can open up your snoop file, and what you get is basic Ethernet+IP+TCP decoding, like so:

5- So we will quite simply have to tell wireshark to decode the TCP packet contents as LDAP (wherever applicable). Go to Analyze->Decode As. A window pops up to let you select which decoder to use. Select the "Transport" tab, and then click on LDAP in the list. Parsing through the whole snoop file again may take a while, but once it's done, it will be worth the wait.

6- Once the data is properly interpreted as LDAP, those TCP packets with an LDAP payload will now be advertised as LDAP (highlighted in green) right in the wireshark interface:

Now that you have followed these few simple steps, you can dig in the snoop data and graph statistics.

For example, you can very easily graph the ratio of LDAP operations to LDAP operations taking more than one second with the statistics graphing tool:

Of course, there are dozens of other invaluable tools in wireshark, of the greatest quality and immensely useful for spotting problems in your traffic that many higher-level tools won't be able to help with. What comes to mind is stuff like:

  • issues at levels lower than LDAP, such as IP problems (e.g. CRC errors) or TCP retransmits
  • hitting the maximum bandwidth on the interface (use the Statistics->IO Graphs tool and set the Y axis to Bytes/Tick)
  • LDAP-level SLAs not being met: you can check whether a particular LDAP response time was met or not over your snoop file as shown above (a log-based alternative is sketched right after this list)
  • Check for particular LDAP return codes
  • Check for particular LDAP filters
  • Filter your results on anything contained in an LDAP packet
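
On that last SLA point, the same over-one-second ratio can also be pulled straight out of the Directory Server access log instead of the capture. Here is a minimal Python sketch, assuming OpenDS-style RESULT lines carrying an etime= value in milliseconds (DSEE logs etime in seconds, so adjust the threshold) and a placeholder log path:

import re

ETIME = re.compile(r"\betime=(\d+)")
THRESHOLD_MS = 1000                              # "more than one second"

total = slow = 0
with open("/path/to/access") as access_log:      # placeholder path
    for line in access_log:
        if "RESULT" not in line:
            continue
        m = ETIME.search(line)
        if m:
            total += 1
            if int(m.group(1)) > THRESHOLD_MS:
                slow += 1

if total:
    print("%d of %d operations (%.2f%%) took more than one second" % (slow, total, 100.0 * slow / total))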

 As I said earlier, this tool is truly fantastic.

Hope this ends up helping someone. Enjoy! 

Monday Dec 07, 2009

DSIO: A Storage Sizing Tool For LDAP


Rationale

Ever had one of those seemingly insurmountable walls of incomprehension between two teams, where it was pretty obvious communication would never quite work? Last year I had an interaction with a customer where one team manages all the storage infrastructure for the company and another manages the LDAP infrastructure. The LDAP team had to "purchase" storage from the storage team, with pricing based on guaranteed IOPS. The LDAP folks provide authentication to the business: they are given performance requirements to meet and need to architect or expand to meet demand. The gap was that the LDAP team had no real tool to translate their LDAP performance numbers into IOPS in order to purchase the right storage package.

This morning, I was on a call with a prospect who wants to run everything on VMware ESX and swears their NAS can serve the load. Our LDAP experts told them that there just was no way to meet their very stringent performance requirements. The customer shot back with the usual "show me the IOPS". So I thought I'd dust off the old DSIO DTrace script and share it here, since it could be reused in other situations that I do not know about.

Bird's Eye View

The DTrace script hooks up to your slapd process and intercepts calls to the processing functions in order to count the number of LDAP operations. It also intercepts lower-level I/O system calls so it can associate each LDAP operation with its subsequent I/O calls. Simple, isn't it?

The Meat

Pre-requisites

  • DTrace (obviously)
  • KSH (at least 88 but that's not much to ask) 
  • the dsio script

0 to 60 in no time

First, please note that you need to have administrative rights to run DTrace.

Second, by default dsio will look for the first slapd process it finds running, so only use this default behavior when you know for a fact that you have a single instance of Directory Server running.

# ./dsio

On Solaris:

$ pfexec ./dsio

The Rest Of The Way

  • -P: specify the PID of the running directory to trace (in case you'd have more than one running)
  • -l: print detailed information with respect to the LDAP traffic
  • -r: print detailed information with respect to the read I/O activity
  • -w: print detailed information with respect to the write I/O activity

As you can see, in this case I only applied a modify load to the instance so as to make the point more explicit. Printing out the details is very useful to compare the actual operation counts and the breakdown, the times, etc.

Note that in this case the IOPS are about half the LDAP throughput. How is that possible? By exploiting one of ZFS's best features, the ZIL. I initially thought that my script was wrong and went to double check things with the ZFS folks. But that story is for another article.

Special thanks to Brendan Gregg for his invaluable scripts and tutorials on DTrace, which helped me tremendously. The content this dude puts out turns the DTrace black art into nothing more than shell scripting. Simply mind boggling.

Enhancements

  • Allow looping indefinitely
  • Implement a CSV output

Support

Best effort again, as I'm not dedicated to this: you can send your questions and support requests to arnaud -- at -- sun -- dot -- com, or post your rants as comments here and I'll do my best to take care of you.

Enjoy!

Saturday Oct 03, 2009

A Dashboard Like No Other: The OpenDS Weather Station


Rationale

Doing so many benchmarks, profiling runs and other performance-related activities, I had to find a way to keep an eye on things while fetching email, chatting on IM and the like. Having some experience from past projects with microcontrollers, albeit on Windows, I figured I could put together a little gizmo to help me keep tabs on my Directory Server.

Bird's Eye View

This is basically a simple setup with a USB Bit Whacker controlled by a Python script, which feeds it data crunched from various sources, mainly the Directory Server access log, the garbage collection log and kstats... The result is a useful dashboard where I can see things happen at a glance.

The Meat

Everything starts with the USB Bit Whacker. It's a long story but, to cut it short, a couple of years ago Kohsuke Kawaguchi put together an orb that could be used to monitor the status of a build / unit tests in Hudson. Such devices are also known as eXtreme Feedback Devices, or XFDs. Kohsuke chose to go with the USB Bit Whacker (UBW) for it is a USB 'aware' microcontroller that also draws power from the bus, and is therefore very versatile while remaining affordable ($25 soldered and tested from SparkFun, but you can easily assemble your own). A quick search will tell you that this is a widely popular platform for hobbyists.

On the software side, going all Java would have been quite easy except for the part where you need platform-specific libraries for the serial communication. Sun's javacomm library and rxtx both have pros and cons, but in my case the cons were just too much of a hindrance. What's more, I am not one to inflict pain on myself unless it is absolutely necessary. For that reason, I chose to go with Python. While apparently not as good as Java on cross-platformedness, installing the Python libraries for serial communication with the UBW is trivial and has worked for me right off the bat on every platform I have tried, namely Mac OS, Linux and Solaris. For example, on OpenSolaris all there is to it is:

 $ pfexec easy_install-2.4 pySerial
Searching for pySerial
Reading http://pypi.python.org/simple/pySerial/
Reading http://pyserial.sourceforge.net/
Best match: pyserial 2.4
Downloading http://pypi.python.org/packages/source/p/pyserial/pyserial-2.4.tar.gz#md5=eec19df59fd75ba5a136992897f8e468
Processing pyserial-2.4.tar.gz
Running pyserial-2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Y8iJv9/pyserial-2.4/egg-dist-tmp-WYKpjg
setuptools
zip_safe flag not set; analyzing archive contents...
Adding pyserial 2.4 to easy-install.pth file

Installed /usr/lib/python2.4/site-packages/pyserial-2.4-py2.4.egg
Processing dependencies for pySerial
Finished processing dependencies for pySerial

That's it! Of course, having easy_install is a prerequisite. If you don't have it, simply install setuptools for your Python distro, which is a 400kB thing to install. You'll be glad you have it anyway.

Then, communicating with the UBW is mind-bogglingly easy. But let's not get ahead of ourselves, first things first:

Plugging The USB Bit Whacker Into OpenSolaris For The First Time

The controller will appear as a modem of the old days, and communicating with it equates to sending AT commands. For those of you who are used to accessing load balancers or other network equipment through the serial port, this is no big deal.

In the screenshot below, the first ls command output shows that nothing in /dev/term is an actual link; however, the second -issued after plugging the UBW into the USB port- shows that a new '0' link has been created by the operating system.


Remember which link your UBW appeared as for our next step: talking to the board.

Your First Python Script To Talk To The UBW

I will show below how to send the UBW the 'V' command which instructs it to return the firmware version, and we'll see how to grab the return value and display it. Once you have that down, the sky is the limit. Here is how:

from serial import *
ubw = Serial("/dev/term/0")
ubw.open()
print "Requesting UBW Firmware Version"
ubw.write("V\n")
print "Result=[" + ubw.readline().strip() + "]\n"
ubw.close()

Below is the output for my board:

Voila!

That really is all there is to it; you are now one step away from your dream device, and it really is only a matter of imagination. Check out the documentation of the current firmware to see what commands the board supports and you will realize all the neat things you can use it for: driving LEDs, servos, LCD displays, acquiring data, ...
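
As a tiny illustration of the "driving LEDs" part, here is a sketch that toggles a single output pin with pySerial, exactly like the version query above. Fair warning: the "PD" (pin direction) and "PO" (pin output) commands are from my recollection of the UBW firmware-D command set, so double-check them against the firmware documentation for your board before wiring anything up.

import time
from serial import Serial

ubw = Serial("/dev/term/0")          # same device link as in the version example
ubw.write(b"PD,A,0,0\n")             # assumed command: make port A, pin 0 an output
for _ in range(10):                  # blink whatever is wired to that pin
    ubw.write(b"PO,A,0,1\n")         # assumed command: drive the pin high
    time.sleep(0.5)
    ubw.write(b"PO,A,0,0\n")         # assumed command: drive the pin low
    time.sleep(0.5)
ubw.close()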

Concrete Example: The OpenDS Weather Station

As I said at the beginning of this post, my initial goal was to craft a monitoring device for OpenDS. Now you have a good idea of how I dealt with the hardware part, but an image is worth a thousand words so here is a snap...

On the software front, well, being a software engineer by trade, that was the easy part, so it's almost no fun and I won't go into as much detail, but here is a 10,000ft view:

  • data is collected in a matrix of hash tables.
  • each hash table represents a population of data points for a sampling period
  • a separate timer thread pushes a fresh list of hash tables into the matrix so as to reset the counters for each new sampling period

So, for example, if we want to track CPU utilization, we only need to keep one metric; the hash table will only have one key/value pair. Easy. Slightly overkill, but easy. Now if you want to keep track of transaction response times, the hash table keeps the response time (in ms) as the key and the number of transactions processed in that particular response time as the associated value. Therefore, if within one sampling period you have 10,000 operations processed, with 6,000 in 0ms, 3,999 in 1ms and 1 in 15ms, your hash table will only have 3 entries, as follows: [ 0 => 6000; 1 => 3999; 15 => 1 ]

This allows for a dramatic compression of the data compared to logging a line with the etime of each operation, which would result in 10,000 lines of about 100 bytes each.

What's more, this representation of the same information makes it easy to compute the average, extract the maximum value and calculate the standard deviation.
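
To make that concrete, here is a minimal sketch of those computations over the example population above (plain Python, not the actual weather-station code):

from math import sqrt

# response time (ms) -> number of operations observed during the sampling period
sample = {0: 6000, 1: 3999, 15: 1}

count = sum(sample.values())                                   # 10,000 operations
avg = float(sum(ms * n for ms, n in sample.items())) / count   # ~0.40 ms
peak = max(sample)                                             # 15 ms
var = sum(n * (ms - avg) ** 2 for ms, n in sample.items()) / count
print("count=%d avg=%.2fms max=%dms stddev=%.2fms" % (count, avg, peak, sqrt(var)))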

All that said, the weather station is only sent the last of the samples, so it always shows the current state of the server. And as it turns out, it is very useful; I like it very much just the way it worked out.

 Well, I'm glad to close down the shop, it's 7:30pm .... another busy Saturday

Tuesday Sep 29, 2009

OpenDS in the cloud on Amazon EC2


Rationale

Why not run your authentication service in the cloud? This is the first step to having a proper cloud IT. There are numerous efforts under way to ease deploying your infrastructure in the cloud, from Sun and others, from OpenSSO to GlassFish, from SugarCRM to Domino, and the list goes on. Here is my humble contribution for OpenDS.

Bird's Eye View

Tonight I created my EC2 account and got OpenDS going on the Amazon infrastructure in about half an hour. I will retrace my steps here and point out some of the gotchas.

The Meat

Obviously, some steps must be taken prior to installing software.

First, you need an AWS (Amazon Web Services) account with access to EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). I will say this about EC2: it is so jovially intoxicating that I would not be surprised to be surprised by my first bill when it comes... but that's good, right? At least for Amazon it is, yes.

Then you need to create a key pair, which is trivial as well. Everything is explained in the email you receive upon subscription.

Once that's done, you can cut to the chase and log on to the AWS management console right away to get used to the concepts and terms used in Amazon's infrastructure. The two main things are the instance and the volume. The names are rather self-explanatory: an instance is a running image of an operating system of your choice. The caveat is that if you shut it down, the next time you start this image you will be back to the vanilla image. Think of it as a LiveCD: you can't write persistent data to it, and whatever you do write won't survive a power cycle.

To persist data between cycles, we'll have to rely on volumes for now. Volumes are just what they seem to be, only virtual. You can create and delete volumes at will, of whatever size you wish. Once a volume is created and becomes available, you need to attach it to your running instance in order to be able to mount it in the host operating system. CAUTION: look carefully at the "availability zone" where your instance is running; the volume must be created in the same zone or you won't be able to attach it.
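
For the console-averse, the same create-and-attach flow can be scripted. Here is a sketch with the classic boto EC2 API for Python (not what I used for this post -the AWS console does the job- and the zone, instance id and device name below are placeholders):

import boto

conn = boto.connect_ec2()                               # credentials come from the environment / boto config
# The volume MUST be created in the same availability zone as the instance.
vol = conn.create_volume(10, "us-east-1a")              # 10 GiB, placeholder zone
conn.attach_volume(vol.id, "i-00000000", "/dev/sdf")    # placeholder instance id and device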

 Here's a quick overview of the AWS management console with two instances of OpenSolaris 2009.06 running. The reason I have two instances here is that one runs OpenDS 2.0.0 and the other runs DSEE 6.3 :) -the fun never ends-. I'll use it later on to load OpenDS.

My main point of interest was to see OpenDS perform under this wildly virtualized environment. As I described in my previous article on OpenDS on Acer Aspire One, virtualization brings an interesting trend in the market that is rather orthogonal to the traditional perception of the evolution of performance through mere hardware improvements...

In one corner, the heavyweight telco/financial/pharmaceutical company, weighing in at many millions of dollars for a large server farm dedicated to high-performance authentication/authorization services. Opposite these folks, curled up in the other corner, the ultra-small company looking at every way to minimize cost in order to simply run the house, while still being able to grow the supporting infrastructure as business ramps up.

Used to be quite the headache, that. I mean, it's pretty easy to throw indecent amounts of hardware at meeting crazy SLAs. Architecting a small, nimble deployment yet able to grow later? Not so much. If you've been in this business for some time, you know that every sizing iteration requires going back to capacity planning and benchmarking, which is too long and too costly most of the time. That's where the elastic approaches can help. The "cloud" (basically, hyped-up managed hosting) is one of them.

Our team also has its own, LDAP-specific, approach to elasticity, I will talk about that in another article, let's focus on our "cloud" for now. 

 Once your instance is running, follow these simple steps to mount your volume and we can start talking about why EC2 is a great idea that needs to be developed further for our performance savvy crowd.

In this first snapshot, I am running a stock OpenDS 2.0.0 server with 5,000 standard MakeLDIF entries. This is to keep it comparable to the database I used on the netbook. Same searchrate, sub scope, return the whole entry, across all 5,000.

If this doesn't ring a bell, check out the Acer article. Your basic EC2 instance has about as much juice as a netbook. Now the beauty of it all is that all it takes on my part to improve the performance of that same OpenDS server is to stop my "small" EC2 instance and start a medium one.

Voila!

I've got 2.5 times the initial performance. I did not change ONE thing on OpenDS; this took 3 minutes to do, I simply restarted the instance with more CPU. I already hear you cry out that it's a shame we can't do this live -it is virtualization after all- but I'm sure it'll come in due course. It is worth noting that even though OpenDS could use 80+% of the CPU on the small instance, in this case it was only using about 60%, so the benefit would likely be even greater, but I would need more client instances. This imperfect example still proves the point on ease of use and elasticity.

The other thing that you can see coming is an image of OpenDS for EC2. I'm thinking it should be rather easy to script 2 things:

1) self-discovery of an OpenDS topology and automatic hook up in the multi master mesh and

2) snapshot -> copy -> restore the db, with almost no catch-up to do data-wise. If you need more power, just spawn a number of new instances: no setup, no config, no tuning. How about that?

Although we could do more with additional features from the virtualization infrastructure, there are already a number of unexplored options with what is there today. So let's roll up our sleeves and have a serious look. Below is a snapshot of OpenDS modrate on the same medium instance as before, with about 25% CPU utilization. As I said before, this thing has had NO fine tuning whatsoever, so these figures are with the default, out-of-the-box settings.

  I would like to warmly thank Sam Falkner for his help and advice and most importantly for teasing me into trying EC2 with his FROSUG lightning talk! That stuff is awesome! Try it yourself.

Tracking Down All Outliers From Your LDAP Servers ...


Rationale

I was recently faced with the challenge to track down and eliminate outliers from a customer's traffic, and I had to come up with some sort of tool to help diagnose where these long-response-time transactions originated. Not really rocket science -hardly anything IS rocket science, even rocket science isn't all that complicated, but I digress- yet nothing that I had in the tool box would quite serve the purpose. So I sat down and wrote a tool that would allow me to visually correlate events in real time. At least that was the idea.

Bird's Eye View

This little tool is only meant for investigations; we are working on delivering something better and more polished (code name Gualicho, shhhhhhh) for production monitoring. The tool I am describing in this article simply correlates the server throughput, peak etime, I/O, CPU, network and garbage collection activity (for OpenDS). It is all presented as sliding line metrics, stacked on top of each other, making visual identification and correlation easy. Later on I will adapt the tool to work with DPS, since it is the other product I like to fine tune for my customers.

The Meat

When pointed at the access log and the GC log, here is the text output you get. One line is displayed per second, with aggregated information collected from the access log and the garbage collection log, as well as kstats for network, I/O and CPU.


If you look at it closely, I represent the garbage collection in %, which is somewhat unusual, but after debating how to make this metric available, I decided that all I was interested in was a relative measure of the time spent in stop-the-world GC operations versus the time the application itself is running. As I will show in the snapshot below, this is quite effective for spotting correlations with high etimes in most cases. To generate this output in the GC log, all you have to do is add the following to your set of JAVA_ARGS for start-ds.java-args in /path/to/OpenDS/config/java.properties:

 -Xloggc:/data/OpenDS/logs/gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
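
For the curious, here is a minimal sketch of how that percentage can be derived from such a log; it assumes the JDK 6-era HotSpot wording of the two line types, visible in the regular expressions below:

import re

APP = re.compile(r"Application time: ([\d.]+) seconds")
STOPPED = re.compile(r"Total time for which application threads were stopped: ([\d.]+) seconds")

running = stopped = 0.0
with open("/data/OpenDS/logs/gc.log") as gc_log:
    for line in gc_log:
        m = APP.search(line)
        if m:
            running += float(m.group(1))
            continue
        m = STOPPED.search(line)
        if m:
            stopped += float(m.group(1))

if running + stopped > 0:
    print("stopped in GC: %.2f%% of elapsed time" % (100.0 * stopped / (running + stopped)))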

And then my GUI will show something like:


Don't hesitate to zoom in on this snapshot. The image is barely legible due to blog formatting constraints.

Excuse me for not waiting 7+ days to take the snapshot for this article, but I think this simple snap serves the purpose. You can see that most of the time we spend 2% of the time blocked in GC, but sometimes we have spikes up to 8%, and when this happens, even though it has little impact on the overall throughput over one second, the peak etime suddenly jumps to 50ms. I will describe in another article what we can do to mitigate this issue; I simply wanted to share this simple tool here since I think it can serve some of our expert community.

OpenDS on Acer Aspire One: Smashing!


Rationale

As far-fetched as it may seem, with the growing use of virtualization and cloud computing, the average image instance that LDAP authentication systems have to run on looks more like your average netbook than a supercomputer. With that in mind, I set out to find a reasonable netbook to test OpenDS on. I ended up with an Acer Aspire One with 1GB of RAM. Pretty slim on memory. Let's see what we can get out of that thing!

Bird's Eye View

In this rapid test, I loaded OpenDS (2.1b1) with 5,000 entries (the stock MakeLDIF template delivered with it), hooked up the netbook to a closed GigE network and loaded it from a Core i7 machine with searchrate. Result: 1,300+ searches per second. Not bad for a machine that only draws around 15 Watts!

The Meat 

As usual, some more details about the test, but first a quick disclaimer: this is not a proper test or benchmark of the Atom as a platform, it is merely a kick of the tires. I have not measured any metric other than throughput, and only for a search workload at that. It is only to get a "feel" for it on such a lightweight sub-notebook.

In short:

  • Netbook: Acer Aspire One ZG5 - Atom N270 @1.6GHz, 1GB RAM, 100GB HDD
  • OS: OpenSolaris 2009.05
  • FS: ZFS
  • OpenDS: all stock, I did not even touch the JAVA options which I usually do
  • JAVA: 1.6 Update 13

The little guy in action, perfbar shows the CPU is all the way up there with little headroom...


Monday Aug 24, 2009

Learn From Errors: Is Your Cloud About To Burst?


Rationale

A number of customers I talk to have a hugely diverse ecosystem of applications relying on the LDAP infrastructure for authentication, single sign-on and also user-specific configuration storage. Very few have a strictly controlled environment with a reduced set of well-known clients.

One cause of trouble I have seen many times over stems from client applications not being robust and handling the protocol poorly. There is an easy way to grow confidence in your infrastructure and ecosystem at the same time: after setting up the prototype and before you go into production, during the QA stage, spend some time intentionally injecting errors into your traffic. You'll immediately see whether clients start blowing up left and right!

Bird's Eye View

To cut to the chase, this plug-in sits on DS as a pre-operation search plug-in. You can "create" any entry simply by adding a configuration parameter to the plug-in. For example, if you want DS to return "no such entry" (error 32) for cn=nosuch,dc=example,dc=com, as shown below:

all you would have to do (once the plug-in is properly set up) is:

dsconf set-plugin-prop arbitrary-response argument+:cn=nosuch,dc=example,dc=com#32#0

The Meat

I honestly have no idea why I have not shared this small tool earlier. I wrote this plug-in years ago for Directory Server 5.2 and later on recompiled it against DS 6.x on OpenSolaris. Currently it is built for Solaris 9/10/OpenSolaris x86/x64. If you want it on another platform, let me know and I'll spin it for you.

To install this plugin, simply unzip the file and then follow the instructions in the bundled README file. The sequence of commands will work for DS 6.x.

In its current version (1.1b) the plug-in can inject errors as well as delays into an arbitrary response. This means that you can easily test how connection idle timeouts are managed by your client applications' connection pooling mechanism, if any.

Injecting delay is done through the third field of the plug-in argument. For example, to return a valid response with error code 0 after 15 seconds, you would add the following argument to the plug-in:

dsconf set-plugin-prop arbitrary-response argument+:cn=ok,dc=example,dc=com#0#15
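
To verify from the client side that the delay really kicks in, a quick timing probe does the trick. Here is a sketch using the python-ldap module, assuming it is installed, the plug-in is configured with the cn=ok entry above, and the server listens on localhost:389:

import time
import ldap

conn = ldap.initialize("ldap://localhost:389")     # adjust host and port to your instance
start = time.time()
conn.search_s("cn=ok,dc=example,dc=com", ldap.SCOPE_BASE, "(objectClass=*)")
print("search returned after %.1f seconds" % (time.time() - start))   # expect roughly 15 seconds here
conn.unbind_s()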

Useful things not in this version

  1. I will probably add a 4th parameter which will represent the probability that the error is returned, otherwise, just pass on the request to DS core

  2. Ability to interpret regular expressions in the base DN part of the plug-in argument

That's it for today!

Thursday Jun 25, 2009

Speed Up Your Server For Free: Using ZFS To Your Advantage For Access Logs

Rationale

Access logs can cause performance-focused users some discomfort. The one main thing usually making logs a performance hog is the fact that entries must be ordered somehow; in our products, the ordering is chronological. Here is an easy way to alleviate the issue if you're on Solaris and have a spare drive.

Bird's Eye View

The ZFS Intent Log (or ZIL) can be configured on a separate device to help synchronous write performance.

You will find lots of literature on the matter out there, including Neil's and Brendan's blogs, for example.

The Meat

So you heard about all the great benefits you can get with SSDs but don't have one yet (go get one!), or don't have enough of them to dedicate one to your logs?

Worry not!

All you need to do is create a ramdisk that will be used as the ZIL when we create our access-log-dedicated ZFS pool. Here's how:

$ ramdiskadm -a zil-drive 512m
$ zpool create log-pool c8d1 log /dev/ramdisk/zil-drive

For DPS, all you need to do is:

$ dpconf set-access-log-prop log-file-name:/log-pool/access

It's just as simple for DS, do:

$ dsconf set-log-prop access path:/log-pool/access 

And OpenDS is no more complicated to configure, do:

$ dsconfig -n set-log-publisher-prop --publisher-name "File-Based Access Logger" --set log-file:/log-pool/access

OR use the interactive command, simply do: 

$ dsconfig

and follow

  • 20) log publisher
  • 3)  View and edit an existing Log Publisher
  • 1)  File-Based Access Logger
  • 3)  log-file              logs/access
  • 2)  Change the value
  • type /log-pool/access, hit return
  • type "f" to finish and apply
  • restart OpenDS bin/stop-ds;bin/start-ds
I know it looks like more work, but the nice thing about the interactive dsconfig is that it gives you context and you will get familiar with other aspects of the server.


Caveats

    In the rare event that a server configured as described here loses power, the ZIL -being on a ramdisk- will be lost. This does not, however, corrupt the data stored on disk, and upon restart all you would have to do is add the ZIL on a newly created ramdisk again. This can of course be automated at boot time so that you do not need to do it yourself after every power cycle.


What Was The Best Performance You Ever Had With OpenDS ?

Rationale

    After discussing the article I posted yesterday with someone, they asked me: "What was the best performance you ever had with OpenDS?" and though I couldn't really answer off the top of my head, I dug in my archives from the last benchmark and found what I think was my best run so far.

Bird's Eye View

    To put it bluntly, about 120,000 operations per second @ <2ms. This was done while I was tuning OpenDS for the 10 million entries benchmark on the Intel Nehalem-based Sun Blade x6270, so I had the whole setup: 10M entries, searches spanning the entire DB, and some of the Java tunings are bleeding edge, as I will detail in the next section.

The Meat

Environment

    As I said earlier, this is the same environment as described in my previous entry except for Java.

Java

    The JVM arguments are as follows: -d64 -server -Xbootclasspath/p:/path/to/SecretSauce.jar -XX:+UseNUMA -XX:LargePageSizeInBytes=2m -XX:+UseCompressedOops -XX:+AggressiveOpts -XX:+UseBiasedLocking -Xms6g -Xmx6g -Xmn4g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=85 -XX:MaxTenuringThreshold=1

    It's all pretty much business as usual but some of them call for explanation:

  • -Xbootclasspath/p:/path/to/SecretSauce.jar: One of our engineers, our lead for OpenDS core actually, has found a significant performance improvement in one of the JVM's core classes. This SecretSauce.jar contains his patched and improved version that overrides the JVM's own at run time. This makes a big difference in lowering GC pause times.
  • -XX:+UseNUMA: this is simply because the Sun Blade x6270 is a NUMA architecture and using this switch tells the JVM to be clever about memory and cache locality.
  • -XX:+UseCompressedOops: This lets us benefit from the larger heap of the 64-bit JVM -not quite as large as with plain 64-bit pointers, but bigger than that of the 32-bit JVM- while retaining 32-bit-like performance. The best of both worlds. Works beautifully. And it is being improved ...

Results 

Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
35428814 116160.046 580800.230 10313.696 -0.037
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
60861940 35428857 1.718 580800.934 0.119 0.023

Caveats

    So, now that I've told you all my secrets, you're wondering why I didn't use those settings for the benchmark? Because the benchmark is supposed to give me numbers on what could be achieved in a production environment, and in this case, using our patched JVM core class and a somewhat experimental, or at least relatively new, memory addressing mode of the JVM isn't what I would advise for a customer about to go live.

All these bleeding-edge settings only give us a 12% boost overall; I don't think it is worth the risk. But this shows that we are paving the way for ever-increasing OpenDS performance. Tomorrow, these settings will all be well proven and safe for production. Just hang in there.


Wednesday Jun 24, 2009

OpenDS Nehalem Benchmark - SunBlade 6270

Rationale

Long have we heard that the new Nehalem-based Sun x86 systems would bring a significant performance boost over the AMD Opterons still ruling the land to this day. The whole idea of this test was to see how that translates into throughput, response time and all the good things we (meaning the seriously loony LDAP geeks) like to look at, in the particular case of Directory Services, and even more specifically of OpenDS...

Bird's Eye View

On this single blade, OpenDS achieves over 93,000 search operations per second and over 17,000 modification operations per second. Under lighter -but still significant- load, with throughput always above 70,000 ops/sec, OpenDS delivers sub-millisecond response times.

Sounds too good to be true? Then read further...

To sum it up as Benoit did in his post, this would give you, in a fully populated 6000 chassis, the ability to process almost A MILLION REQUESTS PER SECOND in a well integrated, highly available and easily manageable package. It does NOT get any better from any vendor out there as of today.

Special thanks to Benoit Chaffanjon and his team for making this equipment available to us on short notice. Their support, reactivity and in-depth knowledge of all things benchmark is what makes them top-notch and an indispensable component of our success.

The Meat

Maybe you have already heard about Benoit's previous benchmark of DSEE (6.3.1) on Nehalem. If you haven't, read it; it'll give you all the background you need to read these results. I tried to stick as much as I could to his bench, and I think I did a pretty good job of that. The main intentional difference between our two benches is that in his, searches only span 1 million entries out of the 10 million entry database; in mine, searches span the whole 10 million entries. In practice, he's right to do his benchmarks the way he does, as it better reflects how most customers end up consuming data, but mine is more stressful on the system.

Setup

Hardware

Software

Tunings

Hardware

None

Software

Solaris
  • Cap the ZFS ARC size to ( SYSTEM MEMORY * 0.95 ) - OPENDS JVM HEAP SIZE
  • Disable ZFS cache flush since the storage takes care of that for us and has persistent cache (4GB of NVRAM)
  • Put ZFS ZIL on a dedicated SSD

Other things to consider doing:

    • use jumbo frames if returning whole entries; YMMV depending on your most frequent access patterns. I haven't tried it this time around for lack of time, but it should be interesting for reducing the network overhead. As we'll see later, OpenDS on this blade can max out a gigabit Ethernet connection.
Java

With very high volumes like we are seeing here, say above 80k ops/sec, you will likely want to bump the request handlers and worker threads up a notch to cope with the frenzy of the traffic. When you do so, the 32-bit JVM will quickly become too small no matter what tunings you try. Even though the 64-bit JVM is not as space efficient for cache and all other aspects of memory access, it will provide an extremely stable environment for OpenDS even under heavy client traffic. I have been able to attach 10,000 hyper-clients (as in clients continuously sending traffic with no pause between requests) to OpenDS without a problem.

To cut to the chase, the settings:

OpenDS

Worker Threads          32
Connection Handlers     16


As I have said previously, you may want to dial these values up or down depending on a couple of factors:

  • How many clients you have at peak
  • How quickly your client applications open their connections (bursts or ramped up?)
  • How frantic a client is on each connection in average

If you have 5,000 clients opening 100 connections all at once, you will likely want more connection handlers to be able to cope with the suddenness of the pattern. This will however come at a performance cost (that we have yet to appropriately profile) under more normal circumstances.

If you have few but frantic clients, these values will be about right; you may want to bump up the number of worker threads a bit. This too is suboptimal under normal circumstances.

Note: regardless of the access pattern, these settings will be adequate to serve whatever load you throw at the server; I'm only pointing out ways to improve performance a bit. In particular, this advice will help keep the request backlog on a leash.

Import

Importing our 10M entries took 14'59", which averages at 11,120 entries per second.
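
And the arithmetic behind that average, for the record:

elapsed = 14 * 60 + 59               # 14'59" is 899 seconds
print(int(10000000 / elapsed))       # 11123, i.e. roughly the 11,120 entries per second quoted above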

Search Performance

These tests mainly aim at determining the maximum throughput that can be achieved. As such, they tend to load the servers with an artificially high number of concurrent clients, inflating the response time compared to what can be expected under more normal production conditions... In the Lighter Load sections below, I will show what the response time looks like with lighter loads and lower overall throughput.

 Exact Search

 Return 1 Attribute
Heavy Load, Maximum Throughput 
Actual Duration
1839 seconds (30m 39s)
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
169056808 93660.281 468301.407 5590.951 -0.004
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
169056809 1.000 93660.282 468301.410 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
450590169 169056809 2.665 468301.410 0.189 -0.006
Lighter Load
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
28143684 92274.374 461371.869 3791.935 -0.040
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
28143684 1.000 92274.374 461371.869 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
30399915 28143685 1.080 461371.885 0.055 0.023

Return whole entry
Heavy Load, Maximum Throughput 
Actual Duration
1839 seconds (30m 39s)
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
151991059 84205.573 421027.864 5264.386 -0.006
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
151991061 1.000 84205.574 421027.870 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
360407639 151991065 2.371 421027.881 0.183 0.022

Lighter Load
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
21896817 71792.843 358964.213 4125.281 -0.020
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
21896817 1.000 71792.843 358964.213 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
15177289 21896817 0.693 358964.213 0.047 0.023

Sub Scope Search

Return 1 Attribute
Heavy load, Maximum Throughput 
Actual Duration
1838 seconds (30m 38s)
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
169252464 93768.678 468843.391 6339.082 -0.012
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
169252464 1.000 93768.678 468843.391 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
270122894 169252465 1.596 468843.393 0.140 0.022
Lighter Load
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
24902860 81648.721 408243.607 4020.767 -0.011
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
24902860 1.000 81648.721 408243.607 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
15166324 24902860 0.609 408243.607 0.039 0.023

Return Whole Entry
Heavy Load, Maximum Throughput 
Actual Duration
1839 seconds (30m 39s)
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
152888061 84702.527 423512.634 6003.399 -0.008
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
152888064 1.000 84702.529 423512.643 0.000 0.000
Search Time (ms)
Total Duration Total Count Avg Duration Avg Count/Interval Std Dev Corr Coeff
270188257 152888064 1.767 423512.643 0.154 0.013

Lighter Load
Searches Completed
Count Avg/Second Avg/Interval Std Dev Corr Coeff
22151207 72626.908 363134.541 3680.320 -0.007
Exceptions Caught
Count Avg/Second Avg/Interval Std Dev Corr Coeff
0 0.000 0.000 0.000 0.000
Entries Returned
Total Avg Value Avg/Second Avg/Interval Std Dev Corr Coeff
22151207 1.000 72626.908

Tuesday Jun 23, 2009

No Directory Manager - Protect Your LDAP Server

Rationale

    Many institutions, companies and organizations have security policies in place to keep security under control across a homogeneous environment. One of those guidelines mandates that no credentials be shared between any two employees. When that is the case, cn=Directory Manager stands out as what looks like a gaping hole in violation of such policies.

The other thing that bothers regulators about this user is that it is not subject to Access Controls. It can therefore, by design, bypass any carefully designed access restriction policy. While this can sometimes be useful for performance reasons, it is incompatible with a quest for absolute security.

There are institutions where this is not tolerable.
Here is a painless way to stay compliant.

Bird's Eye View

    The idea behind this tip is to disable "cn=Directory Manager", knowing that a number of things are imperfect about this user, the main one being that one could run a brute force attack against it. Knowing the user name, which is left at its default more often than not, only makes things worse. So the number one thing would be to change the user name to some other value. But that would still allow brute force attacks.

The other thing that can be done is to blank out the directory manager password which, combined with requiring a bind password, effectively renders "cn=Directory Manager" unusable.

The Meat

  1. create a random password and store it in a file protected on the host to be readable only by root.
        e.g. store pd80wu709@w87-3WQJX%mjx097hc&50 in /path/to/cryptic-directory-manager-password.
        Note: Do not use echo or cat or anything of that sort as this could be sniffed. Use an editor like vi, joe or whatever is most convenient.
  2. create a random user. The only constraint is that it should be a valid DN - see rfc 2253 - and even that rule can be bent a bit...
       e.g. store tr-d7=9gcxf7tu in /path/to/cryptic-directory-manager-dn
       Note: take the same precautions as in step 1
  3. Never use the same user and password between any 2 Directory Server instances
      e.g. dsadm create -D "`cat /path/to/cryptic-directory-manager-dn`" -w /path/to/cryptic-directory-manager-password -p xyz -P zyx </path/to>/instance
  4. delete the cryptic password file
  5. delete the cryptic dn file
  6. edit </path/to>/instance/config/dse.ldif and remove the value of the nsslapd-rootpw so that its contents are blank
    e.g.: nsslapd-rootpw:
  7. start the instance
    e.g. dsadm start </path/to>/instance


Your directory manager is effectively unusable and has little to no chance of having been compromised at any point of creating or starting the instance. [ If you really want absolute security, use a small program that will quietly output a randomly-generated password to a file with 600 rights. ]
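
That bracketed suggestion is a one-minute job in Python; here is a minimal sketch (the file name matches the steps above, the character set is arbitrary):

import os
import random
import string

def write_secret(path, length=30):
    """Write a random secret to 'path', readable only by its owner (mode 600)."""
    rng = random.SystemRandom()                     # draws from the OS entropy source
    alphabet = string.ascii_letters + string.digits + "@%&#-=!"
    secret = "".join(rng.choice(alphabet) for _ in range(length))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secret)

write_secret("/path/to/cryptic-directory-manager-password")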

Note that for an already created instance, you can simply do steps 6 and 7, which is nice and easy. The only addition in that case is to check that the require-bind-pwd-enabled property is on.
  e.g.
    $dsconf get-server-prop require-bind-pwd-enabled
    require-bind-pwd-enabled  :  on

Since at this point your directory manager is disabled, you will need to use an account like cn=admin,cn=Administrators,cn=config as your dsconf user. Simply export LDAP_ADMIN_USER=cn=admin,cn=Administrators,cn=config or use dsconf <command> -D cn=admin,cn=Administrators,cn=config ...

Caveats

    When following this procedure, you will end up with a server that only has "regular" users. This is mostly good but has a handful of shortcomings, such as not being able to repair ACIs... Since now all your users, including the administration accounts, are subject to ACI evaluation, you could end up in a state where all your administration accounts are locked out. Care must be taken to keep an administration account with well-calibrated Access Controls. There are also some additional troubleshooting operations that must (per the code) be done by the directory manager.
