Monday Dec 21, 2009

Twice as Fast: Import Tips And Tricks

Rationale

We understand. Really, we do.

You want to import your data as fast as possible. Not only does nobody like to wait, but there are time constraints on every one of us. In particular, we're constrained by maintenance windows.

So, what gives you the best return on investment? Higher CPU clock speed? More CPUs? More memory? We're here to find out together.

Irrespective of the literature on the topic, make sure the import does not have to run in multiple passes or you'll get killed on index merging time. To do so, shoot high and dial down gradually to find the right import cache size for your data set.

Bird's Eye View

In essence, I ran an import on a vanilla system and Directory Server with a customer's actual data, 7.9M entries. The system I am testing with isn't a server, but that doesn't matter: it is the constant. The variables are the amount of import cache, the number of active CPUs (1-8) and the CPU clock speed (from 3.3 to 3.7GHz). In short, memory matters most.

The Meat

The Setup

The instance I am doing this with is an actual telco setup, with 7.9M entries in the LDIF file. The LDIF weighs in at 12GiB. There are 10 custom indexes configured for equality only. The 8 system indexes are there as well.

On the physical side of things, the machine is my desktop, an Intel Core i7 975 Extreme Edition @ 3.3GHz. It has a 64GB Intel X25 SSD and a 10,000 rpm SATA drive. The disk layout is described in more detail here.

Sensitivity To Import Cache

Despite what the documentation says, there are huge gains to be reaped from increasing the import cache size; depending on your data set, this can make a world of difference.

This is the first thing I tweaked during the first phase of this test: bumping the import cache from 2GiB to 4GiB cut the import time in half. Basically, if your import has to occur in more than a single pass, your import cache isn't big enough; increase it if your system can take it.
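If you'd rather script that change than click around, here is a minimal sketch using python-ldap. The nsslapd-import-cachesize attribute and its location on the ldbm database configuration entry are carried over from earlier Directory Server releases and may differ in your DSEE version, so treat them as assumptions to verify against your docs (dsconf may expose an equivalent property).

# Hedged sketch: bump the import cache to 4GiB before an offline import.
# The DN and attribute name below are assumptions inherited from older
# Directory Server releases; verify them against your DSEE documentation.
import ldap

DB_CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"

conn = ldap.initialize("ldap://localhost:1389")
conn.simple_bind_s("cn=Directory Manager", "password")
conn.modify_s(DB_CONFIG_DN,
              [(ldap.MOD_REPLACE, "nsslapd-import-cachesize", str(4 * 1024 ** 3))])
conn.unbind_s()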

Sensitivity To Clock Speed

Ever wondered whether a system with CPUs twice as fast would buy you time on import? Not really. Why? Well, if the CPUs are waiting on disks or locks, a higher clock speed isn't going to move the needle at all. That's what's going on here. Check this out...

The reason the 3.7GHz import isn't faster than the 3.3GHz one is probably that my overclocking threw off the balance between the core clock and the bus clock, so the CPU is spinning its wheels, waiting on memory and I/O...

I officially declare this one moot. I'll try again later with underclocking.

Sensitivity To Number Of CPUs

Scalability is an interesting challenge. Ideally, you'd want half the import time given twice the resources. In reality, import is very lock intensive in order to avoid collisions and corruption, so it isn't quite that linear. Here's what I got on my system, all other things being equal.

So even though the scalability isn't linear, the good news is that the more CPUs you have, the better your import performance will be.


Saturday Dec 19, 2009

DPS Coherence Plug-In: New Features


Rationale

After having conversations about this plug-in both internally and with a handful of you out there, I sat down and added what I felt were the main missing features in my current implementation. This mainly covers security, but also some flexibility.

Bird's Eye View

On the security front, you now have the ability to filter out attributes so they don't make it into the cache (e.g. userPassword). What's more, you can filter out the entries you don't want in the cache, to avoid it being polluted by the occasional non-production (e.g. administration) hits.

On the flexibility front, you can set a time to live for entries that do make it into the cache. This lets you either retain a value in the cache forever (the default) or make sure it gets evicted after a certain time. You can also provide a list of regular expressions for the Bind DNs (the identities) you grant access to the cache. And of course, you can decide to allow (default) or deny unauthenticated clients access to the cache as well.

The Meat 

  • attributeNotToCache: userPassword
    This attribute in the plug-in's configuration entry can be multivalued. It is not a regular expression but a plain string and, as always in LDAP, case does not matter. Any attribute whose name matches one of the provided values is stripped from the entry before it is stored in Coherence.
  • dnNotToCache: .*,ou=secret,o=o
    This attribute can be multivalued and prevents entries whose DN matches one of the regular expressions from being stored in Coherence.
  • cacheForBindDN: cn=[2-3],o=o
    This attribute can be multivalued. It is a regular expression. An authenticated client's Bind DN must match one of the provided regular expressions to be granted access to the contents stored in Coherence.
  • cacheForAnonymous: false
    This attribute is single valued. It is a boolean, either true or false. When false, unauthenticated clients are not granted access to the contents stored in Coherence and therefore always hit the back-end.
  • cacheExpirationInMS: 30000
    This attribute is single valued. It is a long and represents the length of time, in milliseconds, that an entry is kept in the cache after the last time it was accessed.

So, in the end, here is an example configuration entry:

dn: cn=CoherencePlugin,cn=Plugins,cn=config
objectClass: top
objectClass: configEntry
objectClass: plugin
objectClass: extensibleObject
cn: CoherencePlugin
description: Oracle Coherence Cache Plugin
enabled: true
pluginClassName: com.sun.directory.proxy.extensions.CoherencePlugin
pluginType: preoperation search
pluginType: search result entry
pluginType: postoperation delete
pluginType: postoperation modify
pluginType: postoperation modify dn
cacheName: LDAPCache
attributeNotToCache: userpassword
attributeNotToCache: aci
dnNotToCache: .*,ou=secret,o=o
dnNotToCache: .*,cn=config
cacheForBindDN: cn=[2-3],o=o
cacheForBindDN: uid=user.[0-9]+,ou=People,o=o
cacheForAnonymous: false
cacheExpirationInMS: 30000
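To make the regular expression semantics concrete, here is a small Python sketch of how the example values above would match. It is an illustration only, not the plug-in's code, and the anchoring and case handling are my assumptions.

import re

# Values taken from the example configuration entry above.
dn_not_to_cache = [r".*,ou=secret,o=o", r".*,cn=config"]
cache_for_bind_dn = [r"cn=[2-3],o=o", r"uid=user.[0-9]+,ou=People,o=o"]

def may_cache_entry(entry_dn):
    # An entry is cacheable only if its DN matches none of the exclusions.
    return not any(re.match(p + "$", entry_dn, re.I) for p in dn_not_to_cache)

def may_read_cache(bind_dn):
    # An authenticated client must match one of the cacheForBindDN patterns.
    return any(re.match(p + "$", bind_dn, re.I) for p in cache_for_bind_dn)

print may_cache_entry("uid=user.12,ou=secret,o=o")   # False: excluded subtree
print may_read_cache("uid=user.42,ou=People,o=o")    # True: matches the second pattern
print may_read_cache("cn=Directory Manager")         # False: no pattern matches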

That's it for my Friday night, let me know if there is more that DPS+Coherence can do for you!

As always, if you want to try this DPS plug-in, ping me: arnaud -at- sun -dot- com

Wednesday Dec 16, 2009

Oracle Coherence Support Right In DPS!!!

Rationale

Why do caching in DPS?

The Directory Server back-ends cannot "absorb" as many updates when they're stressed with a large proportion of searches. Granted, there's already caching on the back-end Directory Server itself. It helps performance a lot: reads return to the client faster, which relieves some of the stress and frees up resources to take care of the writes that lock resources for longer atomic stretches of time. But as long as searches hit the back-end, even with a cache, there's some heavy lifting to be done: open the connection, parse the request, put the request in the work queue, look up entries in the cache, return the entry, close the connection...

That's why caching right in DPS started to look appealing to me.

Why Coherence?

One may think I made this choice because Sun is about to be gobbled up by Oracle, but the answer is no. Coherence is simply a compelling choice: these guys seem to have made all the technical choices I would have... and then some. For one, you download the bits, just start it, and it works. It may sound like a marketing pitch, but see for yourself. Excellent 0 to 60 in my book. Once you have it working, you get acquainted with it: the protocol is dead simple, the API is clean, robust and pretty lean. After that, you check the docs out (that's right, I read the docs after the fact) and start to realize how powerful a platform it is, how expandable it is, and how far you can push deployments to accommodate growing performance or reliability needs.

Bird's Eye View

The integration with Coherence is done by way of a DPS (7+) plug-in that asynchronously populates a Coherence cache with the entries returned by your regular traffic. When a request comes in, a lookup checks whether the entry is present in the cache; if it is, it is returned immediately, otherwise the request is routed to the back-end as usual.
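Conceptually, that read path is the classic cache-aside pattern. Here is a bare-bones sketch of the flow; a plain dict stands in for the Coherence named cache and a stub for the routed back-end search, so this is an illustration rather than the plug-in's actual code.

# A dict stands in for the Coherence NamedCache, a stub for the back-end search.
ldap_cache = {}

def search_backend(dn):
    # In the real plug-in this is the request being routed to Directory Server.
    return {"dn": dn, "objectClass": ["top", "person"]}

def lookup(dn):
    entry = ldap_cache.get(dn)
    if entry is not None:
        return entry                # cache hit: the back-end never sees the request
    entry = search_backend(dn)      # cache miss: route to the back-end as usual
    ldap_cache[dn] = entry          # populated asynchronously in the real plug-in
    return entry

lookup("uid=user.1,ou=People,o=o")  # first read hits the back-end
lookup("uid=user.1,ou=People,o=o")  # second read is served from the cache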

Note that I'm not making any claims about client-perceived performance with this caching approach, because our Directory Server back-end is already pretty darn fast. It certainly relieves the back-end of some of the "frequent" traffic and will benefit the overall performance of the topology. That relief will most likely show up as improved write response times, but nothing here speaks to the performance of the Coherence cache lookup itself. I just haven't collected enough data so far.

The Meat

Nitty gritty, anyone?

Suppose we have a setup like this ... 

The first read would be processed as described below:


but the second one would be processed without hitting the back-end.

Can you see the tremendous impact this will have on your back-end?

It frees it up to process writes or heavyweight searches...

How to get the plug-in? Ask nicely.

Tuesday Dec 15, 2009

DSEE 7.0 On Windows Gotcha


Rationale

When you just want it to work quickly and easily, and since the zip distribution lends itself particularly well to that (extracting the archive is all that stands between you and being productive with your Directory Server), you only skim through the install guide. Well, at least if you're as impatient as I am. Make sure to read this.

Bird's Eye View

After the quick and painless install, you add the single bin folder to your PATH environment variable and get crackin'. Directory Server instances can be created. So can Directory Proxy Server instances. But the usual command-line tools from the DSRK, the indispensable ldapsearch among them, don't work. What? You missed one tiny component that still needs to be "installed" on your machine.

The Meat 

If, when you try ldapsearch from DSEE 7.0 on your vanilla Windows install, you get the message below:

 I'll make a quick text note here so it's searchable...

The application was unable to start correctly (0xc0150002)

Then it simply means that you forgot to install the Visual C++ Redistributable Package, as stated in a tiny Note block in the install guide which had completely escaped me during my quick skimming. The doc points you to downloading the bits from Microsoft, but since we're nice guys, the installer is also included at the root of the DSEE ZIP archive and is called vcredist_x86.exe. Be sure to install it if you need the ldap* utilities.

happy LDAP'ing

Monday Dec 14, 2009

Import: Intel's HyperThreading Performance Gain


Rationale

As I experiment with more and more of the new-ish Intel Nehalem-based servers, a question keeps bugging me: how much am I getting out of these "hyper" threads that appear as CPUs in the operating system? I took some time to compare Directory Server imports on my Core i7 workstation with an actual customer's data set of 7.9 million entries.

Bird's Eye View

Don't bet the house on HyperThreading.

The Meat

To keep this test simple, all I did was import these entries first with all the CPUs enabled, and then again with all the odd-numbered CPUs disabled. It doesn't get any simpler. All other things can be considered equal.
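For reference, here is roughly how that CPU juggling can be done on OpenSolaris, wrapped in a small Python helper. psradm and psrinfo come from the Solaris man pages and must run as root, and the assumption that the odd CPU ids map to the HyperThread siblings is worth double-checking with psrinfo -vp on your own box.

# Hedged sketch: take the odd-numbered CPUs offline around an import run.
# Assumes CPU ids 0-7 as reported by psrinfo; run as root.
import subprocess

odd_cpus = [str(n) for n in range(1, 8, 2)]           # 1, 3, 5, 7

subprocess.check_call(["psradm", "-f"] + odd_cpus)    # take them offline
subprocess.check_call(["psrinfo"])                    # verify they show off-line
# ... run the 4-CPU import here ...
subprocess.check_call(["psradm", "-n"] + odd_cpus)    # bring them back online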

The result is quite stunning really.



 Configuration                     Import Time
 8 CPUs (with HyperThreading)      30:56.62
 4 CPUs (without HyperThreading)   35:04.60

Which makes a difference of about 13%...

Just thought someone out there would find this as puzzling as I do. Quite honestly, I believe the blame is shared between the Directory Server not scaling too well, the I/O actually being a contention point, and HyperThreading being somewhat ineffective. More on that later.

Saturday Oct 03, 2009

A Dashboard Like No Other: The OpenDS Weather Station


Rationale

Doing so many benchmarks, profiling sessions and other performance-related activities, I had to find a way to "keep an eye" on things while fetching email, chatting on IM and the like. Having some experience with microcontrollers from past projects, albeit on Windows, I figured I could put together a little gizmo to help me keep tabs on my Directory Server.

Bird's Eye View

This is basically a simple setup: a USB Bit Whacker controlled by a Python script that feeds it data crunched from various sources, mainly the Directory Server access log, the garbage collection log and kstats... The result is a useful dashboard where I can see things happen at a glance.

The Meat

Everything starts with the USB Bit Whacker. It's a long story but, to cut it short, a couple of years ago Kohsuke Kawaguchi put together an orb that could be used to monitor the status of a build / unit tests in Hudson. Such devices are also known as eXtreme Feedback Devices, or XFDs. Kohsuke chose the USB Bit Whacker (UBW) because it is a USB-aware microcontroller that also draws power from the bus, making it very versatile while remaining affordable ($25 soldered and tested from SparkFun, but you can easily assemble your own). A quick search will tell you that it is a widely popular platform among hobbyists.

On the software side, going all Java would have been quite easy except for the part where you need platform-specific libraries for the serial communication. Sun's javacomm library and rxtx each have pros and cons, but in my case the cons were just too much of a hindrance, and I am not one to inflict pain on myself unless it is absolutely necessary. For that reason, I chose to go with Python. While apparently not as strong on cross-platform support as Java, installing the Python libraries for serial communication with the UBW is trivial and worked for me right off the bat on every platform I tried, namely Mac OS, Linux and Solaris. For example, on OpenSolaris all there is to it is:

 $ pfexec easy_install-2.4 pySerial
Searching for pySerial
Reading http://pypi.python.org/simple/pySerial/
Reading http://pyserial.sourceforge.net/
Best match: pyserial 2.4
Downloading http://pypi.python.org/packages/source/p/pyserial/pyserial-2.4.tar.gz#md5=eec19df59fd75ba5a136992897f8e468
Processing pyserial-2.4.tar.gz
Running pyserial-2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Y8iJv9/pyserial-2.4/egg-dist-tmp-WYKpjg
setuptools
zip_safe flag not set; analyzing archive contents...
Adding pyserial 2.4 to easy-install.pth file

Installed /usr/lib/python2.4/site-packages/pyserial-2.4-py2.4.egg
Processing dependencies for pySerial
Finished processing dependencies for pySerial

That's it! Of course, having easy_install is a prerequisite. If you don't have it, simply install setuptools for your Python distro, which is only a 400kB download. You'll be glad you have it anyway.

Then, communicating with the UBW is mind-bogglingly easy. But let's not get ahead of ourselves, first things first:

Plugging The USB Bit Whacker Into OpenSolaris For The First Time

The controller appears like a modem from the old days, and communicating with it equates to sending AT commands. For those of you used to accessing load balancers or other network equipment through the serial port, this is no big deal.

In the screenshot below, the first ls command output shows that nothing in /dev/term is an actual link; the second, which I issued after plugging the UBW into the USB port, shows that a new '0' link has been created by the operating system.


Remember which link your UBW appeared as; you will need it for the next step: talking to the board.

Your First Python Script To Talk To The UBW

I will show below how to send the UBW the 'V' command which instructs it to return the firmware version, and we'll see how to grab the return value and display it. Once you have that down, the sky is the limit. Here is how:

from serial import *

# /dev/term/0 is the link that appeared when the UBW was plugged in (see above)
ubw = Serial("/dev/term/0")
ubw.open()
print "Requesting UBW Firmware Version"
ubw.write("V\n")                                   # 'V' asks the firmware for its version
print "Result=[" + ubw.readline().strip() + "]\n"  # the board answers with a single line
ubw.close()

Below is the output for my board:

Voila!

That really is all there is to it; you are now one step away from your dream device, and it really is only a matter of imagination. Check out the documentation of the current firmware to see which commands the board supports and you will realize all the neat things you can use it for: driving LEDs, servos, LCD displays, acquiring data, ...

Concrete Example: The OpenDS Weather Station

As I said at the beginning of this post, my initial goal was to craft a monitoring device for OpenDS. Now you have a good idea of how I dealt with the hardware part, but an image is worth a thousand words so here is a snap...

On the software front, well, being a software engineer by trade, that was the easy part, so it's almost no fun and I won't go into as much detail, but here is a 10,000ft view:

  • Data is collected in a matrix of hash tables.
  • Each hash table represents a population of data points for a sampling period.
  • A separate timer thread pushes a fresh list of hash tables into the matrix so as to reset the counters for each new sampling period.

So, for example, if we want to track CPU utilization, we only need to keep one metric: the hash table has a single key-value pair. Easy. Slightly overkill, but easy. Now, if you want to keep track of transaction response times, the hash table keeps the response time (in ms) as the key and the number of transactions processed with that particular response time as the associated value. Therefore, if within one sampling period you have 10,000 operations processed, with 6,000 in 0 ms, 3,999 in 1 ms and 1 in 15 ms, your hash table will only have 3 entries, as follows: [ 0 => 6000; 1 => 3999; 15 => 1 ]

This allows for a dramatic compression of the data compared to keeping one line with an etime for each operation, which would amount to 10,000 lines of about 100 bytes each.

What's more, this representation of the same information makes it easy to compute the average, extract the maximum value and calculate the standard deviation, as sketched below.
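For instance, here is a quick sketch of those three computations on the example histogram above (plain Python, not the actual dashboard code):

import math

# Keys are etimes in ms, values are how many operations completed in that etime.
sample = {0: 6000, 1: 3999, 15: 1}

count = sum(sample.values())                                   # 10,000 operations
total = sum(etime * n for etime, n in sample.items())
mean = float(total) / count                                    # ~0.40 ms
peak = max(sample)                                             # 15 ms
variance = sum(n * (etime - mean) ** 2 for etime, n in sample.items()) / count
stddev = math.sqrt(variance)

print "avg=%.2fms max=%dms stddev=%.2fms" % (mean, peak, stddev)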

All that said, the weather station is only sent the latest sample, so it always shows the current state of the server. As it turns out, it is very useful; I like it very much just the way it worked out.

Well, I'm glad to close up shop, it's 7:30pm... another busy Saturday.

Tuesday Sep 29, 2009

Tracking Down All Outliers From Your LDAP Servers ...


Rationale

I was recently faced with the challenge of tracking down and eliminating outliers from a customer's traffic, and I had to come up with some sort of tool to help diagnose where these long-response-time transactions originated. Not really rocket science (hardly anything IS rocket science; even rocket science isn't all that complicated, but I digress), yet nothing I had in the toolbox would quite serve the purpose. So I sat down and wrote a tool that would let me visually correlate events in real time. At least, that was the idea.

Bird's Eye View

This little tool is only meant for investigations; we are working on delivering something better and more polished (code name Gualicho, shhhhhhh) for production monitoring. The tool described in this article simply correlates server throughput, peak etime, I/O, CPU, network and garbage collection activity (for OpenDS). Everything is presented as sliding line charts, stacked on top of each other, making visual identification and correlation easy. Later on I will adapt the tool to work with DPS, since it is the other product I like to fine-tune for my customers.

The Meat

When pointed at the access log and the GC log, here is the text output you get: one line per second, with the aggregated information collected from the access log and the garbage collection log, as well as kstats for network, I/O and CPU.


If you look closely, the garbage collection is represented as a percentage, which is somewhat unusual, but after debating how to expose this metric I decided that all I was interested in was a relative measure of the time spent in stop-the-world GC operations versus the time the application itself is running. As I will show in the snapshot below, this is quite effective for spotting correlations with high etimes in most cases. To generate this output in the GC log, all you have to do is add the following to your set of JAVA_ARGS for start-ds.java-args in /path/to/OpenDS/config/java.properties:

 -Xloggc:/data/OpenDS/logs/gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
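In case you want to compute that percentage yourself, the sketch below aggregates it over a whole gc.log rather than per second; the exact wording of the log lines can vary between JVM versions, so treat the regular expressions as assumptions to adjust.

# Rough sketch: ratio of stop-the-world time to total time from a gc.log
# produced with the two flags above. Log line wording may vary per JVM.
import re

running = stopped = 0.0
for line in open("/data/OpenDS/logs/gc.log"):
    m = re.search(r"Application time: ([\d.]+) seconds", line)
    if m:
        running += float(m.group(1))
    m = re.search(r"stopped: ([\d.]+) seconds", line)
    if m:
        stopped += float(m.group(1))

if running + stopped > 0:
    print "%.1f%% of the time spent in stop-the-world pauses" \
          % (100.0 * stopped / (running + stopped))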

And then my GUI will show something like:


Don't hesitate to zoom in on this snapshot. The image is barely legible due to blog formatting constraints.

Excuse me for not waiting 7+ days to take the snapshot for this article, but I think this simple snap serves the purpose. You can see that most of the time we spend about 2% of the time blocked in GC, but sometimes we have spikes up to 8%, and when that happens, even though it has little impact on the overall throughput over one second, the peak etime suddenly jumps to 50ms. I will describe in another article what can be done to mitigate this issue; I simply wanted to share this simple tool here since I think it can serve some of our expert community.

OpenDS on Acer Aspire One: Smashing!


Rationale

As far-fetched as it may seem, with the growing use of virtualization and cloud computing, the average image instance that LDAP authentication systems have to run on looks more like your average netbook than a supercomputer. With that in mind, I set out to find a reasonable netbook to test OpenDS on. I ended up with an Acer Aspire One with 1GB of RAM. Pretty slim on memory. Let's see what we can get out of that thing!

Bird's Eye View

In this quick test, I loaded OpenDS (2.1b1) with 5,000 entries (the stock MakeLDIF template delivered with it), hooked the netbook up to a closed GigE network and drove load from a Core i7 machine with searchrate. Result: 1,300+ searches per second. Not bad for a machine that only draws around 15 watts!

The Meat 

As usual, some more details about the test, but first a quick disclaimer: this is not a proper test or benchmark of the Atom as a platform; it is merely a kick of the tires. I have not measured any metric other than throughput, and only for a search workload at that. It is only to get a "feel" for OpenDS on such a lightweight sub-notebook.

In short:

  • Netbook: Acer Aspire One ZG5 - Atom N270 @1.6GHz, 1GB RAM, 100GB HDD
  • OS: OpenSolaris 2009.05
  • FS: ZFS
  • OpenDS: all stock; I did not even touch the Java options, which I usually do
  • JAVA: 1.6 Update 13

The little guy in action: perfbar shows the CPU is all the way up there, with little headroom...


Wednesday Apr 01, 2009

Setting DPS As Replication Hub - Part 1: a simple tut'

Rationale

There may be cases where you would like to keep two environments up to date with the same data, but no replication or synchronization solution fits your particular needs. One example that comes to mind is migrating away from a legacy LDAP store (RACF, OiD, Sun DS 5...) to OpenDS. After initializing your new data store with the former data store's contents, without a synchronization mechanism you would have to switch to the new data store right away. That would not quite be acceptable in production: for one thing, importing the data might take longer than the maintenance window, and more importantly, should something unexpected happen, all real-life deployments want to preserve the option of rolling back to the legacy system (which has proved to work in the past, even if performance or functionality could use a dust-off).

Enter DPS's "replication" distribution algorithm. The idea is quite simple: route reads to a single data store and duplicate writes across all data stores. I use the term data store here because it need not be LDAP only: any SQL database with a JDBC driver can be replicated to as well. For this tutorial, though, I will use two LDAP stores. We will see a MySQL example in Part 2.

Bird's Eye View

Unlike load-balancing and failover algorithms, which work across sources in the same pool, distribution algorithms work across data views. A distribution algorithm is a way to pick the appropriate data view, among the eligible data views, to process a given client request. In this tutorial, I will show how the "replication" distribution algorithm duplicates write traffic across two distinct data sources.

In the graph below, you can see how this is structured in DPS configuration.

The Meat

We will assume here that we have two existing LDAP servers running locally and serving the same suffix dc=example,dc=com:

  1. Store A: dsA on port 1389
  2. Store B: dsB on port 2389

Let's first go about the mundane task of setting up both stores in DPS:
    For Store A:

#dpconf create-ldap-data-source dsA localhost:1389
#dpconf create-ldap-data-source-pool poolA
#dpconf attach-ldap-data-source poolA dsA
#dpconf set-attached-ldap-data-source-prop poolA dsA add-weight:1 bind-weight:1 delete-weight:1 modify-weight:1 search-weight:1
#dpconf create-ldap-data-view viewA poolA dc=example,dc=com

    For Store B:

#dpconf create-ldap-data-source dsB localhost:2389
#dpconf create-ldap-data-source-pool poolB
#dpconf attach-ldap-data-source poolB dsB
#dpconf set-attached-ldap-data-source-prop poolB dsB add-weight:1 bind-weight:1 delete-weight:1 modify-weight:1 search-weight:1
#dpconf create-ldap-data-view viewB poolB dc=example,dc=com

    Now, the distribution algorithm must be set to replication on both data views:

#dpconf set-ldap-data-view-prop viewA distribution-algorithm:replication replication-role:master
#dpconf set-ldap-data-view-prop viewB distribution-algorithm:replication replication-role:master

  And finally, the catch:

    When using dpconf to set the replication-role property to master, it effectively writes distributionDataViewType as a single-valued attribute in the data view configuration entry, when in reality the schema allows it to be multivalued. To see that for yourself, simply do:

#ldapsearch -p <your DPS port> -D "cn=proxy manager" -w password "(cn=viewA)"
version: 1
dn: cn=viewA,cn=data views,cn=config
dataSourcePool: poolA
viewBase: dc=example,dc=com
objectClass: top
objectClass: configEntry
objectClass: dataView
objectClass: ldapDataView
cn: viewA
viewAlternateSearchBase: ""
viewAlternateSearchBase: "dc=com"
distributionDataViewType: write
distributionAlgorithm: com.sun.directory.proxy.extensions.ReplicationDistributionAlgoritm


and then try to issue the following command:

#dpconf set-ldap-data-view-prop viewA replication-role+:consumer
The property "replication-role" cannot contain multiple values.
XXX exception-syntax-prop-add-val-invalid

...

...or just take my word for it. 

The issue is that in order for DPS to process read traffic (bind, search, etc.), one data view needs to be a consumer, but for the replication to work across data views, all of them must be masters as well. That is why you will need to issue the following command on one (and only one) data view:

#ldapmodify -p <your DPS port> -D "cn=proxy manager" -w password
dn: cn=viewA,cn=data views,cn=config
changetype: modify
add: distributionDataViewType
distributionDataViewType: read

That's it!
It wasn't all that hard, except that it took some insider knowledge, and now you have it.
Your search traffic will always go to Store A and all write traffic will be duplicated across Stores A and B.
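To convince yourself it works, a hedged python-ldap sketch like the one below can add a probe entry through DPS and then read it back from each store directly. The DPS port, bind DNs and passwords are placeholders to adapt to your own setup.

# Hedged sketch: write once through DPS, then check both stores directly.
# Ports, bind DN and password are placeholders for your own setup.
import ldap
import ldap.modlist

DPS_URL = "ldap://localhost:7389"        # placeholder: your DPS listener
entry_dn = "uid=probe,dc=example,dc=com"
attrs = {"objectClass": ["top", "person", "organizationalPerson", "inetOrgPerson"],
         "uid": ["probe"], "cn": ["Probe"], "sn": ["Probe"]}

dps = ldap.initialize(DPS_URL)
dps.simple_bind_s("cn=Directory Manager", "password")
dps.add_s(entry_dn, ldap.modlist.addModlist(attrs))

for port in (1389, 2389):                # Store A, then Store B
    store = ldap.initialize("ldap://localhost:%d" % port)
    store.simple_bind_s("cn=Directory Manager", "password")
    try:
        store.search_s(entry_dn, ldap.SCOPE_BASE)
        print "port %d: probe entry found" % port
    except ldap.NO_SUCH_OBJECT:
        print "port %d: probe entry missing" % port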

Caveats

Note that while this is very useful in a number of situations where nothing else will work, it should only be used for transitions, as there are a number of caveats.
DPS does not store any historical information about traffic; therefore, in case of an outage of one of the underlying stores, the contents of the data stores may diverge, especially since this mode is typically used where no synchronization solution can catch up after an outage.

Store A and Store B will end up out of sync if:

  • either store goes offline
  • either store is unwilling to perform because the machine is outpaced by traffic