By arnaud on Apr 01, 2009
There may be cases where you would like to keep two environments up to date with the same data, but no replication or synchronization solution fits your particular needs. One example that comes to mind is migrating away from a legacy LDAP store (RACF, OiD, Sun DS 5...) to OpenDS. Without a synchronization mechanism, once you had initialized the new data store with the contents of the former one, you would have to switch over to it right away. That would not quite be acceptable in production: for one thing, importing the data might take longer than the maintenance window, and more importantly, should something unexpected happen, every real-life deployment wants to preserve the option of rolling back to the legacy system (which has proved to work in the past, even if its performance or functionality could use a dust-off).
Enter the DPS "replication" distribution algorithm. The idea is quite simple: route reads to a single data store, and duplicate writes across all data stores. I use the term data store here because the targets need not be LDAP only; any SQL database with a JDBC driver can be "replicated" to as well. For this tutorial, though, I will use two LDAP stores. We will see a MySQL example in Part 2.
Bird's Eye View
Unlike load balancing and failover algorithms, which work across sources in the same pool, distribution algorithms work across data views. A distribution algorithm is a way to pick the appropriate data view, among the eligible data views, to process a given client request. In this tutorial, I will show how the "replication" distribution algorithm lets you duplicate write traffic across two distinct data sources.
In the graph below, you can see how this is structured in DPS configuration.
We will assume here that we have two existing LDAP servers running locally and serving the same suffix dc=example,dc=com:
- Store A: dsA on port 1389
- Store B: dsB on port 2389
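Before wiring anything into DPS, it is worth a quick check that both stores actually serve the suffix. A minimal sketch, assuming both servers run on localhost on the ports above and allow anonymous reads of the suffix entry:

#ldapsearch -p 1389 -b "dc=example,dc=com" -s base "(objectclass=*)" dn
#ldapsearch -p 2389 -b "dc=example,dc=com" -s base "(objectclass=*)" dn

Both commands should return the dc=example,dc=com entry.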
Let's first go about the mundane task of setting up both stores in DPS:
For Store A:
#dpconf create-ldap-data-source dsA localhost:1389
#dpconf create-ldap-data-source-pool poolA
#dpconf attach-ldap-data-source poolA dsA
#dpconf set-attached-ldap-data-source-prop poolA dsA add-weight:1 bind-weight:1 delete-weight:1 modify-weight:1 search-weight:1
#dpconf create-ldap-data-view viewA poolA dc=example,dc=com
For Store B:
#dpconf create-ldap-data-source dsB localhost:2389
#dpconf create-ldap-data-source-pool poolB
#dpconf attach-ldap-data-source poolB dsB
#dpconf set-attached-ldap-data-source-prop poolB dsB add-weight:1 bind-weight:1 delete-weight:1 modify-weight:1 search-weight:1
#dpconf create-ldap-data-view viewB poolB dc=example,dc=com
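At this point you can sanity-check that both data views were created:

#dpconf list-ldap-data-views

You should see viewA and viewB listed (along with the default root data view).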
Now configure both data views to use the "replication" distribution algorithm, each acting as a master:
#dpconf set-ldap-data-view-prop viewA distribution-algorithm:replication replication-role:master
#dpconf set-ldap-data-view-prop viewB distribution-algorithm:replication replication-role:master
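You can read the properties back to confirm the settings took effect on each view:

#dpconf get-ldap-data-view-prop viewA distribution-algorithm replication-role
#dpconf get-ldap-data-view-prop viewB distribution-algorithm replication-role

Both should report distribution-algorithm : replication and replication-role : master.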
And finally, the catch:
When you use dpconf to set the replication-role property to master, it effectively writes distributionDataViewType as a single-valued attribute in the data view configuration entry, when in reality the schema allows it to be multi-valued. To see this for yourself, simply do:
#ldapsearch -p <your DPS port> -D "cn=proxy manager" -w password "(cn=viewA)"
dn: cn=viewA,cn=data views,cn=config
and then try to issue the following command:
#dpconf set-ldap-data-view-prop viewA replication-role+:consumer
The property "replication-role" cannot contain multiple values.
...or just take my word for it.
The issue is that in order for DPS to process read traffic (bind, search, etc.), one data view needs to be a consumer, but for the "replication" to work across data views, all of them must be masters as well. That is why you will need to issue the following command on one (and only one) data view:
#ldapmodify -p <your DPS port> -D "cn=proxy manager" -w password
dn: cn=viewA,cn=data views,cn=config
changetype: modify
add: distributionDataViewType
distributionDataViewType: read
That wasn't all that hard, except that it took some insider knowledge, and now you have it.
Your search traffic will always go to Store A and all write traffic will get duplicated across Store A and B.
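To convince yourself that writes really are duplicated, you can push a test entry through DPS and then query each backing store directly. A sketch, assuming a hypothetical uid=test entry and a bind DN with write rights on your deployment:

#ldapmodify -p <your DPS port> -D "<your bind DN>" -w <password>
dn: uid=test,dc=example,dc=com
changetype: add
objectclass: inetorgperson
uid: test
cn: Test User
sn: User

#ldapsearch -p 1389 -b "dc=example,dc=com" "(uid=test)" dn
#ldapsearch -p 2389 -b "dc=example,dc=com" "(uid=test)" dn

The same entry should come back from both dsA and dsB, even though the search went through DPS only once.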
Note that while this is very useful in a number of situations where nothing else will work, it should only be used for transitions, as there are a number of caveats.
DPS does not store any historical information about traffic; therefore, if one of the underlying stores suffers an outage, contents may diverge across data stores. This is especially a concern here, since this mode is precisely intended for cases where no synchronization solution can catch up after an outage.
Store A and Store B will end up out of sync if:
- either store goes offline
- either store is unwilling to perform writes because its machine is outpaced by the traffic