On CLEANRUV

CLEANRUV is a dangerous task to run. If you have any doubt about its usage, please contact Sun Backline Support. While this task does what it is supposed to do, it is irreversible, which is why we need to know exactly what we want to do before executing it.


The goal of this entry is not to invite people to run it freely, but rather to ensure that, should it be required, it is run safely.

The best way to be sure is to take a real-time snapshot of the RUVs of the complete topology. This is achieved by running the following search against ALL OF THE MASTERS of the topology:

./ldapsearch -p <port> -h <host> -D "cn=Directory Manager" -w <password> -b "cn=config" "(objectclass=nsds5replica)" nsds50ruv

Next, we need to locate the evil replicaId. That will be the one to CLEANRUV. Which one will it be? Tricky question, but in general, if two nsds50ruv lines show exactly the same host:port pair, then one of them needs to be removed, because all nsds50ruv lines must have different host:port pairs.

Which is the one to remove? In general, the one with the oldest MAXCSN. For example, if we have:

{ replica 5 ldap://foo:389 } 4a200000 4b200000
{ replica 6 ldap://foo:389 } 4b300000 4e200000


Then we will need to remove the line for replicaId 5, as its CSNs are older (note that the max CSN for replica 5 is even smaller than the min CSN of replica 6; this is typical in these scenarios).
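The selection rule above can be sketched in a few lines of Python. This is only an illustration, not a Sun tool: the names parse_ruv and cleanruv_candidates are made up for this example, and the sketch assumes that the RUV lines look like the ones shown above and that CSNs can be compared as hexadecimal numbers. When a line for a duplicated host:port carries no CSN at all, it refuses to decide and reports None for that host:port.

```python
import re
from collections import defaultdict

# Matches lines such as "{ replica 5 ldap://foo:389 } 4a200000 4b200000".
# Both CSNs are optional, since a replica may not have logged any change.
RUV_LINE = re.compile(
    r"\{\s*replica\s+(\d+)\s+(ldaps?://\S+)\s*\}"
    r"\s*([0-9a-fA-F]+)?\s*([0-9a-fA-F]+)?"
)

def parse_ruv(lines):
    """Turn raw nsds50ruv lines into a list of dicts (hypothetical helper)."""
    entries = []
    for line in lines:
        m = RUV_LINE.search(line)
        if m is None:
            continue  # not a per-replica line, skip it
        rid, url, min_csn, max_csn = m.groups()
        entries.append({"rid": int(rid), "url": url, "min": min_csn, "max": max_csn})
    return entries

def cleanruv_candidates(entries):
    """Map each duplicated host:port to the replicaIds with the older MAXCSN."""
    by_url = defaultdict(list)
    for entry in entries:
        by_url[entry["url"]].append(entry)
    result = {}
    for url, group in by_url.items():
        if len(group) < 2:
            continue  # host:port appears once, nothing to clean
        if any(entry["max"] is None for entry in group):
            result[url] = None  # cannot be decided from the CSNs alone
            continue
        newest = max(group, key=lambda entry: int(entry["max"], 16))
        result[url] = sorted(e["rid"] for e in group if e is not newest)
    return result

ruv = [
    "{ replica 5 ldap://foo:389 } 4a200000 4b200000",
    "{ replica 6 ldap://foo:389 } 4b300000 4e200000",
]
print(cleanruv_candidates(parse_ruv(ruv)))  # {'ldap://foo:389': [5]}
```

The output is only a candidate list. The remaining checks described in this entry still have to be done by hand before anything is cleaned.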

Sometimes, you will get something like:

{ replica 5 ldap://foo:389 } 4a200000 4b200000
{ replica 6 ldap://foo:389 }


In these cases, it is hard to know which one to remove. It could be that 6 is older than 5 and never logged any CSN. Or it could be that 5 is older than 6 but we have not yet seen a single change on 6 since it was established. In these scenarios, it is typically easy to detect which is the valid replicaId, either by doing an explicit search for the current replicaId on host foo:

./ldapsearch -p <port> -h foo -D "cn=Directory Manager" -w <password> -b "cn=config" "(objectclass=nsds5replica)" nsds5replicaId


or by explicitly creating a dummy update locally on foo:389 and relaunching the RUV search on it to see which RUV line has been updated. That is the valid one, and we will need to get rid of the other.
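Comparing the RUV before and after the dummy update can be done by eye, but a small sketch shows the idea. The helper names max_csns and updated_replica_ids are invented for this example, and the "after" values below are made up purely to illustrate a change landing on replica 6; a real run will show whatever CSN the server generated.

```python
import re

def max_csns(ruv_lines):
    """Map replicaId -> max CSN (None when the line carries no CSNs)."""
    result = {}
    for line in ruv_lines:
        m = re.search(r"\{\s*replica\s+(\d+)\s[^}]*\}(.*)", line)
        if m is None:
            continue  # not a per-replica line
        csns = m.group(2).split()
        result[int(m.group(1))] = csns[1] if len(csns) >= 2 else None
    return result

def updated_replica_ids(before, after):
    """Return the replicaIds whose max CSN changed between two snapshots."""
    old, new = max_csns(before), max_csns(after)
    return sorted(
        rid for rid, csn in new.items()
        if csn is not None and csn != old.get(rid)
    )

before = [
    "{ replica 5 ldap://foo:389 } 4a200000 4b200000",
    "{ replica 6 ldap://foo:389 }",
]
# Hypothetical snapshot taken after the dummy update on foo:389.
after = [
    "{ replica 5 ldap://foo:389 } 4a200000 4b200000",
    "{ replica 6 ldap://foo:389 } 4e300000 4e300000",
]
print(updated_replica_ids(before, after))  # [6]
```

In this made-up run replicaId 6 is the live one, so replicaId 5 would be the line to clean.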

Finally, one additional check before applying CLEANRUV is to verify that the line we want to clean is exactly the same on all of the replicas in the topology (consumers included). In the example above, if we want to get rid of:


{ replica 5 ldap://foo:389 } 4a200000 4b200000

Then it is worth checking that all replicas (masters and consumers) have answered with exactly the same values for replicaId 5, especially the MAXCSN. That gives us the guarantee that all of the changes generated on that old replicaId were replicated to, and are known by, all replicas, so we can safely remove the references to that replicaId forever, as there is not a single pending change left to replicate for it.
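With more than a handful of replicas, this comparison is easier to script than to eyeball. The following is a minimal sketch, assuming the RUV lines returned by each server have already been collected into a dictionary; the function names and the server names m1, m2 and c1 are hypothetical, and the lagging value on c1 is invented to show a failing case.

```python
import re

def ruv_values_for(rid, ruv_lines):
    """Return (url, csn, csn, ...) for replicaId rid, or None if it is absent."""
    for line in ruv_lines:
        m = re.search(r"\{\s*replica\s+(\d+)\s+(\S+?)\s*\}(.*)", line)
        if m and int(m.group(1)) == rid:
            return (m.group(2),) + tuple(m.group(3).split())
    return None

def same_on_all_replicas(rid, snapshots):
    """True only if every server reports the identical line for rid.

    snapshots maps a server name to the nsds50ruv lines it returned.
    """
    values = [ruv_values_for(rid, lines) for lines in snapshots.values()]
    return None not in values and len(set(values)) == 1

good = "{ replica 5 ldap://foo:389 } 4a200000 4b200000"
# Hypothetical consumer that has not yet received the last changes.
lagging = "{ replica 5 ldap://foo:389 } 4a200000 4b100000"

print(same_on_all_replicas(5, {"m1": [good], "m2": [good], "c1": [good]}))     # True
print(same_on_all_replicas(5, {"m1": [good], "m2": [good], "c1": [lagging]}))  # False
```

A False result means at least one replica has not seen everything from that replicaId (or does not list it at all), so it is not yet safe to clean.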

And finally, we have to apply CLEANRUV. Great, we are nearly done. CLEANRUV is basically an ldapmodify operation that launches a particular task, with the replicaId we want to get rid of as a parameter. Example:


ldapmodify -h <host> -p <port> -D "cn=Directory Manager" -w <password>
dn: cn=replica, cn="<replicationbase>", cn=mapping tree, cn=config
changetype: modify
replace: nsds5task
nsds5task: CLEANRUV <replicaid>


When we do this, we need to run it on all of the masters. Moreover, we need to launch the ldapmodify operations as much in parallel as possible. This is to avoid cleaning one master, then moving on to a second one, and in the meantime having the second one re-infect the first master's RUV with the line we just cleaned and (worse) start trying to replicate changes for it again :-D. Disabling the replication agreements, executing the modifications, and then re-enabling the agreements always works. If that can't be done, then running the modifications in parallel and monitoring the new RUVs to verify that there has been no re-infection is also OK. As long as you take this into account, you will be fine.
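Monitoring for re-infection boils down to re-running the RUV search on every master and checking that the cleaned replicaId is gone everywhere. A rough sketch of that check, assuming the fresh RUV lines have been gathered per server (the function name and the server names m1 and m2 are hypothetical):

```python
import re

def servers_still_listing(rid, snapshots):
    """Return the servers whose RUV still contains a line for replicaId rid.

    snapshots maps a server name to the nsds50ruv lines it returned
    after the CLEANRUV task was launched.
    """
    pattern = re.compile(r"\{\s*replica\s+%d\s" % rid)
    return sorted(
        server for server, lines in snapshots.items()
        if any(pattern.search(line) for line in lines)
    )

after_clean = {
    "m1": ["{ replica 6 ldap://foo:389 } 4b300000 4e200000"],
    # Hypothetical re-infected master: replicaId 5 has come back.
    "m2": [
        "{ replica 5 ldap://foo:389 } 4a200000 4b200000",
        "{ replica 6 ldap://foo:389 } 4b300000 4e200000",
    ],
}
print(servers_still_listing(5, after_clean))  # ['m2']
```

An empty list is the result we want. Any server that shows up needs the task to be run again, ideally with the agreements disabled this time.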

Comments:

Great, this is going to help us pull out a nasty thorn, and all of it nicely written in French

Posted by jean-pierre jouanny on September 29, 2008 at 03:17 AM PDT #

About

marcos
