Wednesday Jun 10, 2009

Solving a disk outage situation during an online backup in DS5.2

This entry is an edited version of an internal case report for one of the interventions of our Premium Plus Software Support Team at Sun. Unlike the DS6.x online backup, which prevents regular checkpointing while the backup takes place, the 5.x online backup is written in such a way that it may need to be restarted over and over again if some of the transaction logs that existed when the backup started are swept out in the meantime. This can cause a huge number of transaction logs to pile up and make the overall backup take far too long. In the worst case, the logs consume all the space on the transaction log partition. This is one such case. The names of the machines have been changed to preserve the customer's identity.

Tuesday Sep 16, 2008


CLEANRUV is one of those maintenance tasks one needs to run from time to time on 5.x master servers whenever replicaId changes are applied to a particular master replica inside a multi-master topology. No matter how long this task has existed, I keep receiving questions about it, so I guess I should have written this entry 4 years ago but was too lazy to do so. Anyhow, better late than never: here are the basic tips you need to consider before cleaning references to a particular old replicaId inside a multi-master topology.


Wednesday Apr 02, 2008

Long response times and spurious deadlock messages: checkpointing analysis required

A big part of DS troubleshooting work in live customer environments is done on massive deployments, which require the analysis of many different components that are ultimately interconnected. This makes it very difficult to split each troubleshooting experience into smaller parts and write small, simple blog articles that could be used as miracle recipes. This time I am going to make an exception to the rule and not split a problem at all. Quite the opposite: this article explains a complete troubleshooting experience from top to bottom, from the moment the customer's initial description of their concern was received, through the analysis that leads to a potentially very different problem, to the proposal and application of a remedy. This article touches on many diverse DS-related topics, such as static groups, referential integrity, verifying problem conditions on different replicas in the topology, memory and disk usage, and finally checkpointing.


Wednesday Mar 26, 2008

Using nsslapd-ioblocktimeout settings to avoid / minimize potential SSL hangs

This article explains how DS can be configured to protect against SSL hanging events and to recover from them, avoiding service outages.

Setting up a comprehensive, safe backup policy on a multi-master topology

One of our customers wanted (or should I say needed) to improve the backup policies in their 5.2patch5 replicated topology. Previously, a couple of db2ldif -r exports were executed weekly on one consumer of the topology. This was definitely not good in terms of either quantity (not enough) or quality (not spread uniformly). The analysis below aimed to identify whether binary copy was an option worth considering for their topology.

Thursday Oct 04, 2007

Improving the I/O rates on a massive DS deployment

The following entry presents some tips to improve the I/O rates of a massive Directory Services deployment. These tips have recently been analyzed and successfully put in place at one of the major Sun accounts worldwide.

Sunday Jun 24, 2007

Side effect of bug 6310880 - replication loop (halt) with err=67

DS bug #6310880 has been fixed both in the official DS5.2patch5 release and in several Sun Support hotfixes built on top of 5.2patch4. However, the fix only protects newly created entries; it does not protect entries that already existed in the DS database before the bug fix was put in place. This can lead to undesired results, among them potential err=67 replication halt events. This entry provides a detailed analysis of why this happens and how to safely protect against such undesired effects.


Thursday Jun 07, 2007

replck or how to fix a replication halt without re-initialization

The replck utility lets you check the health of a replication link and, when required, act on it to fix a Sun DS replication halt/breakage. replck supports any replication link between any combination of DS5.x and DS6.x servers. replck can be run in two modes: "diagnose" mode (checks replication and identifies halts for the inspected link, but does not execute any repair action) and "repair" mode (checks replication and repairs any identified replication halt for the inspected link). It is important to note that the repair mode of replck may completely avoid costly topology re-initializations, which are typically associated with long service outages.



