Wednesday Jun 10, 2009

DS6 is not protected by default against connection counts growing over time

This entry is an edited version of an internal case report for one of the interventions of our Premium Plus Software Support Team at Sun. On a DS6.3 6-master topology, we detected via netstat that the number of open ESTABLISHED connections was over 12000 on each replica. Additional analysis showed that 98% of those connections were unused, and some of them had been opened 5-6 months earlier. Here is the analysis of that event and an explanation of why it can happen. The names of the machines have been changed to preserve the customer's identity.
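As a rough illustration of the netstat check mentioned above, here is a minimal Python sketch that tallies TCP connections by state from saved `netstat -an` output. The function name, the sample addresses and the assumption that the state is the last column of each connection line are mine, not taken from the case report; adapt the parsing to your platform's netstat layout.

```python
from collections import Counter

# TCP state names as commonly printed by netstat (assumed list).
TCP_STATES = {
    "ESTABLISHED", "LISTEN", "TIME_WAIT", "CLOSE_WAIT", "FIN_WAIT_1",
    "FIN_WAIT_2", "SYN_SENT", "SYN_RCVD", "SYN_RECV", "LAST_ACK",
    "CLOSING", "IDLE", "BOUND",
}


def count_tcp_states(netstat_output):
    """Count connection lines per TCP state.

    Assumes the state is the last whitespace-separated field of each
    connection line; header and separator lines are skipped because
    their last field is not a known state name.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample in a Solaris-like layout (addresses are made up).
SAMPLE = """\
TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.389                *.*                0      0 49152      0 LISTEN
10.0.0.1.389         10.0.0.20.51234      49640      0 49640      0 ESTABLISHED
10.0.0.1.389         10.0.0.21.40022      49640      0 49640      0 ESTABLISHED
10.0.0.1.389         10.0.0.22.40100      49640      0 49640      0 TIME_WAIT
"""

if __name__ == "__main__":
    for state, total in count_tcp_states(SAMPLE).most_common():
        print(state, total)
```

Running the same count at regular intervals and comparing the ESTABLISHED totals is one simple way to spot the kind of slow growth described in this entry.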

Solving a disk outage situation during an online backup in DS5.2

This entry is an edited version of an internal case report for one of the interventions of our Premium Plus Software Support Team at Sun. Unlike the DS6.x online backup, which prevents regular checkpointing while the backup takes place, the 5.x online backup is written in such a way that it may need to be restarted over and over again if some of the transaction logs that existed when the backup started are swept out in the meantime. This can cause a huge number of transaction logs to pile up and make the overall backup take far too long. In the worst case, the logs consume all the space on the transaction log partition. This is one such case. The names of the machines have been changed to preserve the customer's identity.

Thursday Nov 20, 2008

Improving DS performance on Sun CoolThreads servers (T1, T2, T2 Plus)

Directory Server installations on Sun CoolThreads servers based on UltraSPARC T1, T2, or T2 Plus processors (Sun SPARC Enterprise T5240 Server, Sun SPARC Enterprise T5120 Server, Sun Fire T2000 Server and so on) require some basic tuning in order to take full advantage of all the processor cores and compute threads provided by these platforms and to scale optimally. This post addresses such tuning.

Tuesday Sep 16, 2008

About re-index and re-import speed expectations

Before diagnosing a particular issue with import or indexing performance, it is key to draw up a list of reasonable expectations that can help us move forward with the analysis. For this, it is extremely helpful to understand how re-import and re-index behave across the existing DS versions (DS5.x, DS6.x) and in forthcoming DS releases. This entry gives a small insight into the DS import internals to shed some light on a little-known topic. Those wanting to know more should feel free to contact their Sun Support Representative, as there is no room for more on this topic in a public blog like this one.

Debugging a low performance incident within a DPS6 instance

It is not unusual for the performance of Java applications to degrade over time due to inadequate JVM tuning. The 100% Java DPS6 product is no exception. The following entry shows how to dig into the garbage collector information of any Java process and use that information to determine whether performance drops over time could be related to undesired or uncontrolled garbage collection activity.

CLEANRUV is one of those maintenance tasks one needs to do from time to time on 5.x master servers whenever replicaId changes are applied to a particular master replica inside a multi-master topology. However long this task has existed, I keep receiving questions about it, so I guess I should have written this entry 4 years ago but was too lazy to do it. Anyhow, better late than never: here are the basic tips you need to consider before cleaning references to a particular old replicaId inside a multi-master topology.

Wednesday Apr 02, 2008

Long response times and spurious deadlock messages: checkpointing analysis required

A big part of DS troubleshooting work in live customer environments is done on massive deployments, which require analysis of many different parts that are ultimately interconnected. This makes it very difficult to split each troubleshooting experience into smaller pieces and write small, simple blog articles that could be used as miracle recipes. This time I am going to make an exception to the rule and not split a problem at all: quite the opposite, this article explains a complete troubleshooting experience from top to bottom, from the time when the customer's initial description of their concern was received, to the analysis which leads to a potentially very different problem, to the proposal and application of a remedy. This article touches on many diverse DS-related topics such as static groups, referential integrity, verifying problem conditions in different replicas in the topology, memory and disk usage, and finally checkpointing.

Wednesday Mar 26, 2008

Using nsslapd-ioblocktimeout settings to avoid / minimize SSL potential hangs

This article explains how DS can be configured to protect against SSL hangs and to recover from them, so as to avoid service outages.

Set up of a comprehensive safe backup policy on a multi master topology

One of our customers wanted (or should I say needed) to improve their current backup policies inside their 5.2patch5 replicated topology. Previously, a couple of db2ldif -r exports were executed on a weekly basis on one consumer of the topology. This was definitely not good, both in terms of quantity (not enough) and quality (not spread uniformly). The analysis below aimed to identify whether binary copy was an option worth considering for their topology.

Thursday Oct 04, 2007

Improving the I/O rates on a massive DS deployment

The following entry presents some tips to improve the I/O rates of a massive Directory Services deployment. These tips have recently been analyzed and successfully put in place at one of the major Sun accounts worldwide.

Monday Aug 13, 2007

How to know if you are running a DS 5.2 Patch5 timebomb affected system and how to fix it

If you are running a DS 5.2 Patch5 instance, you may be running a time-bombed daemon without knowing it (Bug 6587775). This entry explains how to check whether that is the case, what implications it may have for your deployment, and what actions can be taken to repair the mess.

Caution when installing new server certificates in topologies running Solaris8 and/or Solaris9 ldap client machines

Serious connectivity problems may suddenly appear in a secured LDAP-based Naming Services deployment after the Directory Server certificates expire. Depending on how those certificates are renewed and on the current versions of the Directory Server instances, the Solaris8 and Solaris9 clients may still be unable to contact the Directory Servers even after a successful deployment of the new certificates. This issue is very rarely documented, so I decided to write a blog entry about it.

Tuesday Jul 03, 2007

replcheck officially delivered inside Directory Services 6.1

The recently released Sun DS6.1 comes with some great news from a troubleshooter's perspective: on the one hand, the release of replcheck, a user-friendly Java implementation of the old replck script; on the other hand, support for DS5.1 automatic migrations.

Sunday Jun 24, 2007

Side effect of bug 6310880 - replication loop (halt) with err=67

DS bug #6310880 has been fixed both in the official DS5.2patch5 release and in several Sun Support hotfixes built on top of 5.2patch4. However, the fix only protects newly created entries; it does not protect the entries that already existed inside the DS database before the bug fix was put in place. This can lead to undesired results, including potential replication halts with err=67. This entry provides a detailed analysis of why this happens and how to protect safely against such potential undesired effects.

