About re-index and re-import speed expectations
By marcos on Sep 16, 2008
Here are a few lightweight internals that can be shared freely, to help set reasonable expectations when comparing re-import and re-index speed across the existing DS versions (DS5.x, DS6.x) and the forthcoming DS release. Those wanting to know more should contact their Sun Support Representative, as a public blog like this one is not the place for further detail on this topic.
DS5.x Re-index model (db2index)
- 1 single thread that does it all, one step after the other:
  - for 1 to M (indexes to re-index): rebuild that index
  - vlv (if requested)
  - parentid (if requested)
  - ancestorid (if requested)
=> Because of this, re-index performance is often below that of a re-import when more than 2 or 3 attributes need to be re-indexed.
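To make the cost of the single-threaded loop concrete, here is a minimal Python sketch, assuming each iteration of "for 1 to M" makes its own full pass over the entries. The in-memory `entries` dict and the function name `reindex_single_thread` are hypothetical stand-ins, not the actual db2index code; the point is only that total work grows with M times the number of entries.

```python
from collections import defaultdict


def reindex_single_thread(entries, attrs):
    """Rebuild one index per attribute, one after the other, in a single thread.

    entries: hypothetical in-memory stand-in for the database,
             {entry_id: {attribute: [values]}}.
    Returns (indexes, entry_reads) so the M full passes are visible.
    """
    indexes = {}
    entry_reads = 0
    for attr in attrs:                            # for 1 to M (indexes to re-index)
        index = defaultdict(set)
        for entry_id, entry in entries.items():  # full scan for every index
            entry_reads += 1
            for value in entry.get(attr, []):
                index[value.lower()].add(entry_id)
        indexes[attr] = dict(index)
    # vlv, parentid and ancestorid (if requested) would be further
    # sequential passes in the same thread; they are omitted here.
    return indexes, entry_reads
```

With 2 entries and 2 attributes, `entry_reads` is 4 (M × N); doubling the number of attributes doubles the reads, with no other thread to share the work.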
DS5.x Import Model (ldif2db)
- 1 PRODUCER thread that does (among other things): read and parse the LDIF
- 1 FOREMAN thread that does (among other things):
- 1 thread per index (N indexes => N threads)
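The import model above is a classic producer/consumer pipeline. The sketch below wires one PRODUCER, one FOREMAN and one thread per index together with bounded queues. It is illustrative only: `parse_ldif` handles a simplified LDIF subset, and the foreman's duties here (assigning entry IDs, storing entries, fanning them out to the index threads) are an assumption, since the post does not list them. Python's GIL also means this shows the structure rather than real CPU parallelism.

```python
import queue
import threading
from collections import defaultdict

_DONE = object()  # end-of-stream sentinel passed between threads


def parse_ldif(text):
    """Minimal LDIF reader (assumption: blank-line separated records of
    'attr: value' lines; no continuation lines, base64 or change records)."""
    for record in text.strip().split("\n\n"):
        entry = defaultdict(list)
        for line in record.splitlines():
            attr, _, value = line.partition(":")
            entry[attr.strip().lower()].append(value.strip())
        yield dict(entry)


def import_pipeline(ldif_text, indexed_attrs):
    """PRODUCER -> FOREMAN -> one thread per index, connected by queues."""
    parsed = queue.Queue(maxsize=100)
    index_queues = {a: queue.Queue(maxsize=100) for a in indexed_attrs}
    id2entry = {}
    indexes = {a: defaultdict(set) for a in indexed_attrs}

    def producer():
        # Read and parse LDIF, hand entries to the foreman.
        for entry in parse_ldif(ldif_text):
            parsed.put(entry)
        parsed.put(_DONE)

    def foreman():
        # Hypothetical duties: assign IDs, store entries, fan out to indexers.
        next_id = 1
        while (entry := parsed.get()) is not _DONE:
            id2entry[next_id] = entry
            for q in index_queues.values():
                q.put((next_id, entry))
            next_id += 1
        for q in index_queues.values():
            q.put(_DONE)

    def index_worker(attr):
        # Each index has its own thread and its own dict: no locking needed.
        q = index_queues[attr]
        while (item := q.get()) is not _DONE:
            entry_id, entry = item
            for value in entry.get(attr, []):
                indexes[attr][value.lower()].add(entry_id)

    threads = [threading.Thread(target=producer), threading.Thread(target=foreman)]
    threads += [threading.Thread(target=index_worker, args=(a,)) for a in indexed_attrs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return id2entry, {a: dict(ix) for a, ix in indexes.items()}
```

Because all stages run at the same time, a single pass over the LDIF feeds every index, unlike the one-pass-per-index re-index loop.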
In general, the FOREMAN is the bottleneck, but sometimes the PRODUCER is instead (for example, on flat trees, where the FOREMAN can run faster because the ancestorid and parentid indexes are easier to compute). Another case where the FOREMAN may not be the bottleneck is when there is a significant presence of huge indexes (long values in substring, presence and equality indexes), depending on the LDIF and the configuration.
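The bottleneck reasoning follows from a simple steady-state model: when the stages run concurrently and every entry passes through every stage, overall throughput is capped by the slowest stage. The sketch below encodes that rule. All rates are hypothetical, picked only to reproduce the three situations described above (deep tree, flat tree, huge substring index), and `index:description` is an invented example of a costly substring index.

```python
def pipeline_bottleneck(stage_rates):
    """Steady-state model of a pipeline whose stages run concurrently.

    stage_rates: {stage name: entries per second that stage alone could sustain}.
    Every entry has to pass through every stage, so overall throughput is
    capped by the slowest stage. Returns (slowest stage, entries/s).
    """
    stage = min(stage_rates, key=stage_rates.get)
    return stage, stage_rates[stage]


# Hypothetical rates, chosen only to illustrate the three cases in the text.
deep_tree = {"PRODUCER": 5000.0, "FOREMAN": 3000.0, "index:cn": 8000.0}
flat_tree = {"PRODUCER": 5000.0, "FOREMAN": 9000.0, "index:cn": 8000.0}
huge_substring = {"PRODUCER": 5000.0, "FOREMAN": 6000.0,
                  "index:cn": 8000.0, "index:description": 2500.0}

print(pipeline_bottleneck(deep_tree))       # ('FOREMAN', 3000.0)
print(pipeline_bottleneck(flat_tree))       # ('PRODUCER', 5000.0)
print(pipeline_bottleneck(huge_substring))  # ('index:description', 2500.0)
```

Speeding up any stage other than the slowest one leaves the result unchanged, which is why a faster FOREMAN on a flat tree simply shifts the bottleneck to the PRODUCER.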
DS6.x Re-index model (db2index)
The design is still the same as in DS5. That said, the performance of db_get alone and of str2entry has been affected by the addition of new features and fixes that speed up searches and provide better password policy capabilities.
As a consequence, DS6.x re-index is generally slightly slower than DS5.x re-index. Bug 6640806 has been filed against DS6.3 to track this concern. While it can't be fixed in DS6.3, it will be fixed in a forthcoming post-DS6.x release using completely different code.
DS6.x Import Model (ldif2db)
Same model as for DS5.x. The good news is that DS6.x re-import is generally as fast as DS5.x for the same platform and LDIF input. A DS6.x re-import will only be slower than a DS5.x re-import when neither the FOREMAN thread nor the index threads are the bottleneck, which is really the exception.
Because of all of the above, it is generally a good idea in DS6.x to re-import instead of re-index, especially if the number of indexes to re-index is large.
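The re-import versus re-index trade-off can be estimated with a back-of-the-envelope model, assuming a single-threaded re-index costs one pass per index (so times add up) while a pipelined re-import is paced by its slowest stage (so times overlap). The function names and every rate below are hypothetical, and the model ignores I/O contention, cache effects and the fact that a re-import rebuilds all configured indexes, not just the requested ones.

```python
def reindex_seconds(n_entries, per_index_rates):
    """Single-threaded re-index: one pass per index, back to back,
    so the times add up (rates in entries/s)."""
    return sum(n_entries / rate for rate in per_index_rates)


def reimport_seconds(n_entries, stage_rates):
    """Pipelined re-import: stages overlap, so the slowest one sets the pace."""
    return n_entries / min(stage_rates)


def prefer_reimport(n_entries, per_index_rates, stage_rates):
    """True when the modeled re-import finishes before the modeled re-index."""
    return reimport_seconds(n_entries, stage_rates) < reindex_seconds(n_entries, per_index_rates)


# Hypothetical figures: 1M entries, each re-index pass at 10,000 entries/s,
# import stages at 5,000 / 4,000 / 8,000 entries/s.
stages = [5_000, 4_000, 8_000]
for m in (1, 2, 3, 5):
    print(m, reindex_seconds(1_000_000, [10_000] * m), reimport_seconds(1_000_000, stages))
```

With these made-up rates the re-import (250 s) beats the re-index from 3 indexes onwards (300 s and up); the numbers were picked so the crossover lands where the rule of thumb above puts it, and real rates depend on hardware, LDIF and index configuration.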
The following comments concern a forthcoming post-DS6 release that is not yet ready for external use:
Re-index will be changed to share the same source code as re-import, which means parallel threads will perform the re-index. This should bring impressive performance gains for re-index.
Additionally, there will be N PRODUCER threads instead of 1, and that will speed up the imports in many cases.
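The move from 1 to N PRODUCER threads can be pictured as several parsers feeding the same FOREMAN queue, with the FOREMAN waiting for an end-of-input marker from each of them. The sketch below is an illustration under that assumption, not the forthcoming release's design: `import_with_producers` and `parse_record` are hypothetical, the LDIF is assumed to be pre-split into slices of records, and entry IDs simply follow arrival order at the foreman. A real server would presumably also have to cope with child entries arriving before their parents, which matters for parentid and ancestorid and which this sketch ignores.

```python
import queue
import threading
from collections import defaultdict

_DONE = object()  # end-of-stream sentinel


def parse_record(record):
    """Parse one LDIF record of 'attr: value' lines
    (simplified: no continuation lines or base64 values)."""
    entry = defaultdict(list)
    for line in record.splitlines():
        attr, _, value = line.partition(":")
        entry[attr.strip().lower()].append(value.strip())
    return dict(entry)


def import_with_producers(record_slices, indexed_attrs):
    """One PRODUCER thread per slice of LDIF records, one FOREMAN,
    one thread per index. Entry IDs follow arrival order at the foreman."""
    parsed = queue.Queue(maxsize=100)
    index_queues = {a: queue.Queue(maxsize=100) for a in indexed_attrs}
    id2entry = {}
    indexes = {a: defaultdict(set) for a in indexed_attrs}

    def producer(records):
        for record in records:
            parsed.put(parse_record(record))
        parsed.put(_DONE)  # each producer reports its own end of input

    def foreman():
        next_id, finished = 1, 0
        while finished < len(record_slices):  # wait for all N producers
            entry = parsed.get()
            if entry is _DONE:
                finished += 1
                continue
            id2entry[next_id] = entry
            for q in index_queues.values():
                q.put((next_id, entry))
            next_id += 1
        for q in index_queues.values():
            q.put(_DONE)

    def index_worker(attr):
        q = index_queues[attr]
        while (item := q.get()) is not _DONE:
            entry_id, entry = item
            for value in entry.get(attr, []):
                indexes[attr][value.lower()].add(entry_id)

    threads = [threading.Thread(target=producer, args=(s,)) for s in record_slices]
    threads.append(threading.Thread(target=foreman))
    threads += [threading.Thread(target=index_worker, args=(a,)) for a in indexed_attrs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return id2entry, {a: dict(ix) for a, ix in indexes.items()}
```

Parsing is now spread over N threads, which helps precisely in the cases above where the single PRODUCER was the bottleneck; when the FOREMAN is the slowest stage, extra producers alone would not change the overall rate.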