The truth about nsslapd-search-tune

The Directory Server has a configuration attribute named nsslapd-search-tune that isn't really addressed in our documentation, and it's not really all that well understood by a lot of the people that know about it. It can certainly be useful under the right circumstances, but it's not a panacea and probably shouldn't be used if it isn't needed. I'll try to clear things up here to help people better understand what this configuration attribute actually does and under what conditions it should (or should not) be used.

For the purposes of this discussion, two of the most important things that happen during the course of search processing in the Directory Server are:
  1. The server will use the indexes to put together a candidate list containing the IDs of all the entries that could match the filter. This phase was discussed in an earlier post.
  2. The server will retrieve each entry in the candidate list, check to see if it actually matches the search criteria, and if so return it to the client.
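As a rough illustration of these two phases, here is a minimal Python sketch. The data structures (`ENTRIES`, `INDEX`) and function names are hypothetical stand-ins chosen for this example, not the server's actual internals.

```python
# Toy model of the two search phases; every name here is hypothetical.

# Entry ID -> entry (attribute name -> list of values).
ENTRIES = {
    1: {"cn": ["My Group"], "member": ["uid=john.doe,ou=People,dc=example,dc=com"]},
    2: {"cn": ["Other Group"], "member": ["uid=jane.roe,ou=People,dc=example,dc=com"]},
    3: {"cn": ["Third Group"], "member": ["uid=john.doe,ou=People,dc=example,dc=com"]},
}

# Equality index: (attribute, lowercased value) -> set of entry IDs.
INDEX = {}
for entry_id, entry in ENTRIES.items():
    for attr, values in entry.items():
        for value in values:
            INDEX.setdefault((attr, value.lower()), set()).add(entry_id)


def build_candidate_list(attr, value):
    """Phase 1: use the index to collect IDs of entries that could match."""
    return sorted(INDEX.get((attr, value.lower()), set()))


def entry_matches(entry, attr, value):
    """Phase 2 check: compare the filter against the entry's actual values."""
    return any(v.lower() == value.lower() for v in entry.get(attr, []))


def search(attr, value):
    """Run both phases and return the IDs that would be sent to the client."""
    results = []
    for entry_id in build_candidate_list(attr, value):
        entry = ENTRIES[entry_id]          # retrieve the candidate entry
        if entry_matches(entry, attr, value):
            results.append(entry_id)       # only confirmed matches are returned
    return results
```

Note that `entry_matches` walks every value of the attribute, so the cost of the second phase grows with the number of values in the entry.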

An astute observer might ask why we need the second of these steps if we have confidence in our indexing mechanism. The main reason is that the candidate list has the potential to contain entry IDs for entries that don't actually match the search filter. This could happen if the search filter contained at least one component that wasn't indexed and therefore we can't be sure that all of the entries identified in the candidate list match the criteria associated with that filter component. It could also occur if an optimization in our search filter processing caused the server to short-circuit out of the index processing portion before we had evaluated all of the filter components. There are also a few other special cases that can cause this to happen, but it ultimately all boils down to the fact that we can be sure that the candidate list contains all of the possible matches, but we can't necessarily be sure that all of the entries in the candidate list actually match the provided search criteria.

Most of the time, the fact that we do this secondary evaluation on each candidate entry doesn't really cause any problems. Determining whether a particular entry matches a given filter is generally a very fast operation and the benefits that it provides far outweigh any added cost that might be incurred. Of course, there's always an exception to the rule and in this case one exception happens to be an indexed equality filter component that targets an attribute with a very large number of values. This most commonly manifests itself in the form of static groups with lots of members (e.g., getting into at least the tens or hundreds of thousands of members). In these kinds of entries, there can be a notable performance hit that results from that secondary evaluation, and it can be the case that skipping it can make things significantly faster. I should point out that it's especially evident for large static groups because not only are there large numbers of values, but processing them may require a lot of DN normalization which can become expensive.

This is where the nsslapd-search-tune configuration attribute comes into play. It can be used to change the way that the server processes search operations such that if it is possible to ensure that all of the filter components were indexed, and if we are confident that the candidate list doesn't contain any entries that don't actually match the criteria, then we can skip the potentially costly secondary evaluation. For example, consider the following very simple search filter:
(member=uid=john.doe,ou=People,dc=example,dc=com)

This is a very common type of search that might be used to identify the set of static groups in which the user uid=john.doe,ou=People,dc=example,dc=com is a member. With this type of filter (unless the number of groups that user belongs to, or has belonged to, exceeds the ALLIDs threshold), the candidate list should be exactly the set of entries that match the filter, and actually checking each of those entries against the filter adds no value.

Similarly, let's take this slightly more complicated filter:
(&(cn=My Group)(member=uid=john.doe,ou=People,dc=example,dc=com))

This is a fairly common way for clients to determine if the user uid=john.doe,ou=People,dc=example,dc=com is a member of the group named "My Group". Even though it's a logical AND of multiple components, if we know that indexes were used to obtain the ID lists for each of those components, then their intersection should exactly equal the ID list for entries that match the entire filter. However, in this case there's more that could happen to make it infeasible to rely purely on the candidate list. One case is that one of the components could be unindexed or have hit the ALLIDs threshold, in which case the resulting candidate list would only take into account the other filter component (meaning that it could include IDs for entries that don't match the component that couldn't use the index). Another could be an optimization in our filter processing code that causes the index evaluation to stop after the first component (meaning that the candidate list could include IDs for entries that don't match the second).
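The following Python sketch models those cases for an AND filter. It is only an illustration under simplifying assumptions: the function name, the `ALL_IDS` marker, and the `short_circuit` switch are invented for this example and do not correspond to real server code.

```python
# Toy model of candidate-list construction for an AND filter (hypothetical names).

ALL_IDS = None  # stands in for "no usable index result" (unindexed or ALLIDs)


def and_candidates(component_id_lists, short_circuit=False):
    """Intersect the per-component ID lists of an AND filter.

    Returns (candidates, exact). `exact` is True only when every component
    contributed an index result, so the candidates need no secondary check.
    A `candidates` value of None means nothing narrowed the list at all.
    """
    candidates = None
    exact = True
    for position, ids in enumerate(component_id_lists):
        if ids is ALL_IDS:
            exact = False  # this component could not narrow the list
            continue
        candidates = set(ids) if candidates is None else candidates & set(ids)
        if short_circuit and position < len(component_id_lists) - 1:
            exact = False  # later components were never evaluated
            break
    return candidates, exact


cn_ids = {7}             # IDs from the cn index
member_ids = {3, 7, 9}   # IDs from the member index

fully_indexed = and_candidates([cn_ids, member_ids])
one_unindexed = and_candidates([ALL_IDS, member_ids])
stopped_early = and_candidates([member_ids, cn_ids], short_circuit=True)
```

Only the first call yields a candidate list that can be trusted without checking each entry; in the other two, entries 3 and 9 are candidates even though they don't match the cn component.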

The nsslapd-search-tune configuration attribute can be used to allow the server to skip this secondary filter evaluation and just rely on the candidate list when it's possible to ensure that it doesn't contain any non-matching entries. If you do wish to use it, then it should go in the "cn=config,cn=ldbm database,cn=plugins,cn=config" entry. Its value is an integer that is interpreted as a bitmap, meaning that different bits are taken to mean different things. The most important bits are:
  • 1 -- This controls whether the feature will actually be enabled. If it is not set, then there will always be a secondary check against the filter before returning each entry. If it is set, then this secondary check can be skipped under the right conditions.
  • 8 -- This controls whether to allow skipping the secondary filter test even if an attribute used in the filter is also included in the values returned to the client. More on this later.
  • 16 -- This controls whether to allow skipping the secondary filter test for compound filters (i.e., those containing AND or OR components). If it is not set, then only simple equality filters will be candidates for skipping the check.
  • 32 -- This controls whether to ensure that index processing is always applied to all components in a compound AND filter. If it is set, then it can disable certain optimizations in the index processing code that could allow short-circuiting out of the index evaluation, which may degrade performance in some areas but will increase the conditions under which it is possible to bypass the secondary filter check.

As I mentioned above, the value is a bitmap, which means that you can just add the values of each of the components together to get what you want. For example, if you want to enable all of these options, you would use a value of 57 (1+8+16+32=57). In most deployments, however, it is probably better to leave out the "8" option, leaving you with a value of 49 (1+16+32=49).
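To make the arithmetic concrete, here is a small Python sketch that combines the bits and renders an LDIF modification for the entry named above. The flag names are made up for readability, and the use of a plain "replace" modification is an assumption about how you would apply the change, so treat it as a starting point rather than a verified procedure.

```python
# Bit values as described above; the labels are invented for this example.
ENABLE = 1
ALLOW_RETURNED_FILTER_ATTRS = 8
ALLOW_COMPOUND_FILTERS = 16
INDEX_ALL_AND_COMPONENTS = 32

FLAGS = {
    ENABLE: "enable",
    ALLOW_RETURNED_FILTER_ATTRS: "returned-filter-attrs",
    ALLOW_COMPOUND_FILTERS: "compound-filters",
    INDEX_ALL_AND_COMPONENTS: "index-all-and-components",
}


def describe(value):
    """List the labels of the bits that are set in a search-tune value."""
    return [name for bit, name in FLAGS.items() if value & bit]


def to_ldif(value):
    """Render an LDIF change record (assumes a simple 'replace' modify)."""
    return "\n".join([
        "dn: cn=config,cn=ldbm database,cn=plugins,cn=config",
        "changetype: modify",
        "replace: nsslapd-search-tune",
        f"nsslapd-search-tune: {value}",
    ])


# For distinct bits, bitwise OR gives the same result as addition.
everything = (ENABLE | ALLOW_RETURNED_FILTER_ATTRS
              | ALLOW_COMPOUND_FILTERS | INDEX_ALL_AND_COMPONENTS)
without_8 = ENABLE | ALLOW_COMPOUND_FILTERS | INDEX_ALL_AND_COMPONENTS
```

Calling `print(to_ldif(without_8))` produces a change record that could be fed to an LDAP modify tool, and `describe` is handy for decoding a value found in an existing configuration.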

So what's the problem with the "8" option? It has to do with a very tiny race condition that can arise if the secondary filter test is skipped. As noted above, normal search processing is done by first building the candidate list, then retrieving each entry, checking it against the filter, and sending it to the client if it matches. It is possible that in the split-second between the time that the candidate list is constructed and the time that an entry on that list is retrieved and returned to the client, that entry could be modified so that it no longer matches the filter criteria. With the secondary filter check in place, the server would see that the entry no longer matches and wouldn't send it to the client, but if that secondary test is skipped then it is possible that the server could return an entry that no longer matches. In most cases, this isn't a problem since the entry did match the filter at the time that the server started processing the search, but it does have the potential to cause a problem if the client does try to do some validation on the entries that it gets back and "freaks out" in some way if it finds one that doesn't match. The chance of the required conditions occurring is extremely small, and the chance that the client will notice and care about it is even smaller. Nevertheless, if you're concerned about it and want to avoid that possibility, then you can just leave option "8" out of the bitmap.
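The race can be illustrated with a toy Python simulation. Everything in it (the `run_search` function, the `modify_between` hook, the sample entries) is hypothetical and only mimics the ordering of events described above.

```python
# Toy illustration of the race; names and data are hypothetical.

USER = "uid=john.doe,ou=People,dc=example,dc=com"


def make_entries():
    """Two groups that both contain the user when the search starts."""
    return {1: {"member": [USER]}, 2: {"member": [USER]}}


def is_member(entry):
    return USER in entry["member"]


def remove_from_group_2(entries):
    """Simulates another client modifying an entry mid-search."""
    entries[2]["member"].remove(USER)


def run_search(entries, candidate_ids, matches, skip_recheck, modify_between=None):
    """Return the IDs sent to the client.

    `modify_between` runs after the candidate list exists but before the
    entries are retrieved, which is the window where the race can occur.
    """
    if modify_between is not None:
        modify_between(entries)
    returned = []
    for entry_id in candidate_ids:
        entry = entries[entry_id]
        if skip_recheck or matches(entry):
            returned.append(entry_id)
    return returned


candidates = [1, 2]  # built while both groups still contained the user
with_check = run_search(make_entries(), candidates, is_member, False, remove_from_group_2)
without_check = run_search(make_entries(), candidates, is_member, True, remove_from_group_2)
```

With the secondary check, only group 1 is returned; without it, group 2 is returned as well even though it no longer contains the user.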

It's important to note that the nsslapd-search-tune configuration attribute has been around in one form or another for quite some time (it was around in a more limited form in the 4.x server), but there were code changes required to make it "safer" to use (i.e., to reduce the chance of false positives). If you're running the 5.1 version of the server, then those fixes were not integrated until the 5.1SP4 release. If you're running the 5.2 version of the server, then they were not integrated until the 5.2patch2 release. If you're running something older than that, then you really should consider upgrading for a number of reasons, but until you do it's probably best to avoid using nsslapd-search-tune.

It's also important to note that this configuration attribute is not a panacea, and it can slow things down in some cases because of the search optimizations that it can disable. I would advise against using it unless you are seeing a performance problem in dealing with entries containing an attribute with a large number of values (like big static groups). If you're not sure if a particular use case might benefit from nsslapd-search-tune, then open a support case and we can help you figure that out, and we may be able to provide other tuning recommendations that may also help.


One thing you may want to mention is that searches that utilize the search-tune parameter will display notes=F in the access log.

Posted by frank fossa on August 18, 2006 at 12:50 AM CDT
