By sin on May 21, 2009
A presence index is used when a search filter consists of an attribute and a lone asterisk, which asks whether the attribute has any value at the back end. For example, if I search for the existing entry cn=user,dc=example,dc=com with the filter "cn=*", a presence index is used behind the scenes. Note that if you add a prefix or suffix to the filter (e.g. "cn=u*", "cn=*r" or "cn=u*r"), the substring index is used instead.
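To make the distinction concrete, here is a rough sketch in Python of how a simple filter maps to an index type. The function name index_type_for is made up for illustration, it is not the server's code, and it only handles plain "attr=value" filters (no escaped asterisks such as \2a, no compound filters). The equality case is not discussed in this post; it is included only so that the function covers a value without any asterisk.

```python
def index_type_for(filter_str: str) -> str:
    """Guess which index type a simple 'attr=value' filter would consult.

    Hypothetical helper for illustration only; a real server parses
    filters far more carefully than this.
    """
    attr, sep, value = filter_str.partition("=")
    if not sep or not attr:
        raise ValueError("expected a filter of the form attr=value")
    if value == "*":
        # A lone asterisk only asks "does this attribute have a value?"
        return "presence"
    if "*" in value:
        # Any prefix, suffix, or middle part turns it into a substring match.
        return "substring"
    # No asterisk at all: a plain equality match (assumption, not from the post).
    return "equality"


for f in ("cn=*", "cn=u*", "cn=*r", "cn=u*r"):
    print(f, "->", index_type_for(f))
```

Only the first filter in the loop lands on the presence index; the other three are substring filters.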
When our entry is created, the server writes a presence index key-value pair for every attribute in the entry that has a presence index configured. Interestingly, a presence index uses "+" as the key and the list of EntryIDs as the value. This means that each indexed attribute has only one record, which maps the "+" key to the EntryIDs of all entries that contain the attribute.
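The single-record layout can be modeled with a small Python sketch. This is a toy model, not the server's implementation: the names add_entry and encode_ids are hypothetical, the index is a plain dict rather than a Berkeley DB JE database, and the 8-byte big-endian encoding of each EntryID is an assumption that happens to agree with the 16 bytes reported for two IDs in the dbtest output further down.

```python
import struct

PRESENCE_KEY = b"+"  # the one and only key in a presence index


def add_entry(index, indexed_attrs, entry_id, entry):
    """Record entry_id under the '+' key of every indexed attribute it has."""
    for attr in indexed_attrs:
        if entry.get(attr):  # attribute is present with at least one value
            ids = index.setdefault(attr, {}).setdefault(PRESENCE_KEY, [])
            if entry_id not in ids:
                ids.append(entry_id)
                ids.sort()  # keep the ID list ordered


def encode_ids(ids):
    """Pack the ID list as 8-byte big-endian integers (assumed encoding)."""
    return struct.pack(">%dq" % len(ids), *ids)


index = {}
add_entry(index, ["cn"], 2, {"cn": ["user"]})
add_entry(index, ["cn"], 3, {"cn": ["another user"]})  # made-up second entry
add_entry(index, ["cn"], 4, {"ou": ["people"]})        # no cn, so not indexed

print(index["cn"])                             # one record: b'+' -> [2, 3]
print(len(encode_ids(index["cn"][PRESENCE_KEY])), "bytes")
```

However many entries carry the attribute, the index for that attribute still holds a single record; only its value grows.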
When you search for this entry using the "cn=*" filter, the server figures out that a presence index needs to be used and fetches the value stored under the "+" key from the index database (there is only one record). Once it has retrieved the set of EntryIDs, each corresponding entry is fetched from the id2entry database and matched against the filter before being returned to the client.
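The read path described above can be sketched the same way. Again this is only a toy model under stated assumptions: search_presence is a hypothetical name, both databases are plain dicts, and the entry with ID 3 is invented sample data (only cn=user,dc=example,dc=com comes from this post).

```python
PRESENCE_KEY = "+"

# Toy stand-in for the id2entry database: EntryID -> entry.
id2entry = {
    1: {"dn": "dc=example,dc=com", "dc": ["example"]},
    2: {"dn": "cn=user,dc=example,dc=com", "cn": ["user"]},
    3: {"dn": "cn=other,dc=example,dc=com", "cn": ["other"]},  # made up
}

# Toy stand-in for the cn presence index: a single '+' record.
presence_index = {"cn": {PRESENCE_KEY: [2, 3]}}


def search_presence(attr, presence_index, id2entry):
    """Evaluate the filter 'attr=*' using the presence index."""
    # Step 1: read the single '+' record to get the candidate EntryIDs.
    candidate_ids = presence_index.get(attr, {}).get(PRESENCE_KEY, [])
    results = []
    for entry_id in candidate_ids:
        # Step 2: fetch the full entry from id2entry.
        entry = id2entry.get(entry_id)
        # Step 3: re-check the filter against the entry before returning it.
        if entry is not None and entry.get(attr):
            results.append(entry)
    return results


for entry in search_presence("cn", presence_index, id2entry):
    print(entry["dn"])
```

Note that the whole ID list is read in one go, and every candidate entry is still fetched and re-checked, which is where the read cost comes from.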
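As you can see above, maintaining a presence index is quite costly in terms of both writing and reading: every entry that contains the attribute has to be recorded in the same single record, and a search has to read that record's entire list of EntryIDs. A substring index may be a better choice depending on the keys.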
Dumping presence index using dbtest
sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.presence
Indexed Value (1 bytes): +
Entry ID List (16 bytes): 2 3
Total Records: 1
Total / Average Key Size: 1 bytes / 1 bytes
Total / Average Data Size: 16 bytes / 16 bytes
Dumping presence index using dbdump
sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.presence