Substring index in OpenDS

A substring index has the largest footprint of all the index types in terms of database size. When a substring index is built for a value, several keys are derived from that value rather than just one. For example, the value "user" produces the keys "user", "ser", "er" and "r". A substring filter (such as "u\*s\*e\*r") has up to three parts:


subInitial -- the value before the first \*, i.e. "u"

subAny -- the list of values between subInitial and subFinal, separated by \*, i.e. ["s", "e"]

subFinal -- the value after the last \*, i.e. "r"
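The split into the three parts can be sketched in a few lines of Java. This is only an illustration: `SubstringFilterParts` and `parse` are hypothetical names, not OpenDS classes, and the sketch ignores escaped asterisks and everything else a real LDAP filter parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OpenDS): splits a substring pattern into its parts. */
final class SubstringFilterParts {
    final String subInitial;    // text before the first '*', or null if absent
    final List<String> subAny;  // texts between the first and the last '*'
    final String subFinal;      // text after the last '*', or null if absent

    private SubstringFilterParts(String subInitial, List<String> subAny, String subFinal) {
        this.subInitial = subInitial;
        this.subAny = subAny;
        this.subFinal = subFinal;
    }

    static SubstringFilterParts parse(String pattern) {
        // A limit of -1 keeps trailing empty strings, so "u*" yields ["u", ""].
        String[] pieces = pattern.split("\\*", -1);
        if (pieces.length < 2) {
            throw new IllegalArgumentException("no '*' in pattern: " + pattern);
        }
        String first = pieces[0];
        String last = pieces[pieces.length - 1];
        List<String> any = new ArrayList<>();
        for (int i = 1; i < pieces.length - 1; i++) {
            if (!pieces[i].isEmpty()) {  // "a**b" contributes no empty subAny value
                any.add(pieces[i]);
            }
        }
        return new SubstringFilterParts(first.isEmpty() ? null : first, any,
                                        last.isEmpty() ? null : last);
    }

    public static void main(String[] args) {
        for (String p : new String[] {"u*s*e*r", "u*r", "*r", "u*"}) {
            SubstringFilterParts f = parse(p);
            System.out.println(p + " -> initial=" + f.subInitial
                    + " any=" + f.subAny + " final=" + f.subFinal);
        }
    }
}
```

A missing part is represented as null here, which is just one possible design choice; an empty string or an Optional would work as well.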


You are free to use any combination of these components, for example "u\*r", "\*r" or "u\*". However, choose the filter carefully so that it eliminates most of the entries while the list of EntryIDs is being built. If you are familiar with the index entry limit, you may know that once the size of an ID list crosses this limit, the index is no longer used for it and the search may become costly. Here is what the database dump looks like for a substring index:
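To see why a selective filter matters, here is a toy model of how a candidate list might be built. It is a sketch under assumptions, not the OpenDS implementation: `SubstringCandidates`, `put`, `lookup` and `candidates` are hypothetical names, the limit is applied per key, and a component is matched by scanning every key that starts with it. A null result stands for "the index could not help".

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model (hypothetical, not OpenDS code) of a substring index with an entry limit. */
final class SubstringCandidates {
    private final NavigableMap<String, Set<Long>> index = new TreeMap<>();
    private final int entryLimit;  // assumed maximum number of IDs usable per key

    SubstringCandidates(int entryLimit) {
        this.entryLimit = entryLimit;
    }

    /** Records that the given index key occurs in the entry with this ID. */
    void put(String key, long entryId) {
        index.computeIfAbsent(key, k -> new TreeSet<>()).add(entryId);
    }

    /** IDs of all keys starting with the component, or null if a key is over the limit. */
    Set<Long> lookup(String component) {
        Set<Long> ids = new TreeSet<>();
        // Keys sharing a prefix are contiguous in sorted order, starting at the prefix.
        for (Map.Entry<String, Set<Long>> e : index.tailMap(component, true).entrySet()) {
            if (!e.getKey().startsWith(component)) {
                break;
            }
            if (e.getValue().size() > entryLimit) {
                return null;  // list too large: treat the index as unusable here
            }
            ids.addAll(e.getValue());
        }
        return ids;
    }

    /** Intersects the lists of all components; null means no component could narrow. */
    Set<Long> candidates(List<String> components) {
        Set<Long> result = null;
        for (String component : components) {
            Set<Long> ids = lookup(component);
            if (ids == null) {
                continue;  // over the limit, so this component filters nothing
            }
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result;
    }
}
```

In this model the result is only a candidate list: the positions of the components are not checked, so each candidate entry would still have to be compared against the full filter. A filter whose components all hit over-limit keys returns null, which corresponds to the costly search described above.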


Using dbtest


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.substring


Indexed Value (3 bytes): abc


Entry ID List (8 bytes): 2 




Indexed Value (2 bytes): bc


Entry ID List (8 bytes): 2 




Indexed Value (1 bytes): c


Entry ID List (8 bytes): 2 




Indexed Value (2 bytes): er


Entry ID List (8 bytes): 3 




Indexed Value (1 bytes): r


Entry ID List (8 bytes): 3 




Indexed Value (3 bytes): ser


Entry ID List (8 bytes): 3 




Indexed Value (4 bytes): user


Entry ID List (8 bytes): 3 


Total Records: 7


Total / Average Key Size: 17 bytes / 2 bytes


Total / Average Data Size: 56 bytes / 8 bytes
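The seven keys in this dump can be reproduced with a short sketch, assuming (as the dump suggests) that the two indexed values are "abc" and "user" and that each key is simply a suffix of the value. `SubstringKeys` and `keysFor` are hypothetical names; a real server may also cap the key length, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not OpenDS code): derives substring keys as suffixes of a value. */
final class SubstringKeys {
    static List<String> keysFor(String value) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start < value.length(); start++) {
            keys.add(value.substring(start));  // "user" -> "user", "ser", "er", "r"
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        all.addAll(keysFor("abc"));
        all.addAll(keysFor("user"));
        Collections.sort(all);  // the dump lists keys in sorted order
        for (String key : all) {
            int size = key.getBytes(StandardCharsets.UTF_8).length;
            System.out.println("Indexed Value (" + size + " bytes): " + key);
        }
    }
}
```

A value of n characters produces n keys in this model, which is one way to see why the substring index grows faster than the other index types.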


Using DbDump


 sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.substring


VERSION=3


format=print


type=btree


dupsort=0


HEADER=END


 abc


 \\00\\00\\00\\00\\00\\00\\00\\02


 bc


 \\00\\00\\00\\00\\00\\00\\00\\02


 c


 \\00\\00\\00\\00\\00\\00\\00\\02


 er


 \\00\\00\\00\\00\\00\\00\\00\\03


 r


 \\00\\00\\00\\00\\00\\00\\00\\03


 ser


 \\00\\00\\00\\00\\00\\00\\00\\03


 user


 \\00\\00\\00\\00\\00\\00\\00\\03


DATA=END
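The data lines of this dump are the same entry ID lists that dbtest printed as 2 and 3, shown as raw bytes. The sketch below decodes such a line, assuming that the raw output writes each non-printable byte as a backslash followed by two hex digits, a literal backslash as two backslashes, and that every entry ID is an 8-byte big-endian integer. `DumpedEntryIds` and its methods are hypothetical names, not part of JE or OpenDS.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes a data line from a printable-format dump into entry IDs. */
final class DumpedEntryIds {
    /** Turns "\00\02A" style text back into raw bytes (assumed escaping rules). */
    static byte[] unescape(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '\\') {
                out.write(c);      // printable byte written as-is
                i++;
            } else if (i + 1 < text.length() && text.charAt(i + 1) == '\\') {
                out.write('\\');   // escaped backslash
                i += 2;
            } else {
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 3;            // backslash plus two hex digits
            }
        }
        return out.toByteArray();
    }

    /** Reads the line as a sequence of 8-byte big-endian entry IDs. */
    static List<Long> entryIds(String line) {
        // Dump lines start with a single space that is not part of the data.
        String text = line.startsWith(" ") ? line.substring(1) : line;
        byte[] raw = unescape(text);
        if (raw.length % 8 != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + raw.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(raw);  // big-endian by default
        List<Long> ids = new ArrayList<>();
        while (buffer.remaining() >= 8) {
            ids.add(buffer.getLong());
        }
        return ids;
    }

    public static void main(String[] args) {
        // In Java source each backslash of the dump line has to be doubled.
        System.out.println(entryIds(" \\00\\00\\00\\00\\00\\00\\00\\02"));
    }
}
```

Under these assumptions a key that appears in several entries would show a data line whose length is a multiple of 8 bytes, one block per entry ID.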



About

This is the blog of a software engineer specializing in identity management. Kunal Sinha works on the Directory Services Engineering (OpenDS) team in Austin, Texas.
