Thursday May 21, 2009

extensible indexes in OpenDS



The extensible index is a new index type introduced in OpenDS 2.0. Since there is not much documentation available, I will explain it here in detail. Before you dive in, I advise you to have a look at the wiki page on collation matching rules. A collation matching rule lets you perform an internationalized search. As explained on the wiki, you can perform equality, ordering (less than, less than or equal to, greater than, and greater than or equal to) and substring searches using these matching rules. Another good thing about them is that you can index the matching rules for faster searches.

I would also like to clarify that an extensible index is much more than a collation index; however, collation-based indexes are the only kind available at the moment. For the sake of clarity, when I say "extensible index" in this post I mean only that subset: collation indexes.

Interestingly, a collation index can contain all the kinds of indexes I have blogged about before (except presence). So you can configure it to use equality, substring, or various ordering-based indexes. Before you get confused, let us see how it works. Let us create an equality, a substring and a less-than-or-equal-to index for the attribute "cn".


>>>> Configure the properties of the Local DB Index
        Property                        Value(s)
        -----------------------------------------------------------------------
    1)  attribute                       cn
    2)  index-entry-limit               4000
    3)  index-extensible-matching-rule  No extensible matching rules will be indexed.
    4)  index-type                      equality, extensible, ordering, presence, substring

    ?)  help
    f)  finish - apply any changes to the Local DB Index
    c)  cancel
    q)  quit

Enter choice [f]: 3


>>>> Configuring the "index-extensible-matching-rule" property
    The extensible matching rule in an extensible index.
    An extensible matching rule must be specified using either LOCALE or OID of the matching rule.
    Syntax:  LOCALE | OID - A Locale or an OID.

Do you want to modify the "index-extensible-matching-rule" property?
    1)  Keep the default behavior: No extensible matching rules will be indexed.
    2)  Add one or more values

    ?)  help
    q)  quit

Enter choice [1]: 2

Enter a value for the "index-extensible-matching-rule" property [continue]: en.eq

Enter another value for the "index-extensible-matching-rule" property
[continue]: en.sub
Enter another value for the "index-extensible-matching-rule" property
[continue]: en.lte
Enter another value for the "index-extensible-matching-rule" property
[continue]:




>>>> Configuring the "index-extensible-matching-rule" property (Continued)
The "index-extensible-matching-rule" property has the following values:
    *)  en.eq
    *)  en.lte
    *)  en.sub

Do you want to modify the "index-extensible-matching-rule" property?
    1)  Use these values
    2)  Add one or more values
    3)  Remove one or more values
    4)  Reset to the default behavior: No extensible matching rules will be
        indexed.
    5)  Revert changes

    ?)  help
    q)  quit

Enter choice [1]:
Press RETURN to continue


>>>> Configure the properties of the Local DB Index

        Property                        Value(s)
        -----------------------------------------------------------------------
    1)  attribute                       cn
    2)  index-entry-limit               4000
    3)  index-extensible-matching-rule  en.eq, en.lte, en.sub
    4)  index-type                      equality, extensible, ordering,
                                        presence, substring

    ?)  help
    f)  finish - apply any changes to the Local DB Index
    c)  cancel
    q)  quit

Enter choice [f]:

The Local DB Index was modified successfully

Press RETURN to continue




Rebuild the index, because the backend already contains some entries with cn values.


sin > bin/rebuild-index -b dc=example,dc=com -i cn
[21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437595 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of lock tables: defaulting to 53
[21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437594 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of cleaner threads: defaulting to 2 threads
[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_cn is currently operating in a degraded state and must be rebuilt before it can be used
[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847497 msg=Rebuild of index(es) cn started with 3 total records to process
[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847493 msg=Rebuild complete. Processed 3 records in 0 seconds (average rate 115.4/sec)


List all the databases, including the extensible ones:


sin > bin/dbtest list-database-containers -n userRoot -b "dc=example,dc=com"
Database Name    Database Type  JE Database Name                   Entry Count
------------------------------------------------------------------------------
dn2id            DN2ID          dc_example_dc_com_dn2id            3
id2entry         ID2Entry       dc_example_dc_com_id2entry         3
referral         DN2URI         dc_example_dc_com_referral         0
id2children      Index          dc_example_dc_com_id2children      1
id2subtree       Index          dc_example_dc_com_id2subtree       1
state            State          dc_example_dc_com_state            23
aci.presence     Index          dc_example_dc_com_aci.presence     0
cn.equality      Index          dc_example_dc_com_cn.equality      2
cn.presence      Index          dc_example_dc_com_cn.presence      1
cn.substring     Index          dc_example_dc_com_cn.substring     7
cn.ordering      Index          dc_example_dc_com_cn.ordering      2
cn.en.shared     Index          dc_example_dc_com_cn.en.shared     2     ----> extensible index
cn.en.substring  Index          dc_example_dc_com_cn.en.substring  7     ----> extensible index




An extensible (currently collation-based only) index database is named "attribute.locale.index_type". Note that an equality or ordering index creates a "shared" index database (cn.en.shared), because the content is the same in both. Let us dump the content of each.
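Before looking at the dumps, here is the naming convention as a small Java sketch. It is only an illustration inferred from the listing in this post, not OpenDS code: the class and method names are hypothetical, and both the rule that every non-substring suffix maps to "shared" and the way the base DN is turned into a JE database prefix are assumptions read off the example output.

```java
public class ExtensibleIndexNames {

    // Maps an attribute and a rule such as "en.eq" to the index database name.
    // Assumption: only the "sub" suffix gets its own database; the equality
    // and ordering suffixes all land in the "shared" database.
    public static String databaseName(String attribute, String rule) {
        int dot = rule.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("expected LOCALE.TYPE, got: " + rule);
        }
        String locale = rule.substring(0, dot);
        String type = rule.substring(dot + 1);
        String indexType = type.equals("sub") ? "substring" : "shared";
        return attribute + "." + locale + "." + indexType;
    }

    // Assumption: the JE database name is the base DN with '=' and ','
    // replaced by '_', followed by '_' and the index database name.
    public static String jeDatabaseName(String baseDn, String databaseName) {
        return baseDn.replace('=', '_').replace(',', '_') + "_" + databaseName;
    }

    public static void main(String[] args) {
        for (String rule : new String[] {"en.eq", "en.lte", "en.sub"}) {
            String name = databaseName("cn", rule);
            System.out.println(rule + " -> " + name + " -> "
                    + jeDatabaseName("dc=example,dc=com", name));
        }
    }
}
```

With the three rules configured earlier, en.eq and en.lte both resolve to cn.en.shared, which is why the listing shows two extensible databases rather than three.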


Using dbtest


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.shared
Indexed Value (10 bytes): STU
Entry ID List (8 bytes): 2

Indexed Value (12 bytes): hfXe
Entry ID List (8 bytes): 3

Total Records: 2
Total / Average Key Size: 24 bytes / 12 bytes
Total / Average Data Size: 16 bytes / 8 bytes
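One way to see why a single shared database can serve both equality and ordering rules is that a collation key supports both kinds of comparison. The sketch below uses java.text.Collator to show the idea. Whether OpenDS derives its keys exactly this way is an assumption on my part, as is the PRIMARY strength setting, and the class and method names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class SharedCollationKeyDemo {

    // Assumption: English locale at PRIMARY strength, so that case and
    // accent differences are ignored when keys are compared.
    static Collator englishCollator() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Equality: the two collation keys compare as equal.
    public static boolean matchesEquality(String stored, String asserted) {
        Collator c = englishCollator();
        CollationKey a = c.getCollationKey(stored);
        CollationKey b = c.getCollationKey(asserted);
        return a.compareTo(b) == 0;
    }

    // Less-than-or-equal: the same keys, compared for order instead.
    public static boolean matchesLessOrEqual(String stored, String asserted) {
        Collator c = englishCollator();
        CollationKey a = c.getCollationKey(stored);
        CollationKey b = c.getCollationKey(asserted);
        return a.compareTo(b) <= 0;
    }

    public static void main(String[] args) {
        System.out.println("user == USER : " + matchesEquality("user", "USER"));
        System.out.println("stu  <= user : " + matchesLessOrEqual("stu", "user"));
        System.out.println("user <= stu  : " + matchesLessOrEqual("user", "stu"));
    }
}
```

Because one sorted set of keys answers both questions, storing it twice would only duplicate data.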


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.substring
Indexed Value (6 bytes): STU
Entry ID List (8 bytes): 2

Indexed Value (4 bytes): TU
Entry ID List (8 bytes): 2

Indexed Value (2 bytes): U
Entry ID List (8 bytes): 2

Indexed Value (4 bytes): Xe
Entry ID List (8 bytes): 3

Indexed Value (2 bytes): e
Entry ID List (8 bytes): 3

Indexed Value (6 bytes): fXe
Entry ID List (8 bytes): 3

Indexed Value (8 bytes): hfXe
Entry ID List (8 bytes): 3

Total Records: 7
Total / Average Key Size: 34 bytes / 4 bytes
Total / Average Data Size: 56 bytes / 8 bytes
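The seven keys in the substring dump are exactly the suffixes of the two keys from the shared database (STU and hfXe), in sorted order. The following Java sketch reproduces that list. It works on the printable rendering shown by dbtest rather than on the raw key bytes, and it assumes the values are short enough that no substring length limit comes into play. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SuffixKeys {

    // Every suffix of the given key, longest first.
    public static List<String> suffixes(String key) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < key.length(); i++) {
            result.add(key.substring(i));
        }
        return result;
    }

    // All suffixes of all keys, de-duplicated and sorted by character code,
    // which for these ASCII renderings matches the order in the dump.
    public static List<String> indexKeys(String... keys) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String key : keys) {
            sorted.addAll(suffixes(key));
        }
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        for (String key : indexKeys("STU", "hfXe")) {
            System.out.println(key);
        }
    }
}
```

Running it prints STU, TU, U, Xe, e, fXe and hfXe, one per line, which is the same sequence as the "Indexed Value" lines above.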


 


Using dbdump


sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.shared
VERSION=3
format=print
type=btree
dupsort=0
HEADER=END
 \00S\00T\00U\00\00\00\00
 \00\00\00\00\00\00\00\02
 \00h\00f\00X\00e\00\00\00\00
 \00\00\00\00\00\00\00\03
DATA=END


 


sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.substring
VERSION=3
format=print
type=btree
dupsort=0
HEADER=END
 \00S\00T\00U
 \00\00\00\00\00\00\00\02
 \00T\00U
 \00\00\00\00\00\00\00\02
 \00U
 \00\00\00\00\00\00\00\02
 \00X\00e
 \00\00\00\00\00\00\00\03
 \00e
 \00\00\00\00\00\00\00\03
 \00f\00X\00e
 \00\00\00\00\00\00\00\03
 \00h\00f\00X\00e
 \00\00\00\00\00\00\00\03
DATA=END
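The DbDump lines alternate between a key and its data, and the byte counts agree with what dbtest reports. The Java sketch below decodes one line of the print format and reads an 8-byte data value as an entry ID. It reflects my understanding of the format (a backslash followed by two hex digits is one byte, a doubled backslash is a literal backslash, anything else is the character itself) and assumes that a single entry ID is stored as a big-endian long. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class DbDumpLineDecoder {

    // Decodes one key or data line of "format=print" output into raw bytes.
    public static byte[] decode(String line) {
        // Dump lines start with a single space that is not part of the value.
        String s = line.startsWith(" ") ? line.substring(1) : line;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.write(c);                        // printable byte, as is
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                out.write('\\');                     // escaped backslash
                i += 1;
            } else {
                // Backslash plus two hex digits encodes one byte.
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            }
        }
        return out.toByteArray();
    }

    // Assumption: a one-element entry ID list is a single big-endian long.
    public static long entryId(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        byte[] key = decode(" \\00S\\00T\\00U\\00\\00\\00\\00");
        byte[] data = decode(" \\00\\00\\00\\00\\00\\00\\00\\02");
        System.out.println("key bytes: " + key.length);
        System.out.println("entry ID:  " + entryId(data));
    }
}
```

For the first record of the shared database this gives a 10-byte key and entry ID 2, matching "Indexed Value (10 bytes)" and "Entry ID List (8 bytes): 2" in the dbtest output.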


 


If you look carefully, you will find that the contents of the equality index and the extensible equality index are different for the same value of cn (i.e. "user"). Like a normal equality index, the extensible equality index uses a normalized value as the key. However, the value is normalized according to the Unicode standard (NFKC), which requires using java.text.Normalizer.
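To give a feel for what NFKC normalization does, here is a minimal Java sketch using java.text.Normalizer. It shows only the Unicode normalization step. Any further processing OpenDS applies before the key is written to the index is not covered, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class NfkcDemo {

    // Applies Unicode NFKC normalization (compatibility decomposition
    // followed by canonical composition).
    public static String nfkc(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth Latin letters fold to their ASCII counterparts.
        System.out.println(nfkc("\uFF55\uFF53\uFF45\uFF52"));
        // The "fi" ligature (U+FB01) expands to the two letters f and i.
        System.out.println(nfkc("\uFB01le"));
        // 'e' plus a combining acute accent composes to a single U+00E9.
        System.out.println(nfkc("e\u0301").length());
    }
}
```

Note that NFKC on its own does not fold case: "User" stays "User".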


 

About

This is the blog of a software engineer specializing in identity management. Kunal Sinha works on the Directory Services Engineering (OpenDS) team in Austin, Texas.
