Thursday May 21, 2009

extensible indexes in OpenDS



Extensible index is a new index introduced in OpenDS 2.0. Since there is not much of documentation available, I will explain here in detail. Before you dive into the content here, I will advise you to have a look at the wiki for collation matching rules.  A collation matching rule helps you in performing an internationalized search. As explained on wiki, you can perform equality, ordering (less than, less than equal to , greater than and greater than equal to) and substring-based searches using the matching rules. Another good thing about them is that you can index the matching rules for faster searches. I will also like to clarify that an extensible index is much more than a collation index; however, we have just collation-based indexes at the moment. For the sake of clarify I will use extensible index when I mean only a subset of it - collation indexes.  Interestingly, a collation index can contain all kinds of indexes ( except presence) I have blogged before. So you can configure to use an equality, substring or various kinds of ordering-based indexes. Before you get confused let us see how it works. Let us create an equality, substring and less-than-equal-to index for attribute "cn".


>>>> Configure the properties of the Local DB Index


        Property                        Value(s)


        -----------------------------------------------------------------------


    1)  attribute                       cn


    2)  index-entry-limit               4000


    3)  index-extensible-matching-rule  No extensible matching rules will be indexed.


    4)  index-type                      equality, extensible, ordering,presence, substring




    ?)  help


    f)  finish - apply any changes to the Local DB Index


    c)  cancel


    q)  quit




Enter choice [f]: 3


>>>> Configuring the "index-extensible-matching-rule" property


    The extensible matching rule in an extensible index.


    An extensible matching rule must be specified using either LOCALE or OID of the matching rule.


    Syntax:  LOCALE | OID - A Locale or an OID.




Do you want to modify the "index-extensible-matching-rule" property?


    1)  Keep the default behavior: No extensible matching rules will be indexed.


    2)  Add one or more values




    ?)  help


    q)  quit




Enter choice [1]: 2




Enter a value for the "index-extensible-matching-rule" property [continue]: en.eq




Enter another value for the "index-extensible-matching-rule" property


[continue]: en.sub


Enter another value for the "index-extensible-matching-rule" property


[continue]: en.lte


Enter another value for the "index-extensible-matching-rule" property


[continue]: 




>>>> Configuring the "index-extensible-matching-rule" property (Continued)


The "index-extensible-matching-rule" property has the following values:


    \*)  en.eq


    \*)  en.lte


    \*)  en.sub




Do you want to modify the "index-extensible-matching-rule" property?


    1)  Use these values


    2)  Add one or more values


    3)  Remove one or more values


    4)  Reset to the default behavior: No extensible matching rules will be


        indexed.


    5)  Revert changes




    ?)  help


    q)  quit




Enter choice [1]: 


Press RETURN to continue 


>>>> Configure the properties of the Local DB Index




        Property                        Value(s)


        -----------------------------------------------------------------------


    1)  attribute                       cn


    2)  index-entry-limit               4000


    3)  index-extensible-matching-rule  en.eq, en.lte, en.sub


    4)  index-type                      equality, extensible, ordering,


                                        presence, substring




    ?)  help


    f)  finish - apply any changes to the Local DB Index


    c)  cancel


    q)  quit




Enter choice [f]: 




The Local DB Index was modified successfully




Press RETURN to continue 




Rebuild the indexes because we already have some entries there with cn.


sin > bin/rebuild-index -b dc=example,dc=com -i cn


[21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437595 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of lock tables: defaulting to 53


[21/May/2009:12:49:57 -0500] category=BACKEND severity=INFORMATION msgID=9437594 msg=Local DB backend ds-cfg-backend-id=userRoot,cn=Backends,cn=config does not specify the number of cleaner threads: defaulting to 2 threads


[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847510 msg=Due to changes in the configuration, index dc_example_dc_com_cn is currently operating in a degraded state and must be rebuilt before it can be used


[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847497 msg=Rebuild of index(es) cn started with 3 total records to process


[21/May/2009:12:49:57 -0500] category=JEB severity=NOTICE msgID=8847493 msg=Rebuild complete. Processed 3 records in 0 seconds (average rate 115.4/sec)


See the list of all the databases including the extensible ones:


sin > bin/dbtest list-database-containers -n userRoot -b "dc=example,dc=com"


Database Name    Database Type  JE Database Name                   Entry Count


------------------------------------------------------------------------------


dn2id            DN2ID          dc_example_dc_com_dn2id            3


id2entry         ID2Entry       dc_example_dc_com_id2entry         3


referral         DN2URI         dc_example_dc_com_referral         0


id2children      Index          dc_example_dc_com_id2children      1


id2subtree       Index          dc_example_dc_com_id2subtree       1


state            State          dc_example_dc_com_state            23


aci.presence     Index          dc_example_dc_com_aci.presence     0


cn.equality      Index          dc_example_dc_com_cn.equality      2


cn.presence      Index          dc_example_dc_com_cn.presence      1


cn.substring     Index          dc_example_dc_com_cn.substring     7


cn.ordering      Index          dc_example_dc_com_cn.ordering      2


cn.en.shared     Index          dc_example_dc_com_cn.en.shared     2     ----> extensible index


cn.en.substring  Index          dc_example_dc_com_cn.en.substring  7   ---> extensible index




An extensible (only collation-based) index database is named as "attribute.locale.index_type". You may want to note that an equality or ordering index will created a "shared" index database (cn.en.shared) because the content is same in both.  Let us dump the content of each.


Using dbtest


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.shared


Indexed Value (10 bytes): STU


Entry ID List (8 bytes): 2 




Indexed Value (12 bytes): hfXe


Entry ID List (8 bytes): 3 






Total Records: 2


Total / Average Key Size: 24 bytes / 12 bytes


Total / Average Data Size: 16 bytes / 8 bytes


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.en.substring


Indexed Value (6 bytes): STU


Entry ID List (8 bytes): 2 




Indexed Value (4 bytes): TU


Entry ID List (8 bytes): 2 




Indexed Value (2 bytes): U


Entry ID List (8 bytes): 2 




Indexed Value (4 bytes): Xe


Entry ID List (8 bytes): 3 




Indexed Value (2 bytes): e


Entry ID List (8 bytes): 3 




Indexed Value (6 bytes): fXe


Entry ID List (8 bytes): 3 




Indexed Value (8 bytes): hfXe


Entry ID List (8 bytes): 3 






Total Records: 7


Total / Average Key Size: 34 bytes / 4 bytes


Total / Average Data Size: 56 bytes / 8 bytes


 


Using dbdump


sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.shared


VERSION=3


format=print


type=btree


dupsort=0


HEADER=END


 \\00S\\00T\\00U\\00\\00\\00\\00


 \\00\\00\\00\\00\\00\\00\\00\\02


 \\00h\\00f\\00X\\00e\\00\\00\\00\\00


 \\00\\00\\00\\00\\00\\00\\00\\03


DATA=END


 


sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.en.substring


VERSION=3


format=print


type=btree


dupsort=0


HEADER=END


 \\00S\\00T\\00U


 \\00\\00\\00\\00\\00\\00\\00\\02


 \\00T\\00U


 \\00\\00\\00\\00\\00\\00\\00\\02


 \\00U


 \\00\\00\\00\\00\\00\\00\\00\\02


 \\00X\\00e


 \\00\\00\\00\\00\\00\\00\\00\\03


 \\00e


 \\00\\00\\00\\00\\00\\00\\00\\03


 \\00f\\00X\\00e


 \\00\\00\\00\\00\\00\\00\\00\\03


 \\00h\\00f\\00X\\00e


 \\00\\00\\00\\00\\00\\00\\00\\03


DATA=END


 


If you look carefully you would find that the contents of an equality index and extensible equality indexes are different for same value of cn( i.e. 'user" ). Like a normal equality index, the extensible equality index also contains a normalized value as the key. However, the value is normalized according to Unicode standards (NFKC) which requires using java.text.Normalizer.


 

substring index in OpenDS

A substring index has the largest footprint in terms of the database size. When a substring index is created on a particular value, a number of combinations are tried to generate the keys. For example, a value "user" will have the keys such as : "user", "ser", "er"  and  "r".  A substring filter (such as "u\*s\*e\*r" ) typically has 3 parts


subInitial  -- It is the value before the \* , i.e. "u"


subAny -- List of all the values between subInitial and subFinal separated by \*, i.e. ["s", "e"]


subFinal -- Final value after the \*,  i.e. "r"


You are free to select any combinations of the these components. For example, "u\*r", "\*r"  and "u\*" etc. However, you may want to use a filter intelligently so that it filters out most of the entries while building a list of EntryIDs. If you are familiar with index limit, you may be aware that if the size of the list crosses this mark, the indexing won't be used and it may be a costly search.  See how the database dump looks like for substring indexes:


using dbtest 


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.substring


Indexed Value (3 bytes): abc


Entry ID List (8 bytes): 2 




Indexed Value (2 bytes): bc


Entry ID List (8 bytes): 2 




Indexed Value (1 bytes): c


Entry ID List (8 bytes): 2 




Indexed Value (2 bytes): er


Entry ID List (8 bytes): 3 




Indexed Value (1 bytes): r


Entry ID List (8 bytes): 3 




Indexed Value (3 bytes): ser


Entry ID List (8 bytes): 3 




Indexed Value (4 bytes): user


Entry ID List (8 bytes): 3 


Total Records: 7


Total / Average Key Size: 17 bytes / 2 bytes


Total / Average Data Size: 56 bytes / 8 bytes


Using dbdump 


 sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.substring


VERSION=3


format=print


type=btree


dupsort=0


HEADER=END


 abc


 \\00\\00\\00\\00\\00\\00\\00\\02


 bc


 \\00\\00\\00\\00\\00\\00\\00\\02


 c


 \\00\\00\\00\\00\\00\\00\\00\\02


 er


 \\00\\00\\00\\00\\00\\00\\00\\03


 r


 \\00\\00\\00\\00\\00\\00\\00\\03


 ser


 \\00\\00\\00\\00\\00\\00\\00\\03


 user


 \\00\\00\\00\\00\\00\\00\\00\\03


DATA=END



presence index in OpenDS

A presence index is used when the search filter uses an asterisk to ascertain that there is a value at the back-end for that attribute. For example, to search an existing entry cn=user,dc=example,dc=com if I use a filter "cn=\*", a presence index will be used behind-the-scene. Note that if you change the filter with a prefix or suffix (i.e. "cn=u\*", "cn=\*r" and  "cn="u\*r" etc) , it will rather use the substring index. 


While creating our entry, server will create a presence index key-value pair for all the attributes which have a presence index configured. Interestingly a presence index has "+" as the key  and the list of EntryIDs as the value. It means that each attribute will have only one record with a list of EntryIDs corresponding to "+" key. 


When you search for this entry using "cn=\*" filter, it figures out a presence index needs to be used and it fetches all values from the index database for "+" key ( since there is only 1 record). Once it retrieves set of EntryIDs from the database, the corresponding entry is grabbed from id2entry database and matched against filter prior to being returned to the client.


 As you see above, maintaining a presence index is quite costly both in terms of writing and reading. A substring index may be a better choice depending on the keys.


Dumping presence index using dbtest 


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.presenceIndexed Value (1 bytes): +


Entry ID List (16 bytes): 2 3 


Total Records: 1


Total / Average Key Size: 1 bytes / 1 bytes


Total / Average Data Size: 16 bytes / 16 bytes


Dumping presence index using dbdump 


 sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.presence


VERSION=3


format=print


type=btree


dupsort=0


HEADER=END


 +


 \\00\\00\\00\\00\\00\\00\\00\\02\\00\\00\\00\\00\\00\\00\\00\\03


DATA=END

equality index in OpenDS

An equality index is used when the search filter uses an equality match. For example, to search an existing entry cn=user,dc=example,dc=com if I use a filter "cn=user", an equality index will be used behind-the-scene. Let us see how an equality index works from its creation to the search. Assuming the first entry for this back-end is "dc=example,dc=com", the corresponding EntryID will be 1. If we create our entry next to this base dn, our entry with a dn: cn=user,dc=example,dc=com will have an EntryID as 2.


While creating our entry server will create an equality index key-value pair for all the attributes which have an equality index configured. Since attribute "cn" has it by default, the server will find its default equality matching rule and normalize the value "user" using that rule. This normalized value is converted into bytes and is available as a key. As rightly guessed, the value is the entryID which we just saw above i.e. 2. EntryID is generated prior to inserting anything into the database.


When you search for this entry using "cn=user" filter, it figures out an equality index needs to be used and searches the equality index database for a key with value "user". Once it retrieves any EntryID from the database, the actual entry is grabbed from id2entry database.


What if we add another entry which has a cn with a value = "user"? Will it overwrite our key in the equality index? No.. it doesn't. I was incorrect while saying that the value in the key-value pair is just an EntryID; it is actually a list of EntryIDs. If there are multiple EntryIDs corresponding to a key, all of them are concatenated. To find the number of EntryIDs, you can divide the length of concatenated byte array with 8. See an example:


I have following 3 entries in my userRoot backend:


dn: dc=example,dc=com
dn: cn=abc,dc=example,dc=com
dn: cn=user,dc=example,dc=com


Get the list of all the databases in this backend


sin > bin/dbtest list-database-containers -n userRoot


Database Name Database Type JE Database Name Entry Count
--------------------------------------------------------------------------------------
Base DN: dc=example,dc=com
dn2id DN2ID dc_example_dc_com_dn2id 3
id2entry ID2Entry dc_example_dc_com_id2entry 3
referral DN2URI dc_example_dc_com_referral 0
id2children Index dc_example_dc_com_id2children 1
id2subtree Index dc_example_dc_com_id2subtree 1
state State dc_example_dc_com_state 20
aci.presence Index dc_example_dc_com_aci.presence 0
cn.equality Index dc_example_dc_com_cn.equality 2
Dump the content of the equality index database


sin > bin/dbtest dump-database-container -n userRoot -b "dc=example,dc=com" -d cn.equality
Indexed Value (3 bytes): abc
Entry ID List (8 bytes): 2
Indexed Value (4 bytes): user
Entry ID List (8 bytes): 3


Using dbdump utility to see non-opends representation of index database contents


sin > export CLASSPATH=/Users/sin/opends/b2.0/opends/build/package/OpenDS-2.0.0/lib/OpenDS.jar:/Users/sin/opends/b2.0/opends/build/package/OpenDS-2.0.0/lib/je.jar:$CLASSPATH


sin > java com.sleepycat.je.util.DbDump -h db/userRoot/ -p -s dc_example_dc_com_cn.equality


VERSION=3
format=print
type=btree
dupsort=0
HEADER=END
abc
\\00\\00\\00\\00\\00\\00\\00\\02
user
\\00\\00\\00\\00\\00\\00\\00\\03
DATA=END


You can see the byte values of entryID against each of the keys.



Indexing in OpenDS

OpenDS wiki provides a good introduction to the index databases and their purposes.  I'll try to dig further into how a particular index database looks like at the sleepycat end. For those who are not familiar with sleepycat or Berkley DB, it is the embedded DB used by openDS. I will try to explain one index in each of my posts. If you are coming from the relational database background, you'd probably be aware of the benefits of the indexing. Typically an index is a B+ Tree where each node stores a key and a value pair.  


Before we begin with the index, it is a prerequisite to know about the EntryID. An EntryID is a long value which is unique for each entry inside an openDS database. While all of the indexes may have different keys based on types of indexes, all of them have the same values : EntryID. The idea behind an index is to search using a key and grab a list of EntryIDs. Further down, this list of EntryIDs is used to retrieve all the entries from another database called id2entry and the filter is applied on each one of those entries to see if they match. The matched ones are returned to the client.  OpenDS provides following indexes:


children index -- system index ( not used by users directly )


subtree index -- system index  (not used by users directly)


equality index -- user configurable


substring index --user configuration


ordering index  -- user configurable


presence index -- user configurable


extensible index -- user configurable


System indexes are used anyway. OpenDS needs to maintain them to build information related to scope of the entries. Since these are not attribute-based, users don't have any control over them. Rest of the indexes are available per attribute and hence available for configuration. For more information on how to create/modify the indexes above have a look at the openDS wiki.  


When you create an entry under a back-end, all the configured indexes are updated with the new entry information. For example, if you add an entry cn=user,dc=example,dc=com , equality indexes will have normalized value(s) of attribute "cn" as the key(s) and its EntryID (say 2), as the value. 



Wednesday Jan 28, 2009

Finding information about svn branches in openDS

This is a general tip about svn. Lately, a number of svn branches have been created for openDS development. Here is what you would do to get a general idea on how the OpenDS is organized:


sin > svn ls https://opends.dev.java.net/svn/opends
README
branches/
tags/
trunk/


sin > svn ls https://opends.dev.java.net/svn/opends/tags
README
build15/

sin > svn ls https://opends.dev.java.net/svn/opends/branches
README
b1.0.1/
b1.2/
core-refactoring/
data-provider-refactoring/
r1.0/
replication-service/


If you are interested in any particular branch, you can check it out. For example:


sin > svn checkout --username sin https://opends.dev.java.net/svn/opends/branches/data-provider-refactoring  opends
A    opends/opends



Tuesday Jan 27, 2009

Internationalized Searches in OpenDS

Those who are interested in running internationalized searches (collation rules) against OpenDS may want to note that this feature is now available in the Trunk. For those who are unaware of what I am talking about, I will explain what is a collation rule and how does it work with OpenDS. For more information, kindly look at : https://www.opends.org/wiki/page/I18NCollationRules.


What is a Collation rule:


Collation rule is a kinda matching rule which allows one to do the equality, substring and other order-based searches. Each collation rule is based upon a Locale. Going by a simple calculation, the number of collation rules = number of supported locales \* 6 ( equality,substring,less than, less than or equal to, greater than, greater than or equal to).


Note that the supported locales vary from one JVM implementation to the other. If you look at the config.ldif, you would notice that some of the locales have commented out:



dn: cn=Collation Matching Rule,cn=Matching Rules,cn=config
objectClass: top
objectClass: ds-cfg-matching-rule
objectClass: ds-cfg-collation-matching-rule
cn: Collation Matching Rule
ds-cfg-java-class: org.opends.server.schema.CollationMatchingRuleFactory
ds-cfg-enabled: true
ds-cfg-matching-rule-type: equality
ds-cfg-matching-rule-type: less-than
ds-cfg-matching-rule-type: less-than-or-equal-to
ds-cfg-matching-rule-type: greater-than
ds-cfg-matching-rule-type: greater-than-or-equal-to
ds-cfg-matching-rule-type: substring
#ds-cfg-collation: af:1.3.6.1.4.1.42.2.27.9.4.1.1
#ds-cfg-collation: am:1.3.6.1.4.1.42.2.27.9.4.2.1
ds-cfg-collation: ar:1.3.6.1.4.1.42.2.27.9.4.3.1
ds-cfg-collation: ar-AE:1.3.6.1.4.1.42.2.27.9.4.4.1

For example, the Locale "af" is not supported by the Sun's JVM. The above configuration shows that the Collation matching rules are enabled
and all 6 types of rules are requested by the administrator.

How it works?

OpenDS uses Java-based Collator APIs for supporting this functionality. See an example below:

//Creates a Collator for the specified Locale.
private Collator createCollator(Locale locale)
{
Collator collator = Collator.getInstance(locale);
collator.setStrength(Collator.PRIMARY);
collator.setDecomposition(Collator.FULL_DECOMPOSITION);
return collator;
}

A Collator object is obtained per Locale and it is used in the PRIMARY and FULL_DECOMPOSITION mode.

Usage
=====


Usage and Examples


As described above, an international search filter may contain
Locale OID (ex. 1.3.6.1.4.1.42.2.27.9.4.34.1), Locale name (en or
en-US) or an exact matching rule type with suffix ( ex.
1.3.6.1.4.1.42.2.27.9.4.34.1.3 or en.eq). Note that a Locale name or
Locale OID without any suffix is treated as an equality type. For
example, a filter such as "cn:en-US:=sanchez" will use an equality
matching rule type.



Some examples below demonstrate how to use this feature.


Equality Search



The following search uses a filter with the en (en-US) Locale OID to perform an equality search:


ldapsearch -D "cn=directory manager" -w password -b "o=test" "cn:1.3.6.1.4.1.42.2.27.9.4.34.1:=sanchez"


Above search should return the entry which has a cn value of SÃnchez.


Alternatively, you can use the following filters:



  • "cn:en:=sanchez"



  • "cn:en.3:=sanchez"



  • "cn:en.eq:=sanchez"



  • "cn:1.3.6.1.4.1.42.2.27.9.4.34.1.3:=sanchez"



Less Than Search



The following search uses a filter with the es (es-ES) Locale and performs a Less-than search:


ldapsearch -D "cn=directory manager" -w password -b "o=test"
"departmentnumber:1.3.6.1.4.1.42.2.27.9.4.49.1.1:=abc120"


Above search should return the entry which has a departmentnumber value as Ãbc119.


Alternatively, you can use the following filters:



  • "departmentnumber:es.1:=abc120"



  • "departmentnumber:es.lt:=abc120"



Less Than or Equal To Search



The following search uses a filter with the es (es-ES) Locale and performs a Less-than-or-equal-to search:


ldapsearch -D "cn=directory manager" -w password -b "o=test"
"departmentnumber:1.3.6.1.4.1.42.2.27.9.4.49.1.2:=abc119"


Above search should return the entry which has a departmentnumber value as Ãbc119.


Alternatively, you can use the following filters:



  • "departmentnumber:es.2:=abc119"



  • "departmentnumber:es.lte:=abc119"



Greater Than or Equal To Search



The following search uses a filter with the fr (fr-FR) Locale and performs a Greater-than-or-Equal-To search:


ldapsearch -D "cn=directory manager" -w password -b "o=test"
"departmentnumber:fr.4:=abc119"


Above search should return an entry which has a departmentnumber value as Ãbc119.


Alternatively, you can use the following filters:



  • "departmentnumber:1.3.6.1.4.1.42.2.27.9.4.76.1.4:=abc119"



  • "departmentnumber:fr.gte:=abc119"



Greater Than Search



The following search uses a filter with the fr (fr-FR) Locale and performs a Greater-than search:


ldapsearch -D "cn=directory manager" -w password -b "o=test"
"departmentnumber:fr.5:=abc119"


Above search should not return an entry which has a departmentnumber value as Ãbc119.


Alternatively, you can use the following filter:



  • "departmentnumber:1.3.6.1.4.1.42.2.27.9.4.76.1.5:=abc119"



  • "departmentnumber:fr.gt:=abc119"



Substring Search



The following search uses a filter with the en (en-US) Locale and performs a substring search:


ldapsearch -D "cn=directory manager" -w password -b "o=test" "sn:en.6:=\*u\*bec"


Above search should return an entry which as an sn value as Québec.


Alternatively, you can use the following filter:



  • "sn:1.3.6.1.4.1.42.2.27.9.4.34.1.6:=\*u\*bec"



  • "sn:en.sub:=\*u\*bec"



I will provide more information about the indexing in the next post.

Tuesday Jul 17, 2007

Maximize performance during bulk deletes in Opends

This article explains you how to gain performance improvement using the ds-cfg-backend-subtree-delete-batch-size configuration attribute. The default recommended value for this is 5000. However, this may vary from machine to machine. The default value seems to be yielding a reasonable output with a Java Heap Size of 512MB.

There is no defined formulae to achieve this value for the best performance gain. Rather, it should be mostly based on the the sleepycat utilities which help determining the DB Cache size, Java Heap Size Settings , number of records stored in different databases.

As per the default Opends setting , the assigned DB Cache Size is 10% of the JVM heap size. So if the JVM heap size for Opends is 512MB then the DB Cache may grow up to 53 MB.

In my setup, I imported nearly 100,000 entries to the opends similar to the follwing one:

dn:cn=user#0,dc=example,dc=com

objectclass: inetorgperson

cn: user#0

sn: common

After the import, following databases were populated:

dc_example_dc_com_dn2id : Record Count is 100001

dc_example_dc_com_id2entry:Record Count is 100001

dc_example_dc_com_referral: Record Count is 0

dc_example_dc_com_id2children: Record Count is 1

dc_example_dc_com_id2subtree:Record Count is 1

dc_example_dc_com_givenName.equality:Record Count is 0

dc_example_dc_com_givenName.presence:Record Count is 0

dc_example_dc_com_givenName.substring:Record Count is 0

dc_example_dc_com_objectClass.equality: Record Count is 2

dc_example_dc_com_member.equality:Record Count is 0

dc_example_dc_com_uid.equality:Record Count is 0

dc_example_dc_com_cn.equality:Record Count is 100000

dc_example_dc_com_cn.presence :Record Count is 1

dc_example_dc_com_cn.substring: Record Count is 212220

dc_example_dc_com_telephoneNumber.equality:Record Count is 0

dc_example_dc_com_telephoneNumber.presence:Record Count is 0 dc_example_dc_com_telephoneNumber.substring:Record Count is 0

dc_example_dc_com_sn.equality:Record Count is 1

dc_example_dc_com_sn.presence:Record Count is 1

dc_example_dc_com_sn.substring:Record Count is 6

dc_example_dc_com_ds-sync-hist.ordering:Record Count is 0

dc_example_dc_com_mail.equality:Record Count is 0

dc_example_dc_com_mail.presence:Record Count is 0

dc_example_dc_com_mail.substring:Record Count is 0

dc_example_dc_com_entryUUID.equality: Record Count is 100001

dc_example_dc_com_aci.presence:Record Count is 0

Total number of records in the databases is 612236 and the memory needed by locks is 612236\*142( Currenlty JE uses a 142 byte lock size for a record) =86937512 bytes = 82 MB. But it doesn't require 82MB of memory in one Go because Not all database locks are acquired together.

I won't go into the details of the lifespan of a database lock. I would rather do the performance comparions between two different batch sizes selected on the basis of looking at the cache misses and various other debug outputs from the berkeley db ( Thanks to Linda for taking so much of pain). The two tests were run with a batch size of 5K and 20K respectively. Whilst we saw almost marginal time difference between the two ( 5K was better), the 5K batch size exhibited a better memory footprint as most of the memory utilized in the deletes ( like lock information) gets released after a while. This provides more headroom compared to the other one for the future operations.

 

Fig 1: Graph of the cacheTotal, lockBytes, and adminBytes fields from the stats, which gives us a picture of how the cache is composed over time

Fig. 2: Graph of Cache Miss

The test run shows that both 5K and 20K runs have the same elapsed time. But JE's overall requirements for memory are falling in the 5K case. Probably  at the beginning of the deleteLeaf operations, JE has to pull a fair bit of the database into cache, in order to do the deletes. As the deletes progress, the btrees are pruned, and the overall size of the database is falling, so JE actually needs less cache for data. But the 20K batch case uses a bigger lump of memory for the locks, and that lump keeps JE's high water memory usage close to the limit of the 53Mb cache.

In the 5K batch case, the lock lumps are smaller, so JE's high water memory usage points are falling, which gives more margin. The 20K batch case has bigger cache miss spikes. The 5K batch case stays nice and even. (Though not really sure why the 5K batch has a bigger spike at the very beginning).

So since the elapsed time is the same, I'd go with the 5K batch.

My sincere thanks goes to Linda Lee because most of the statistics are based on the analysis of my test results by her.

Monday Jul 02, 2007

Implementing Opends Back-end using Sleepycat

Like Previous versions of directory servers, opends backend is built upon sleepycat database. However, unlike previous versions, the sleepycat database is purely Java-based. Sleepycat db provides the Database class which appears like a HashMap where both the keys and the values are represented using the DatabaseEntry objects. An Environment encompasses the databases and it's the unit for cache and transactions among the databases.

Fig. A Database has keys and values. A DatabaseEntry object represents both key and value. You may want to note that a DatbaseEntry object takes only the byte array.

 

 

 

 

 

 

 

  The three main classes of the Opends backend are:

  RootContainer - Represents a backend id. It encapsulates the JE Environment class.

  EntryContainer - Represents a base DN for the RootContainer.

  DatabaseContainer - Represents a particular database.

 As evident from the description, RootContainer contains 1..n EntryContainer objects.

  

  Opends creates a number of databases related to the actual data and the necessary indexes. DBDump utility from Sleepycat can be used to dump the content of a database. For ex,

  bash-2.03$ java com.sleepycat.je.util.DbDump -h ../db -p -s dc_example_dc_com_dn2id

 dc=example,dc=com
 \\00\\00\\00\\00\\00\\00\\00\\01
 cn=user#0,dc=example,dc=com
 \\00\\00\\00\\00\\00\\00\\00\\02
 cn=user#10,dc=example,dc=com

 

 bash-2.03$ java com.sleepycat.je.util.DbDump -h ../db -p -s dc_example_dc_com_id2entry

 \\00\\00\\00\\00\\00\\00\\00\\01
 \\01`h\\02\\01\\00\\04c\\01\\11dc=example,dc=com\\12organizationalUnit\\01ou\\00\\01\\06people\\01entryUUID\\00\\01$54c89c45-327b-3d38-8a2e-b7814de651a4
 \\00\\00\\00\\00\\00\\00\\00\\02
 \\01`x\\02\\01\\00\\04s\\01\\1bcn=user#0,dc=example,dc=com\\0dinetorgperson\\02sn\\00\\01\\06commoncn\\00\\01\\06user#0\\01entryUUID\\00\\01$f1ae0607-4fcf-3878-813b-fff5cc871b37

 Opends Uses a 8-byte AtomicLong to create the entryId.

 An algorithm for the LDAP ADD operation ( Small details ignored)

 Connection Handler and then LDAPRequest Handler thread receives the request from the Client.
 The Add request is passed on to the pool of worker threads
 A worker thread performs the following operations in sequence ( some operations are omitted):
 Grab a READ Lock on the parent entry to ensure that the parent entry doesn't undergo any changes which this operation is in progress.
 Grab a WRITE Lock on the entry to be created.
 Search for the entry in the backend to make sure that it doesn't exist.
 Get the Backend instance for that entry dn. It is found by searching into the TreeMap maintained by the DirectoryServer class.
 Find the EntryContainer associated with the entry DN.
 Create a new Java DB Transaction Object
 Open dn2id database and verify that the key with the dn doesn't exist already.
 Read the EntryID of the parent from dn2id database.
 Get a new EntryID from the RootContainer for the new Entry.
 Insert the new Entry into dn2id database.
 Insert into Id2Entry database.
 Insert into Indexes
 Insert into id2children and id2subtree databases
 Commit/Rollback the transacation
 Return

 

You can find more information onSleepycat andOpends integration in this presentation

Friday Jun 22, 2007

Austin downtown tour

Well..Austin isn't safe. A bunch of robbers robbing the Opends team in the broad daylight ;).
About

This is the blog of a software engineer, specialized in identity management. Kunal Sinha works in Directory Services Engineering (OpenDS) team from Austin,Texas.

Search

Archives
« March 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    
       
Today
Bookmarks