Wednesday Jan 30, 2008

A day with IBM DB2

It was fun working with DB2 after having already experienced Oracle and MySQL. It started with the installation of DB2 on a Solaris 10 SPARC machine. I was interested in running a command-line installation, but sadly it doesn't configure the database for you. The installation itself is quite easy, and I sailed past it; however, I desperately needed some reference for configuring DB2. A search on Google brought me to a good site which explains the necessary steps, like user creation, to configure DB2 (you might want to be a little cautious, as it discusses an old release of DB2). It also shows you how to create a sample database.

Once the configuration was done, I wanted to test access to the sample database from my Java code. I put db2jcc4.jar (found in /opt/IBM/db2/V9.5/java) on the CLASSPATH, and the sample application below worked.

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;

 public class Db2Test
 {
   public static void main(String[] args) throws Exception
   {
     String url = "jdbc:db2://HOST_NAME:50002/sample";
     Connection conn = DriverManager.getConnection(url, "ldasdb", "ldasdb");
     Statement stmt = conn.createStatement();

     // Execute the query
     ResultSet rs = stmt.executeQuery("SELECT * FROM staff");

     // Loop through the result set
     while (rs.next())
       System.out.println(rs.getString(1));

     // Close the result set, statement and the connection
     rs.close();
     stmt.close();
     conn.close();
   }
 }

Nice! I was almost done; I just needed to use my existing SQL scripts to create a database and some tables. Here's how it went:

 1. Go to the installation path and run db2 to get a db2 prompt (use the ldasdb user for this)
 2. Create a database, or use the sample database mentioned on the page
 db2 => create database vdtest
 3. Connect to the database
 db2 => connect to vdtest

4. Create a table

Now my old SQL failed, because the primary key column must be declared NOT NULL. Well, I don't know if it was obvious or not :) but I changed the SQL from "UID       VARCHAR(15)      PRIMARY KEY" to "UID       VARCHAR(15)     NOT NULL PRIMARY KEY", and the table was created.

5. Inserting records

Once again I hit a roadblock here, but it turned out to be caused by the column names following the table name in the INSERT statement. The workaround was to remove the column names from the INSERT.
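Putting the two fixes together, the working DDL and INSERT looked something like the fragment below (the table name and the second column are hypothetical, added only to make the example complete):

```sql
-- DB2 requires NOT NULL on primary key columns
CREATE TABLE users (
    UID   VARCHAR(15) NOT NULL PRIMARY KEY,
    NAME  VARCHAR(50)
)

-- Column list after the table name removed, as per the workaround above
INSERT INTO users VALUES ('u1', 'Some User')
```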


Didn't have the time to investigate how to insert into only a few selected columns... saving it for some other time :)


Tuesday Jul 17, 2007

Maximize performance during bulk deletes in OpenDS

This article explains how to gain a performance improvement using the ds-cfg-backend-subtree-delete-batch-size configuration attribute. The recommended default value is 5000; however, the best value may vary from machine to machine. The default value seems to yield reasonable results with a Java heap size of 512MB.

There is no defined formula for arriving at the value that gives the best performance gain. Rather, it should mostly be based on the Sleepycat utilities, which help determine the DB cache size, the Java heap size settings, and the number of records stored in the different databases.

As per the default OpenDS setting, the assigned DB cache size is 10% of the JVM heap size. So if the JVM heap size for OpenDS is 512MB, the DB cache may grow up to about 53MB.
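As a quick sanity check of that number, the arithmetic can be sketched as below (class and method names are mine, not OpenDS code; the 10% figure is the default mentioned above):

```java
public class CacheSizeEstimate {
    // Estimate the JE DB cache ceiling from the JVM heap size,
    // assuming the default OpenDS setting of 10% of the heap.
    static long cacheBytes(long heapBytes) {
        return heapBytes / 10;
    }

    public static void main(String[] args) {
        long heap = 512L * 1024 * 1024;                           // 512MB JVM heap
        System.out.println(cacheBytes(heap) / 1_000_000 + " MB"); // roughly 53 MB
    }
}
```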

In my setup, I imported nearly 100,000 entries into OpenDS, each similar to the following:


objectclass: inetorgperson
cn: user#0
sn: common

After the import, the following databases were populated:

dc_example_dc_com_dn2id: Record Count is 100001
dc_example_dc_com_id2entry: Record Count is 100001
dc_example_dc_com_referral: Record Count is 0
dc_example_dc_com_id2children: Record Count is 1
dc_example_dc_com_id2subtree: Record Count is 1
dc_example_dc_com_givenName.equality: Record Count is 0
dc_example_dc_com_givenName.presence: Record Count is 0
dc_example_dc_com_givenName.substring: Record Count is 0
dc_example_dc_com_objectClass.equality: Record Count is 2
dc_example_dc_com_member.equality: Record Count is 0
dc_example_dc_com_uid.equality: Record Count is 0
dc_example_dc_com_cn.equality: Record Count is 100000
dc_example_dc_com_cn.presence: Record Count is 1
dc_example_dc_com_cn.substring: Record Count is 212220
dc_example_dc_com_telephoneNumber.equality: Record Count is 0
dc_example_dc_com_telephoneNumber.presence: Record Count is 0
dc_example_dc_com_telephoneNumber.substring: Record Count is 0
dc_example_dc_com_sn.equality: Record Count is 1
dc_example_dc_com_sn.presence: Record Count is 1
dc_example_dc_com_sn.substring: Record Count is 6
dc_example_dc_com_ds-sync-hist.ordering: Record Count is 0
dc_example_dc_com_mail.equality: Record Count is 0
dc_example_dc_com_mail.presence: Record Count is 0
dc_example_dc_com_mail.substring: Record Count is 0
dc_example_dc_com_entryUUID.equality: Record Count is 100001
dc_example_dc_com_aci.presence: Record Count is 0

Total number of records in the databases is 612236, and the memory needed by the locks is 612236 * 142 (currently JE uses a 142-byte lock size per record) = 86937512 bytes, or about 82MB. But it doesn't require 82MB of memory in one go, because not all database locks are acquired together.
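That worst-case figure is just the record count times the per-lock size; a small sketch of the calculation (class and method names are mine, and the 142-byte lock size is the JE figure quoted above):

```java
public class LockMemoryEstimate {
    // Worst-case lock memory: one 142-byte lock per record,
    // assuming every lock were held at once (which, as noted, never happens).
    static long lockMemoryBytes(long records) {
        return records * 142;
    }

    public static void main(String[] args) {
        long records = 612_236;                 // total record count from the listing above
        long bytes = lockMemoryBytes(records);
        System.out.println(bytes + " bytes = " + bytes / (1024 * 1024) + " MB");
    }
}
```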

I won't go into the details of the lifespan of a database lock. I would rather do a performance comparison between two different batch sizes, selected by looking at the cache misses and various other debug outputs from the Berkeley DB (thanks to Linda for taking so much pain over this). The two tests were run with batch sizes of 5K and 20K respectively. While we saw only a marginal time difference between the two (5K was slightly better), the 5K batch size exhibited a better memory footprint, as most of the memory utilized during the deletes (such as lock information) gets released after a while. This provides more headroom for future operations compared to the 20K case.


Fig 1: Graph of the cacheTotal, lockBytes, and adminBytes fields from the stats, which gives us a picture of how the cache is composed over time

Fig. 2: Graph of Cache Miss

The test run shows that both the 5K and 20K runs have the same elapsed time, but JE's overall memory requirements fall in the 5K case. Probably, at the beginning of the deleteLeaf operations, JE has to pull a fair bit of the database into the cache in order to do the deletes. As the deletes progress, the btrees are pruned and the overall size of the database falls, so JE actually needs less cache for data. But the 20K batch case uses a bigger lump of memory for the locks, and that lump keeps JE's high-water memory usage close to the limit of the 53MB cache.

In the 5K batch case, the lock lumps are smaller, so JE's high-water memory usage points fall, which gives more margin. The 20K batch case has bigger cache-miss spikes, while the 5K batch case stays nice and even. (Though I'm not really sure why the 5K batch has a bigger spike at the very beginning.)

So, since the elapsed time is the same, I'd go with the 5K batch size.

My sincere thanks go to Linda Lee, because most of the statistics here are based on her analysis of my test results.

Monday Jul 02, 2007

Implementing the OpenDS Back-end using Sleepycat

Like previous versions of directory servers, the OpenDS backend is built upon the Sleepycat database. However, unlike previous versions, this Sleepycat database (Berkeley DB Java Edition) is purely Java-based. Sleepycat provides the Database class, which behaves like a HashMap where both keys and values are represented by DatabaseEntry objects. An Environment encompasses the databases and is the unit of caching and transactions across them.

Fig. A Database has keys and values; a DatabaseEntry object represents both a key and a value. You may want to note that a DatabaseEntry object takes only a byte array.

  The three main classes of the OpenDS backend are:

  RootContainer - Represents a backend ID. It encapsulates the JE Environment class.

  EntryContainer - Represents a base DN within the RootContainer.

  DatabaseContainer - Represents a particular database.

 As is evident from the description, a RootContainer contains 1..n EntryContainer objects.


  OpenDS creates a number of databases for the actual data and the necessary indexes. The DbDump utility from Sleepycat can be used to dump the contents of a database. For example,

  bash-2.03$ java com.sleepycat.je.util.DbDump -h ../db -p -s dc_example_dc_com_dn2id

  bash-2.03$ java com.sleepycat.je.util.DbDump -h ../db -p -s dc_example_dc_com_id2entry


 OpenDS uses an 8-byte AtomicLong to create the entry ID.
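A minimal sketch of such a monotonically increasing 8-byte ID generator (the class and method names here are mine, not the actual OpenDS code, which keeps the counter in the RootContainer):

```java
import java.util.concurrent.atomic.AtomicLong;

public class EntryIdGenerator {
    // AtomicLong is a thread-safe 8-byte counter, so concurrent
    // worker threads each get a unique, increasing entry ID.
    private final AtomicLong nextId = new AtomicLong(0);

    public long nextEntryId() {
        return nextId.incrementAndGet();
    }

    public static void main(String[] args) {
        EntryIdGenerator gen = new EntryIdGenerator();
        System.out.println(gen.nextEntryId()); // 1
        System.out.println(gen.nextEntryId()); // 2
    }
}
```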

 An algorithm for the LDAP ADD operation (small details ignored):

 1. The connection handler and then an LDAP request handler thread receives the request from the client.
 2. The Add request is passed on to the pool of worker threads.
 3. A worker thread performs the following operations in sequence (some operations are omitted):
    - Grab a READ lock on the parent entry to ensure that the parent entry doesn't undergo any changes while this operation is in progress.
    - Grab a WRITE lock on the entry to be created.
    - Search for the entry in the backend to make sure that it doesn't already exist.
    - Get the Backend instance for that entry DN. It is found by searching the TreeMap maintained by the DirectoryServer class.
    - Find the EntryContainer associated with the entry DN.
    - Create a new JE transaction object.
    - Open the dn2id database and verify that a key with that DN doesn't already exist.
    - Read the EntryID of the parent from the dn2id database.
    - Get a new EntryID from the RootContainer for the new entry.
    - Insert the new entry into the dn2id database.
    - Insert into the id2entry database.
    - Insert into the indexes.
    - Insert into the id2children and id2subtree databases.
    - Commit/roll back the transaction.
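Ignoring locks, transactions, and indexes, the core dn2id/id2entry bookkeeping in the steps above can be sketched with in-memory maps standing in for the JE databases (all names below are illustrative, not the actual OpenDS code):

```java
import java.util.TreeMap;

public class AddOperationSketch {
    // Hypothetical in-memory stand-ins for the dn2id and id2entry databases
    static final TreeMap<String, Long> dn2id = new TreeMap<>();
    static final TreeMap<Long, String> id2entry = new TreeMap<>();
    static long nextId = 0;

    static boolean addEntry(String dn, String parentDn, String entry) {
        if (dn2id.containsKey(dn))
            return false;                       // entry already exists
        if (parentDn != null && !dn2id.containsKey(parentDn))
            return false;                       // parent entry not found
        long id = ++nextId;                     // new entry ID from the container
        dn2id.put(dn, id);                      // insert into dn2id
        id2entry.put(id, entry);                // insert into id2entry
        return true;                            // indexes, id2children/id2subtree, txn omitted
    }

    public static void main(String[] args) {
        addEntry("dc=example,dc=com", null, "base entry");
        boolean ok = addEntry("uid=user.0,dc=example,dc=com",
                              "dc=example,dc=com", "user entry");
        System.out.println(ok); // true
    }
}
```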


You can find more information on Sleepycat and OpenDS integration in this presentation.

This is the blog of a software engineer specializing in identity management. Kunal Sinha works in the Directory Services Engineering (OpenDS) team in Austin, Texas.
