Wednesday Feb 04, 2009

New ASN1 library brings performance boost to OpenDS

One of my fellow OpenDS core team developers, Bo Li, has spent the last two or three months busily refactoring our ASN1 APIs. Tonight his work is finally complete and we have merged it onto the trunk in time for OpenDS 2.0.

It's never easy to refactor low-level APIs like these and nor is it, at first glance, very rewarding since you do not have a nice new shiny feature to play around with when your work is complete.

So what was the point in refactoring these APIs? What do they bring us? Well, before going into too many details, they say a picture speaks a thousand words, so here goes:

Average etime (ms) ASN1 vs Trunk

In the graph above the vertical axis represents the average time taken in milliseconds for a search operation to complete. The horizontal axis represents the size of the entry in kilobytes being returned to the user. We return the entire entry which, in the case of the 4 KB entry, contains around 80 attributes.

As you can see, the refactored ASN1 library (the red line) brings a significant performance improvement: the time taken for a search operation to complete has approximately halved when compared against the original ASN1 library (the blue line).

The old ASN1 libraries, whilst relatively easy to use, were not very efficient and, in particular, forced the developer to waste enormous amounts of memory allocating ASN1Element objects in order to construct complex ASN1 sequence structures. This inefficiency was further compounded during serialization which, for search results containing possibly many attribute values (the ASN1 "leaf" elements), was particular costly.

To see why, lets take a look at an LDAP search entry response message:

  LDAPMessage ::= SEQUENCE
  {
  . messageID           MessageID,
  . searchResEntry      [APPLICATION 4] SEQUENCE
  . {
  . . objectName        LDAPDN,
  . . attributes        SEQUENCE OF
  . . {
  . . . SEQUENCE
  . . . {
  . . . . type          AttributeDescription,
  . . . . vals          SET OF
  . . . . {
  . . . . . value       AttributeValue
  . . . . }
  . . . }
  . . }
  . },
  . controls            [0] Controls OPTIONAL
  }

The innermost elements are the individual attribute values contained by the entry. These represent the real content of the entry - a 4KB entry will contain 4KB of attribute values. As you can see in the illustration above, the attribute values are nested 5 levels deep in the LDAP search entry message structure.

The old ASN1 libraries would duplicate each ASN1 element within an ASN1 sequence two times during serialization: once to determine the element's content length and once to concatenate the content onto the element header. This meant that attribute values were duplicated 11 times (5x2+1).

A similar story applies when decoding LDAP messages using the old ASN1 libraries.

In addition, our principle back-end implementation also stores entries using an ASN1 encoded format similar to the LDAP search entry message structure. So in order to read a single entry from the back-end, decode it, and then serialize it would consume approximately 20 times the amount of memory required by the entry.

Our new ASN1 library uses a streaming based design and typically performs at most one copy during encoding and decoding. We have split encoding and decoding into two distinct interfaces:

  org.opends.server.protocols.asn1.ASN1Reader
  org.opends.server.protocols.asn1.ASN1Writer

A factory class, ASN1, is used to create new ASN1Readers and ASN1Writers from various underlying data sources and data sinks respectively.

Here's an example of how the ASN1Writer interface should be used:

  long msgID = 123; 
  Entry entry = ...;
  OutputStream stream = ...;

  ASN1Writer writer = ASN1.getWriter(stream);

  writer.writeStartSequence();
  {
    // LDAP message
    writer.writeInteger(msgID);
    writer.writeStartSequence();
    {
      // Search entry
      writer.writeOctetString(entry.getDN().toString());
      writer.writeStartSequence();
      for (Attribute attribute : ...)
      {
        // Attribute list
        writer.writeStartSequence();
        {
          // Attribute
          writer.writeOctetString(attribute.getNameWithOptions());
          writer.writeStartSet();
          for (AttributeValue value : attribute)
          {
            // Attribute value
            writer.writeOctetString(value.getValue());
          }
          writer.writeEndSet();
        }
        writer.writeEndSequence();
      }
      writer.writeEndSequence();
    }
    writer.writeEndSequence();
  }
  writer.writeEndSequence();

Decoding follows a similar pattern using the ASN1Reader interface.

Part of this work also required that we provided some simple APIs for manipulating byte arrays. We have implemented a similar design to the JDK CharSequence, String, and StringBuilder classes. Specifically we now have the following classes and interfaces defined in org.opends.server.types:

  • ByteSequence - this is similar to CharSequence and provides a read-only interface for manipulating sequences of bytes.
  • ByteString - this is similar to String and provides an immutable sequence of bytes.
  • ByteStringBuilder - this is similar to StringBuilder and provides methods for incrementally constructing a byte sequence.
  • ByteSequenceReader - this is similar to StringReader and provides methods for incrementally decoding content from a byte sequence.

I believe we now have a pretty solid low level API stack in OpenDS which should prove to be stable and capable of meeting all our requirements for the foreseeable future (famous last words!).

Please feel free to give us any feedback.

About

This is the blog of Matthew Swift, a developer working on OpenDS - Sun's open source next generation LDAP Directory Server.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today