OpenDS, Java, and Performance

I've seen a few questions, comments, and concerns recently about how OpenDS will perform. In particular, many of them have focused on the fact that OpenDS is written in Java. I wanted to take this opportunity to address them here.

If you've heard anything about Sun's Directory Server in the last several years, it's hard to miss the fact that performance and scalability are very important to us. We've been continually working on improving this in our current Sun Java System Directory Server, and it is one of the main goals of OpenDS to be even faster and more scalable. If we really thought that Java was going to significantly hold us back, then the project probably wouldn't have even gotten off the ground.

First, though, I should say that we haven't yet spent a lot of time on performance optimization, primarily because it's too early in the process. Since there's still a lot more code to be written, anything that we do to optimize performance now might get negated by changes that we make later when adding other features. We'd rather get things working first and then see what we need to do to make it as fast as possible. However, I will say that we're definitely writing the code with performance and scalability in mind, and we have run a few performance tests that show that we're in pretty good shape. For example, our modify performance is about twice as fast as the best that I've seen out of Directory Server 6. Our LDIF import code is also faster and more scalable. Search performance right now can range from a little slower to a lot faster, but we also know of a few things that can be improved to give us some significant boosts. I do find it a little hard to resist the temptation to jump into performance analysis, but holding off does allow development to progress more quickly.

So how does Java fit into all of this? In many ways, it plays a very big role because the JVM is what is actually executing the code. The performance of the JVM has improved dramatically over time, and it keeps getting better. This works very much in our favor, since switching to a faster JVM can give OpenDS a notable performance boost without changing a line of code. Further, new releases of Java can introduce new features and capabilities that we may be able to use to our advantage. For example, the Java 5 release added a number of concurrency libraries that allow us to reduce locking and improve performance and scalability, and it improved upon the NIO features added in the 1.4 release for fast and scalable network communication. Back in the Java 1.1 or 1.2 days, we probably would have been crazy to attempt something like this and expect decent performance, but things have come a long way since then. Note that I am talking primarily about Sun's JVM implementation, since I don't have a lot of experience with those from other vendors. It may or may not be the case that other JVMs share the same level of performance, but Sun's implementation certainly shows what's possible.
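
To give a rough idea of what the Java 5 concurrency libraries make possible, here is a minimal sketch of a counter shared between worker threads that never takes a lock. It is not code from OpenDS: the OperationCounter class and its method names are invented for illustration, and only the java.util.concurrent classes it uses are real.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not part of OpenDS.
public class OperationCounter {
    // AtomicLong updates use compare-and-swap instead of a synchronized block.
    private final AtomicLong completed = new AtomicLong();

    public void recordOperation() {
        completed.incrementAndGet();
    }

    public long total() {
        return completed.get();
    }

    // Runs the given number of worker threads, each recording opsPerThread
    // operations, and returns the final count once all workers have finished.
    public static long countConcurrently(int threads, int opsPerThread) {
        final OperationCounter counter = new OperationCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    counter.recordOperation();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 100000)); // prints 400000
    }
}
```

The same count could be kept correctly with a synchronized method, but then every worker thread would queue up on one monitor. With an atomic variable, contended updates are retried rather than blocked, which is the general idea behind reducing locking.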

Other things to consider with regard to Java and performance include:
  • Java is not an interpreted language. The HotSpot JIT compiler translates the bytecode into native machine code and then executes that. It is true that when the JVM has just started, the program may be interpreted for a period of time, but once the code has been compiled it runs natively.

  • It is true that garbage collection does run in the background. However, if it's done right then this has a negligible impact on performance. In fact, in some ways the memory management approach that Java uses can be faster than that of most native applications, because it's optimized for this kind of usage and allocating memory for small temporary objects is very efficient. Further, there are several different garbage collector models that you can choose from based on your requirements.

  • Java is an extremely observable language. It's easy to see what's going on inside the JVM, and there are a plethora of tools that can be used to help analyze the performance of Java applications. There are interfaces that profilers can plug into, and on Solaris 10 it is also possible to use DTrace to examine code running in production. Similarly, we've even built a simple profiler into OpenDS itself that you can use to collect data about what's going on while the server is running (and without the need to restart). All of this makes it very easy to identify the portions of the code that can benefit most from performance optimization, which helps us know where to spend our time.

  • There are a lot of configurable options available for use with the JVM, and some of them can have a significant impact on performance. These include settings that affect memory usage, garbage collection, native compilation, lock behavior, etc. Tuning them can provide a number of benefits for different kinds of systems without the need for us to write any special code to take advantage of them.

  • Java can also help us write code more quickly, since there are a number of IDEs that can help with rapid development, and we also don't need to spend as much time worrying about things like memory management and cross-platform compatibility.
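
As a small illustration of the observability and tuning points above, the standard java.lang.management API (available since Java 5) lets a program ask its own JVM about garbage collection, JIT compilation, and startup options. This is only a sketch: the JvmSnapshot class and its method names are invented for this example, it is not the profiler built into OpenDS, and the GC percentage it prints is a rough wall-clock ratio rather than a precise CPU measurement.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical example class, not part of OpenDS.
public class JvmSnapshot {
    // Sums the approximate accumulated collection time (in milliseconds)
    // reported by every collector in this JVM.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Rough share of JVM uptime spent in garbage collection, as a percentage.
    public static double gcPercentOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        // Which collectors are active depends on the JVM and the options it was started with.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName()
                    + ", collections: " + gc.getCollectionCount());
        }
        System.out.println("GC share of uptime (%): " + gcPercentOfUptime());

        // The compilation bean is null on a JVM with no JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("JIT: " + jit.getName()
                    + ", compile time (ms): " + jit.getTotalCompilationTime());
        }

        // The options the JVM was started with (heap size, collector choice, and so on).
        System.out.println("JVM options: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it a few times with different heap sizes or collector options is a quick way to see how much those settings change the numbers on a particular system.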

However, there are also a number of improvements in the area of performance and scalability that are not directly related to our use of Java. We have made a number of architectural changes based on our experiences with the current Directory Server that allow us to avoid problems that we've encountered in the past. We have focused heavily on avoiding the need to acquire locks wherever possible, and to minimize the scope of those locks that are necessary. We have changed the way in which we store entries in the backend database to make them much more efficient to encode and decode so that it is still possible to get very good performance even when not using an entry cache, and we've changed our approach to entry caching to allow for different implementations that may be suited for different kinds of use cases. The algorithms used for interacting with indexes in the backend can allow us to be more efficient and reduce the number of entries that we need to look at in several cases. We've increased the parallelization in several areas like connection handling and backend processing to be faster and more scalable. As development continues, we'll add other improvements throughout the code.
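
To make the idea of interchangeable entry cache implementations a little more concrete, here is a minimal sketch. It assumes a deliberately simplified model in which an entry is just a string keyed by its DN; every name in it (EntryCacheSketch, Cache, NoCache, ConcurrentCache, lookup, loadFromBackend) is invented for illustration and does not reflect the actual OpenDS classes or backend.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class, not part of OpenDS.
public class EntryCacheSketch {
    // The rest of the server only sees this interface, so cache
    // implementations can be swapped to suit different use cases.
    public interface Cache {
        String get(String dn);
        void put(String dn, String entry);
    }

    // For deployments that rely on fast decoding from the backend instead of caching.
    public static final class NoCache implements Cache {
        public String get(String dn) { return null; }
        public void put(String dn, String entry) { /* intentionally stores nothing */ }
    }

    // A cache backed by ConcurrentHashMap, so readers do not block on a single global lock.
    public static final class ConcurrentCache implements Cache {
        private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();
        public String get(String dn) { return entries.get(dn); }
        public void put(String dn, String entry) { entries.put(dn, entry); }
    }

    // Stand-in for reading and decoding an entry from the backend database.
    static String loadFromBackend(String dn) {
        return "dn: " + dn;
    }

    // Checks the cache first and falls back to the backend on a miss.
    public static String lookup(Cache cache, String dn) {
        String entry = cache.get(dn);
        if (entry == null) {
            entry = loadFromBackend(dn);
            cache.put(dn, entry);
        }
        return entry;
    }

    public static void main(String[] args) {
        Cache cache = new ConcurrentCache();
        System.out.println(lookup(cache, "uid=test,dc=example,dc=com"));
        System.out.println(cache.get("uid=test,dc=example,dc=com") != null); // prints true
    }
}
```

Because lookup depends only on the interface, the choice between caching and not caching becomes a configuration decision rather than a code change.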

So to summarize, the performance today is pretty impressive and we know that there are ways to make it better. Java is not going to be a performance inhibitor, and it's only going to keep getting better. You can get OpenDS now to try it out for yourself, or watch our development to see how we progress in the future. If you still have concerns, or if you have ideas about how we can make it better, then feel free to participate on our mailing lists.

Comments:

It's funny how this question arises whenever Java is used to write a serious application. It seems that some people still consider Java a toy language.
Back in 1999, I remember playing with Java (it was 1.1.8), and we found that it was two times slower than C at parsing big HTML files, which was quite surprising. It convinced me that Java was good enough to be used in production systems.
Now, I find it funny when I hear people complaining about Java's slowness. If you consider a Java LDAP server, for instance, and specifically Apache Directory Server (because we have run some performance benchmarks on it), on a 3 GHz dual-core Intel server we were able to reach 900 search requests per second. If there are 5 or 10 implementations in the world that need this level of performance, it would be a surprise. However, as you said, using a different JVM can produce different results:
  • JRockit = 900 req/s
  • IBM 5.0 = 830 req/s
  • Sun 1.5 = 550 req/s


  • Garbage collection: what I have seen on many different applications is that GC uses less than 1% of CPU. This is really a surprise too.

    And there is a point which is really important: an LDAP server, written in C or in Java, represents more than 100K lines of code (ADS is currently more than 200K, tests included). With this amount of code, I really think that it's easier to optimize Java code than C code, because you don't have to focus on pointers, debuggers are better, IDEs are better, and tools are really better.

    I totally agree with the analysis: Java is good enough, reliable enough, and fast enough for an LDAP server. After all, do you know a lot of application servers written in C? Me neither :)

    Posted by Emmanuel Lécharny on August 16, 2006 at 10:30 AM CDT #

    You are correct that people just need "good enough". However, there are certainly different levels of "good enough", and we do see a very real need for high performance. These fit into a few different categories:
    • Some environments do have a legitimate need for thousands or tens of thousands of individual operations per second under peak load.
    • Some directory-enabled applications issue several LDAP requests for a single discrete event (e.g., a single SiteMinder authentication can turn into ten LDAP operations). In cases like this, the number of concurrent requests needed can add up quickly.
    • Some environments have limited time windows during which they can complete potentially disruptive operations (e.g., lots of writes), or just need a lot of operations done quickly.
    • Some environments may not need tremendous throughput, but have very low response time requirements, and being able to meet those also implies that the server is capable of high rates of operations.

    With what we've currently got in OpenDS, I've seen over 35,000 searches per second or 5,500 modifies per second on my Ultra 40 workstation (2 dual-core Opteron processors). There's still a lot of work to be done, and there will definitely be changes that impact performance, but hopefully our optimizations will win out over the additional processing that we'll need to add into the code path.

    Posted by Neil Wilson on August 17, 2006 at 06:11 AM CDT #
