[This is a repost of the 11/10/05 Sleepycat blog entry.]
We've been working a lot on performance improvements to JE and recently
we devised a nice way to speed up a couple of operations by an order of
magnitude or so. It's called Sorted LSN Tree Walking, and the basic
idea is that rather than walk the tree in key order, we walk it in Log
Sequence Number (LSN) order.
To do this, we walk the tree by
starting at the root, descend as far down as we can until we find a
node that is not in memory. Then we take all the LSNs of the
out-of-cache children and add them to an array. We keep descending down
all the various paths in the tree until we've accumulated all of the
LSN's of out-of-cache children. Then, we sort those LSNs, and for each
LSN in the sorted set, read in the node and accumulate all of the child
LSNs. Lather, rinse, repeat. We see all the nodes in the tree that way,
and while we're fetching them from disk, we do it in a sorted order so
that the disk head doesn't dance around in random access patterns.
We devised this scheme when a customer needed improved performance for Database.truncate() and Database.remove().
You might wonder why these two methods would need to scan all the nodes
in the B+Tree for a database. The answer is two fold. First, depending
on the arguments passed to the truncate/remove methods, in some cases
we need to return the count of entries to the caller. While this is an
important reason for needing to walk the tree, it's nevertheless
relatively easy to tell people that if they want a fast execution of
truncate() or remove() that they shouldn't expect an accurate return
for the count. The second reason is that we need to update the
Utilization Profile to correctly reflect the newly freed space. The
Utilization Profile is used by the cleaner to determine both which
files are going to give us the most bang for the buck when we clean
them as well as which records in the file are obsolete. If we remove a
database, then all the entries for that database become obsolete (and
thus more interesting to the cleaner). In order to update the
Utilization Profile, we have to look at each node in the database's
B+Tree. But it doesn't matter what order we walk that tree in. So an
LSN sorted order turns out to be very fast if the database is not in
cache. Not to mention that we don't have to worry about concurrency and
locking because the database is closed.
Now that we have developed this technology, it has other uses. One that we're thinking about is improving the Database.preload()
method. After all, the method doesn't make any guarantees about what
actual items it will load -- just that it will load up as much as it
can. Another possible use is DbDump. Presently this utility dumps in
key-sorted order and this means that a restore of the database using
DbLoad will be optimal (key order insertion). However, if a user is
willing to trade-off DbLoad speed for DbDump speed, it may make more
sense to use a SortedLSN based dump.
This technology will be built into the 2.1 release of JE, scheduled to be released in the first quarter of 2006.