Scalin' Dictionaries 2: Electric BST

I've been working on the scalability properties of the dictionaries again. The last time, thanks to Jeff, we managed to get the dictionary to scale pretty well up to 32 threads and acceptably up to 64 threads by removing the synchronization bottlenecks on allocating lookup states during dictionary lookups and by using positional reads on the files containing the dictionaries.

Once we'd done this, I went back to collect/analyze and had a look at where the synchronization time was being spent during a run of our dictionary test program with 128 threads. I was surprised to see that a lot of synchronization time was being spent adding elements to the caches that we're maintaining. This was a bit weird, because there's no explicit synchronization in the java.util.LinkedHashMap that we were using for the cache. I suspected that we were hitting an allocation bottleneck on the entry objects that it allocates to maintain the linked list that preserves insertion order.
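For context, here's a minimal sketch of the sort of LinkedHashMap-based cache we had (the class and method names here are my own, not the actual code). LinkedHashMap has no internal synchronization, so every operation has to funnel through an external lock, and every put allocates an entry node for the order-preserving linked list:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A sketch of an insertion-ordered, size-bounded cache built on
// java.util.LinkedHashMap. The map itself is not thread safe, so
// all access goes through one monitor, and every put allocates an
// entry node for the linked list that preserves insertion order.
public class LruCache<K, V> {

    private final Map<K, V> map;

    public LruCache(final int maxSize) {
        map = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the oldest entry once we're over capacity.
                return size() > maxSize;
            }
        };
    }

    // This single lock is the kind of thing that shows up as
    // synchronization time in a collect/analyze profile.
    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }
}
```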

Aside from the synchronization problems, the main problem with the caches, in particular the entry-by-name cache, is that we don't get that many hits on them during querying. In some of our query tests, the cache hit rate is only about 10%, which means we're doing a lot of synchronization for not a lot of benefit.

So, I did away with both the entry-by-name cache and the name-by-position cache that we were using. The name-by-position cache was actually doing something useful: over time, it built up the top few layers of the binary search tree for the names in the dictionary. Unfortunately, that benefit was overwhelmed by the need to synchronize on the cache to fetch the entry name at a given position while binary searching.

So I decided to add an honest-to-goodness binary search tree to the dictionary. This tree is populated when the dictionary is loaded, and since it never changes, it's completely thread safe. Because we took out the other caches and because the BST is so simple, we can afford to devote more entries to the BST. Every level that we add to the tree may save us a read from the underlying file, which is good, because the fastest read is the one that you don't have to make.

Each node in the BST stores the name at that position in the dictionary and the upper and lower extents of the search, so that once we've traversed the BST we know which chunk of the dictionary entries to continue the search in.
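In code, the idea looks something like this. This is a sketch: the Dictionary interface and getName method are stand-ins for the real classes, which read entry names out of the dictionary file.

```java
// A sketch of the load-time BST. Each node stores the name at the
// midpoint of a range of dictionary positions along with the range's
// extents, so a traversal either finds the name outright or tells us
// which chunk of the dictionary to continue the binary search in.
interface Dictionary {
    String getName(int position); // read the entry name at a position
}

class NameTree {

    static class Node {
        final String name; // entry name at position mid
        final int mid;     // the position whose name this node holds
        final int lower;   // lower extent of the search range (inclusive)
        final int upper;   // upper extent of the search range (exclusive)
        Node left, right;

        Node(String name, int mid, int lower, int upper) {
            this.name = name;
            this.mid = mid;
            this.lower = lower;
            this.upper = upper;
        }
    }

    // Populate the top maxDepth levels when the dictionary is loaded.
    // The tree is never modified afterwards, so it can be shared
    // across threads with no locking at all.
    static Node build(Dictionary dict, int lower, int upper,
                      int depth, int maxDepth) {
        if (depth >= maxDepth || lower >= upper) {
            return null;
        }
        int mid = lower + (upper - lower) / 2;
        Node n = new Node(dict.getName(mid), mid, lower, upper);
        n.left = build(dict, lower, mid, depth + 1, maxDepth);
        n.right = build(dict, mid + 1, upper, depth + 1, maxDepth);
        return n;
    }

    // Returns the [lower, upper) range of positions where the on-disk
    // binary search should continue; a single-element range means the
    // name was found in the tree and no file reads are needed.
    static int[] search(Node root, String name) {
        int lower = root.lower;
        int upper = root.upper;
        Node n = root;
        while (n != null) {
            int cmp = name.compareTo(n.name);
            if (cmp == 0) {
                return new int[] {n.mid, n.mid + 1};
            } else if (cmp < 0) {
                upper = n.mid;
                n = n.left;
            } else {
                lower = n.mid + 1;
                n = n.right;
            }
        }
        return new int[] {lower, upper};
    }
}
```

A full traversal replaces the first several probes of the on-disk binary search with in-memory string comparisons, which is exactly the "each level saves a read" property described above.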

Here are the results for the new dictionary test, where we use a 1024-node BST (i.e., a tree 9 levels deep), along with a graph (note the log scale!) comparing this version to the old one:

# of threads | Total Number of Lookups | Average Lookup Time (ms)
1            | 48500                   | 6.20
2            | 85500                   | 7.04
4            | 152000                  | 7.91
8            | 264000                  | 9.16
16           | 457500                  | 10.63
32           | 756500                  | 12.86
64           | 817000                  | 23.86
128          | 525000                  | 74.22

That gives us a speedup of about 1.8 times at 64 and 128 threads, and a slightly smaller speedup of 1.6 times with smaller numbers of threads.

The last time I did this, I got a comment asking why we were even hitting the disk when we had so much memory on this box. That's a good question, but I'm pretty sure that we are hitting the disk. If we copy the index into a RAM disk and run the dictionary test, here are the numbers and the corresponding graph:

# of threads | Total Number of Lookups | Average Lookup Time (ms)
1            | 685000                  | 0.43
2            | 1072000                 | 0.56
4            | 1927000                 | 0.62
8            | 3387500                 | 0.71
16           | 3858000                 | 1.24
32           | 3933500                 | 2.44
64           | 3960500                 | 4.86
128          | 3992500                 | 9.67

So, yeah, I'd say we're hitting the disk. That 9.67ms per lookup with 128 threads is pretty nice. It's about 123 times faster than what the original code was doing with disk-based lookups.

While I was debugging my BST, I went ahead and modified the DictionaryFactory so that when you open a dictionary that is smaller than the specified cache size, you just get back a CachedDiskDictionary that loads the whole dictionary into a hash table at open time, since it all would have been cached eventually anyway.
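In sketch form, the factory change looks something like this. DiskDictionary, size(), and the constructor signatures here are assumed names for illustration, not the real API; Dictionary is the interface from the earlier sketch.

```java
import java.io.File;
import java.io.IOException;

// A sketch of the DictionaryFactory change: if the whole dictionary
// would fit in the configured cache anyway, skip the disk-based
// implementation and load everything into memory up front.
public class DictionaryFactory {

    public static Dictionary getDictionary(File dictFile, int cacheSize)
            throws IOException {
        DiskDictionary dict = new DiskDictionary(dictFile);
        if (dict.size() <= cacheSize) {
            // Every entry would have been cached eventually, so pay
            // the cost once and answer all lookups from a hash table.
            return new CachedDiskDictionary(dict);
        }
        return dict;
    }
}
```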
