Berkeley DB Java Edition: Why Database.preload() doesn't always help

A customer sent us a simple program and an environment containing data. The program opened the environment (approximately 2GB) and scanned the records of one of the databases in primary key order. The records had been inserted in random (i.e. non-key-sequential) order, which caused a lot of random IO during the scan. The customer wanted to know how to make the scan go faster. We suggested using Database.preload(), since it sorts the LSNs of all of the records in the database and then loads the cache by reading the records in LSN order rather than key order. The customer's program set the cache to a fixed size of 1200 * 1024 * 1024 bytes (about 1.2GB). Interestingly enough, the call to preload() made the overall time longer than just doing the scan and taking the hit from the random IO.
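To make the difference between key order and LSN order concrete, here is a minimal sketch. It is not JE code: the class ReadOrderModel and its methods are hypothetical, each LSN is stood in for by a plain long log offset, and the sum of jumps between consecutive reads is only a crude proxy for random IO cost.

```java
import java.util.Arrays;

// Toy model only: ReadOrderModel is a hypothetical name, not part of JE.
// Each record's LSN is represented by a plain long offset into the log.
class ReadOrderModel {

    /** Sum of absolute jumps between consecutive reads, a rough proxy for seek cost. */
    static long totalSeekDistance(long[] offsetsInReadOrder) {
        long total = 0;
        for (int i = 1; i < offsetsInReadOrder.length; i++) {
            total += Math.abs(offsetsInReadOrder[i] - offsetsInReadOrder[i - 1]);
        }
        return total;
    }

    /** Returns a sorted copy of the offsets, i.e. the order a log-order read would use. */
    static long[] inLsnOrder(long[] offsetsInKeyOrder) {
        long[] sorted = offsetsInKeyOrder.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Index = position in key order, value = where that record sits in the log.
        long[] offsetsInKeyOrder = {300L, 0L, 200L, 100L};
        System.out.println("key order: " + totalSeekDistance(offsetsInKeyOrder));
        System.out.println("LSN order: " + totalSeekDistance(inLsnOrder(offsetsInKeyOrder)));
    }
}
```

For these four made-up offsets the key-order read jumps a total of 600 units while the sorted read jumps 300, and the gap widens as the number of randomly inserted records grows.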

The reason is that preload() stops once it has filled the cache. In this case, 1.2GB was not large enough to hold all of the records in the database. Once preload() had filled the cache, it returned a status of PreloadStatus.FILLED_CACHE, after which the scan commenced. Whereas the preload had read the records in LSN order, the scan read them in key order (effectively random LSN order). Since the cache had already been filled by preload(), any cache miss by the scan caused something to be evicted from the cache, and if the evicted record had not yet been used by the scan, the work done by preload() to load that record was wasted. So with too small a cache, some of the IO done by preload() was inevitably wasted, which lowered throughput.
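The effect described above can be reproduced with a small toy model. This is a sketch under stated assumptions, not JE's implementation: the class PreloadCacheModel and its fields are hypothetical, the cache is a plain LRU that holds a fixed number of records, and keys are the integers 0..n-1. JE's real evictor and its byte-based cache accounting are more sophisticated.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Set;

// Toy model only: PreloadCacheModel is a hypothetical name, not part of JE.
class PreloadCacheModel {

    /** Counters produced by one simulated run. */
    static final class Result {
        final int preloadReads;        // records read by the simulated preload
        final int scanMisses;          // records the key-order scan had to fetch itself
        final int wastedPreloadReads;  // preloaded records evicted before the scan used them
        final boolean filledCache;     // true if preload stopped because the cache was full

        Result(int preloadReads, int scanMisses, int wastedPreloadReads, boolean filledCache) {
            this.preloadReads = preloadReads;
            this.scanMisses = scanMisses;
            this.wastedPreloadReads = wastedPreloadReads;
            this.filledCache = filledCache;
        }
    }

    /**
     * keysInLsnOrder must be a permutation of 0..n-1 giving the keys in log order;
     * cacheCapacity is the number of records the cache can hold (at least 1).
     */
    static Result simulate(int[] keysInLsnOrder, int cacheCapacity, boolean preload) {
        // Access-ordered map: iteration starts at the least recently used key.
        LinkedHashMap<Integer, Boolean> cache = new LinkedHashMap<>(16, 0.75f, true);
        Set<Integer> preloadedUnused = new HashSet<>();
        int preloadReads = 0;
        boolean filledCache = false;

        if (preload) {
            for (int key : keysInLsnOrder) {
                if (cache.size() >= cacheCapacity) {
                    filledCache = true; // stop as soon as the cache is full
                    break;
                }
                cache.put(key, Boolean.TRUE);
                preloadedUnused.add(key);
                preloadReads++;
            }
        }

        int scanMisses = 0;
        int wasted = 0;
        for (int key = 0; key < keysInLsnOrder.length; key++) { // scan in key order
            if (cache.get(key) != null) {       // hit; get() also refreshes the LRU position
                preloadedUnused.remove(key);
                continue;
            }
            scanMisses++;
            if (cache.size() >= cacheCapacity) { // evict the least recently used record
                Iterator<Integer> lru = cache.keySet().iterator();
                int victim = lru.next();
                lru.remove();
                if (preloadedUnused.remove(victim)) {
                    wasted++;                    // preloaded but never used by the scan
                }
            }
            cache.put(key, Boolean.TRUE);
        }
        return new Result(preloadReads, scanMisses, wasted, filledCache);
    }

    public static void main(String[] args) {
        int[] keysInLsnOrder = {3, 1, 2, 0};
        Result r = simulate(keysInLsnOrder, 2, true);
        System.out.println(r.preloadReads + " preload reads, " + r.scanMisses
                + " scan misses, " + r.wastedPreloadReads + " wasted");
    }
}
```

With four records in log order {3, 1, 2, 0} and room for only two, the model performs 2 preload reads and 3 scan misses, and one preloaded record is evicted unused, for 5 reads in total against 4 for a plain scan. With room for all four records, the scan has no misses and nothing is wasted.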

Increasing the cache size to a level where preload() could load all of the records without filling the cache resulted in a significant speed-up.
