February 1, 2010

Berkeley DB Java Edition Android/Google Maps Demo

Thanks to Chris Eastland at Nebula Software Systems for the screen shot of this cool Google Maps/Android app built on BDB JE. The location data is stored in a JE database running on the device.

[Screen shot: LocationManager.png]

Berkeley DB Java Edition Group Welcomes Sun

Finally(!) we can talk about the Sun acquisition. From the point of view of the Berkeley DB Java Edition group, this is really good news. For the past several years we've had a good relationship with many groups at Sun, but there was always the little issue of being in two different companies standing between us. The acquisition means that we can now work more closely with those groups, as well as with new groups, without worrying about disclosing intellectual property on either side of the table. The level of collaboration between us should improve significantly.

This may seem like bad news for the non-Sun (Oracle) BDB JE users, but it's anything but bad. By enabling us to work more closely with our Sun colleagues -- colleagues who are pushing the JE envelope every day -- we will be able to make JE faster, better, stronger.

We're looking forward to moving forward. Welcome Sun!

January 13, 2010

Harmony/Dalvik IdentityHashMap Issue affecting Berkeley DB Java Edition on Android

While debugging a failing JE unit test on Android, I ran into an interesting bug in the Harmony implementation of IdentityHashMap on Dalvik (the Android runtime).

It seems that if you call IdentityHashMap.entrySet() you get back Map.Entry instances which do not actually allow mutation of the underlying IdentityHashMap. Down in the guts of the Checkpointer code we use IHM to create a mapping from DatabaseImpls (internal database handles) to "the highest level" in the INDirtyList (the list of Internal Nodes in memory). We use that mapping to determine how high in the tree we should flush during a "flushAll" checkpoint. This failure means that there could be potential issues with Internal Nodes not being flushed when we think they are (Android only).

While we can't get the Dalvik runtime fixed for a while (i.e. it will require a new release of Android), we can put a workaround (details still TBD) into our code. This will appear in the next release of JE 4.0.x.
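To make the failure mode concrete, here is a minimal sketch of the pattern involved. It is not the Checkpointer code: the class IdentityMapLevels, its method names, and the use of plain Objects as stand-ins for DatabaseImpls are all made up for illustration. The first method relies on Map.Entry.setValue() writing through to the map, which the Java collections contract requires. The second reaches the same result using only keySet(), get() and put(), which is one possible shape for a workaround, not necessarily the one JE will ship.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class IdentityMapLevels {

    /** Raises every stored level to at least minLevel through Map.Entry.setValue(). */
    static <K> void raiseWithSetValue(Map<K, Integer> levels, int minLevel) {
        for (Map.Entry<K, Integer> entry : levels.entrySet()) {
            if (entry.getValue() < minLevel) {
                // Per the Map.Entry contract this must update the backing map.
                entry.setValue(minLevel);
            }
        }
    }

    /** Same result without touching Map.Entry: collect the keys first, then put(). */
    static <K> void raiseWithPut(Map<K, Integer> levels, int minLevel) {
        List<K> tooLow = new ArrayList<>();
        for (K key : levels.keySet()) {
            if (levels.get(key) < minLevel) {
                tooLow.add(key);
            }
        }
        // Updating after the iteration avoids modifying the map while walking it.
        for (K key : tooLow) {
            levels.put(key, minLevel);
        }
    }

    public static void main(String[] args) {
        Object dbA = new Object();  // hypothetical stand-ins for database handles
        Object dbB = new Object();
        Map<Object, Integer> levels = new IdentityHashMap<>();
        levels.put(dbA, 1);
        levels.put(dbB, 5);
        raiseWithSetValue(levels, 3);
        System.out.println("dbA=" + levels.get(dbA) + ", dbB=" + levels.get(dbB));
    }
}
```

On a runtime where entrySet() behaves as specified, both methods leave the map in the same state, so a comparison of the two is a quick way to check a given platform.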

November 24, 2009

Berkeley DB Java Edition: Handling Transactions in JE 4.0

In his blog, Jeff Alexander of Sun's project Aura describes how they've standardized on a calling convention for JE which provides uniform, yet flexible, transaction and exception handling.

November 9, 2009

Berkeley DB Java Edition 4.0 Released

It's official: Berkeley DB Java Edition 4.0 was released today.

This release has High Availability and Replication (HA), improved IO performance (especially on Linux/ext3), and a jconsole plugin which should make performance tuning quite a bit easier for JE users.

You can download it at: http://www.oracle.com/technology/software/products/berkeley-db/je/index.html

October 27, 2009

Berkeley DB Java Edition vs Windows 7 IO

In a previous post I raised the topic of IO problems on Windows 7 (fsync was resulting in "Incorrect Function" IOExceptions). This again showed up more recently in this OTN thread. Fortunately, the reporting user supplied us with a reproducible test case which allowed us to characterize the problem.

At this point I am reasonably certain that the problem has to do with a write() call being initiated on a RandomAccessFile when an fsync() is already in progress in another thread (i.e. a concurrent fsync and write on the same file, but not with the same file descriptors). JE routinely performs concurrent IO operations on a given file. In this particular test case, the concurrency arises because the checkpointer initiates an fsync while the user application thread is writing.

It turns out that in ext3 we previously encountered a performance slowdown because that file system takes an exclusive mutex on the inode for any IO operation, and therefore an fsync will block reads and writes. JE 4.0 has a "fix" to this problem which is described here. While the 4.0 "write queue" work improves performance on file systems like ext3 which take exclusive mutexes on the inode for any IO operation, it also has the added benefit that JE no longer performs concurrent fsync and write operations.
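The general idea of keeping writes out of the way of an in-progress fsync can be sketched in a few lines. This is not JE's write queue, only an illustration under my own assumptions: the class WriteQueueSketch, its methods, and the lock-plus-queue design are hypothetical. Every write is queued first, and whichever thread holds the IO lock applies the queued writes, so a write never reaches the file while another thread is inside sync().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

final class WriteQueueSketch implements AutoCloseable {

    private static final class PendingWrite {
        final long offset;
        final byte[] data;
        PendingWrite(long offset, byte[] data) { this.offset = offset; this.data = data; }
    }

    private final RandomAccessFile file;
    private final ReentrantLock ioLock = new ReentrantLock();
    private final ConcurrentLinkedQueue<PendingWrite> pending = new ConcurrentLinkedQueue<>();

    WriteQueueSketch(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Queues the write; applies it now only if no other thread is doing IO. */
    void write(long offset, byte[] data) throws IOException {
        pending.add(new PendingWrite(offset, data.clone()));
        if (ioLock.tryLock()) {
            try {
                drainPending();
            } finally {
                ioLock.unlock();
            }
        }
        // Otherwise the write stays queued until the next write, fsync or close.
    }

    /** Applies queued writes, then syncs, all under the IO lock. */
    void fsync() throws IOException {
        ioLock.lock();
        try {
            drainPending();
            file.getFD().sync();
        } finally {
            ioLock.unlock();
        }
    }

    @Override
    public void close() throws IOException {
        ioLock.lock();
        try {
            drainPending();
            file.close();
        } finally {
            ioLock.unlock();
        }
    }

    // Caller must hold ioLock.
    private void drainPending() throws IOException {
        PendingWrite w;
        while ((w = pending.poll()) != null) {
            file.seek(w.offset);
            file.write(w.data);
        }
    }

    /** Writes two chunks around an fsync to a temp file and returns the file contents. */
    static String demo() {
        try {
            File path = File.createTempFile("write-queue", ".log");
            path.deleteOnExit();
            try (WriteQueueSketch log = new WriteQueueSketch(path)) {
                log.write(0, "ab".getBytes(StandardCharsets.US_ASCII));
                log.fsync();
                log.write(2, "cd".getBytes(StandardCharsets.US_ASCII));
            }
            return new String(Files.readAllBytes(path.toPath()), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch uses positional writes so that the order in which queued writes are applied does not matter. A write that arrives while the lock is held returns immediately instead of blocking behind the fsync, which is the property that matters on a file system that serializes IO on the inode.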

That said, there seems to be a true bug in Windows 7 IO, if for no other reason than that I observe corruption on sector boundaries in the log files produced by the test case (JE does no operations on sector boundaries).

October 15, 2009

Berkeley DB Java Edition: Why Database.preload() doesn't always help

A customer sent us a simple program and an environment with data. The program opened the environment (approx. 2GB) and scanned the records of one of the databases in primary key order. The records had been inserted in random (i.e. non-key-sequential) order and this caused lots of random IO during the scan. The customer wanted to know how to make the scan go faster. We suggested using Database.preload() since that would sort the LSNs of all of the records in the database and then load the cache by reading the records in LSN order rather than key order. The customer's program set the cache size to a fixed size of 1200 * 1024 * 1024 bytes. Interestingly enough, the call to preload() made the overall time longer than just doing the scan and taking the hit from the random IO.

The reason is that preload() will stop when it has filled the cache. In this case, 1.2GB was not large enough to hold all of the records in the database. Once preload() had filled the cache, it returned a status of PreloadStatus.FILLED_CACHE after which the scan commenced. Whereas the preload had read the records in LSN order, the scan was reading the records in key order (effectively random LSN order). Since the cache had already been filled by preload(), any cache miss by the scan would cause something to be evicted from the cache, and if the evicted record had not already been used by the scan, the work done by preload() to load the cache for that record was wasted. So with too small a cache, some IO done by preload() was inevitably wasted, thereby causing lower throughput.

Increasing the cache size to a level where preload() could load the entire database without filling the cache resulted in a significant speed-up.
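The effect is easy to reproduce in a toy model. The sketch below is not JE code and makes simplifying assumptions: one record per cache slot, strict LRU eviction, and made-up names (PreloadWasteSketch, wastedPreloadReads). It preloads in LSN order until the cache is full, scans in key order, and counts the preloaded records that were evicted before the scan ever reached them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class PreloadWasteSketch {

    /** Returns how many preloaded records were evicted before the scan used them. */
    static int wastedPreloadReads(final int recordCount, final int cacheSlots, long seed) {
        // Keys were inserted in random order, so LSN order is a shuffle of key order.
        List<Integer> keysInLsnOrder = new ArrayList<>();
        for (int key = 0; key < recordCount; key++) {
            keysInLsnOrder.add(key);
        }
        Collections.shuffle(keysInLsnOrder, new Random(seed));

        final int[] wasted = {0};
        // Access-ordered LinkedHashMap used as an LRU cache.
        // Value TRUE means "loaded by preload and not yet visited by the scan".
        Map<Integer, Boolean> cache = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                if (size() <= cacheSlots) {
                    return false;
                }
                if (eldest.getValue()) {
                    wasted[0]++;  // the preload read for this record bought nothing
                }
                return true;
            }
        };

        // Preload: read in LSN order and stop once the cache is full.
        for (int key : keysInLsnOrder) {
            if (cache.size() >= cacheSlots) {
                break;
            }
            cache.put(key, Boolean.TRUE);
        }

        // Scan: visit every record in key order. A hit marks the record as used,
        // a miss inserts it and evicts the least recently used entry.
        for (int key = 0; key < recordCount; key++) {
            cache.put(key, Boolean.FALSE);
        }
        return wasted[0];
    }

    public static void main(String[] args) {
        System.out.println("wasted, cache too small: " + wastedPreloadReads(1000, 600, 42L));
        System.out.println("wasted, cache fits all:  " + wastedPreloadReads(1000, 1000, 42L));
    }
}
```

With a cache smaller than the data set, the count is greater than zero because each scan miss pushes out the oldest untouched preloaded record. Once the cache holds every record, the scan never misses and nothing is wasted.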

October 7, 2009

Berkeley DB Java Edition 4.0 on Android

I've just finished checking that JE 4.0 works properly on Android 1.6 (it does). But even better is that we'll be shipping a je-android.jar file with 4.0 when it's available ("by the end of CY2009"). This means that JE/Android users will no longer need to copy the JE sources into a project directory in order to modify the sources to replace references to javax.transaction.xa.* with references to stub classes. Instead, they will just need to copy the je-android.jar file into their project libs directory and they'll be ready for action. The HOWTO-Android.html is a lot smaller now.

Further, DPL now works on Android. That sure made life a lot easier when getting the demos for OOW ready.

All of this will make the entire JE/Android experience a lot simpler.

October 6, 2009

Boston Big Data Summit

For those in the Boston area:

The Boston "Big Data Summit" will be holding its first meeting on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA.

The Boston area is home to a large number of companies involved in the collection, storage, analysis, data integration, data quality, master data management, and archival of "Big Data". If you are involved in any of these, then the meeting of the Boston "Big Data Summit" is something you should plan to attend. Save the date!

More info here.

October 2, 2009

Berkeley DB Java Edition High Availability Session at Oracle Open World

Here are all of the Berkeley DB presentations at Oracle Open World. I've highlighted the presentation that my colleague Sam Haradhvala will be giving on Berkeley DB Java Edition High Availability.

----------------------------------------------------------------------
Where to find Berkeley DB people & sessions at Oracle Open World
----------------------------------------------------------------------

Hello and welcome to the upcoming Oracle Open World 2009 in San Francisco. We're less than 10 days away and the excitement is palpable. If you are interested in hearing about Berkeley DB, learning more about how the Berkeley DB products can be integrated into your applications/appliances/devices, seeing exciting Berkeley DB customer use cases or speaking directly with one of the Berkeley DB product development engineers, you can find us here:

* Session ID#: S311365, Title: Oracle Berkeley DB: Lightning-Fast Key Value Storage Just Got Faster, Date: Sun, Oct. 11th, Time: 15:45 - 16:45, Venue: Hilton Hotel, Room: Golden Gate 1, Track: Oracle Develop: Database, Speaker: David Segleau (Director Product Management), including customer presentations from Lucas Vogel, Managing Partner at EndPoint Systems and Madhu Bhimaraju, Database Architecture Engineer at Verizon Wireless. We will be covering several of the new features in Berkeley DB 4.8 and discussing how Verizon Wireless uses Berkeley DB to provide services to an ever-growing customer base.

* Session ID#: S311364, Title: Oracle Berkeley DB Java Edition High Availability: Java Persistence at Network Speeds, Date: Sun, Oct 11th, Time: 14:30 - 15:30, Venue: Hilton Hotel, Room: Golden Gate 1, Track: Oracle Develop: Database, Speaker: Sam Haradhvala (Senior Engineer on the Berkeley DB Java Edition product). We will be discussing the new High Availability/Replication functionality that will soon be available in Berkeley DB Java Edition -- how it works, how it can improve your application performance, reliability and throughput, as well as common use cases and configurations.

* Berkeley DB Applications Demo/Lunch. Date: Mon, Oct. 12th, Meeting Time: 11:00 - 13:00, Venue: Hilton Hotel - Union Square Room 6, 4th Fl. Greg Burd (Senior Product Manager) and several Berkeley DB engineers will be showing several Berkeley DB application demos, discussing how they were implemented and how similar functionality can be part of your application. Join us for lunch, some interesting demos and an open question and answer session.

* The Berkeley DB Product booth in the Exhibition Hall in Moscone West. We're workstation W-035, under the Database Track, in the Embedded Database sub-area just like last year. The booth is open Monday and Tuesday from 10:30-6:30 and Wednesday from 9:15-5:15. We're always delighted to talk with existing users, potential users and anyone who is curious about our NoSQL embedded database libraries.