November 9, 2009

Berkeley DB Java Edition 4.0 Released

It's official: Berkeley DB Java Edition 4.0 was released today.

This release has High Availability and Replication (HA), improved IO performance (especially on Linux/ext3), and a jconsole plugin which should make performance tuning quite a bit easier for JE users.

You can download it at: http://www.oracle.com/technology/software/products/berkeley-db/je/index.html

October 27, 2009

Berkeley DB Java Edition vs Windows 7 IO

In a previous post I raised the topic of IO problems on Windows 7 (fsync was resulting in "Incorrect Function" IOExceptions). This again showed up more recently in this OTN thread. Fortunately, the reporting user supplied us with a reproducible test case which allowed us to characterize the problem.

At this point I am reasonably certain that the problem has to do with a write() call being initiated on a RandomAccessFile when an fsync() is already in progress in another thread (i.e. a concurrent fsync and write on the same file, but not with the same file descriptors). JE routinely performs concurrent IO operations on a given file. In the particular test case, it is by virtue of the checkpointer initiating an fsync while the user application thread is writing.
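For readers who want to see the shape of that IO pattern, here is a minimal Java sketch. It is not the user's test case and it is not JE code: the class name, the 1000-byte block size, and the iteration count are made up for illustration. One thread appends through one RandomAccessFile while a second thread calls sync() through a different RandomAccessFile opened on the same file, so the fsync and the write use different file descriptors.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not taken from JE or from the reported test case.
class ConcurrentFsyncWrite {

    /**
     * Appends blocks through one descriptor while another thread fsyncs the
     * same file through a second descriptor. Returns the first IOException
     * seen by either thread, or null if none occurred.
     */
    static IOException run(File logFile, int iterations) throws Exception {
        AtomicReference<IOException> failure = new AtomicReference<>();
        try (RandomAccessFile writerFile = new RandomAccessFile(logFile, "rw");
             RandomAccessFile syncFile = new RandomAccessFile(logFile, "rw")) {

            Thread writer = new Thread(() -> {
                // 1000 bytes is deliberately not a multiple of a disk sector size.
                byte[] block = new byte[1000];
                try {
                    for (int i = 0; i < iterations; i++) {
                        writerFile.write(block);
                    }
                } catch (IOException e) {
                    failure.compareAndSet(null, e);
                }
            });

            Thread syncer = new Thread(() -> {
                try {
                    for (int i = 0; i < iterations; i++) {
                        // fsync through the second descriptor.
                        syncFile.getFD().sync();
                    }
                } catch (IOException e) {
                    failure.compareAndSet(null, e);
                }
            });

            writer.start();
            syncer.start();
            writer.join();
            syncer.join();
        }
        return failure.get();
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fsync-write", ".log");
        f.deleteOnExit();
        IOException e = run(f, 200);
        System.out.println(e == null ? "no IOException observed" : "failed: " + e);
    }
}
```

On a platform without the problem this should simply finish with no exception. Whether it triggers the Windows 7 failure is not something the sketch guarantees; it only shows the kind of overlap involved.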

It turns out that on ext3 we had previously encountered a performance slowdown because that file system takes an exclusive mutex on the inode for any IO operation, so an fsync blocks reads and writes. JE 4.0 has a "fix" for this problem, which is described here. The 4.0 "write queue" work improves performance on file systems like ext3, and it has the added benefit that JE no longer performs concurrent fsync and write operations.

That said, there seems to be a true bug in Windows 7 IO, if for no other reason than that I observe corruption on sector boundaries in the log files produced by the test case (JE does no operations on sector boundaries).

October 15, 2009

Berkeley DB Java Edition: Why Database.preload() doesn't always help

A customer sent us a simple program and an environment with data. The program opened the environment (approx. 2GB) and scanned the records of one of the databases in primary key order. The records had been inserted in random (i.e. non-key-sequential) order, and this caused lots of random IO during the scan. The customer wanted to know how to make the scan go faster. We suggested using Database.preload() since that would sort the LSNs of all of the records in the database and then load the cache by reading the records in LSN order rather than key order. The customer's program set the cache size to a fixed size of 1200 * 1024 * 1024 bytes. Interestingly enough, the call to preload() made the overall time longer than just doing the scan and taking the hit from the random IO.

The reason is that preload() will stop when it has filled the cache. In this case, 1.2GB was not large enough to hold all of the records in the database. Once preload() had filled the cache, it returned a status of PreloadStatus.FILLED_CACHE, after which the scan commenced. Whereas the preload had read the records in LSN order, the scan was reading the records in key order (effectively random LSN order). Since the cache had already been filled by preload(), any cache miss by the scan would cause something to be evicted from the cache, and if the evicted record had not already been used by the scan, the work done by preload() to load the cache for that record was wasted. So with too small a cache, some IO done by preload() was inevitably wasted, thereby causing lower throughput.

Increasing the cache size to a level where preload() could fill the cache resulted in a significant speed-up.
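The eviction argument can be checked with a toy simulation. The sketch below does not use JE at all: the class and method names are hypothetical, the "cache" is a plain LRU map counted in records rather than bytes, and a fixed multiplicative scramble stands in for the random insertion order (it is a valid permutation as long as the record count is not a multiple of 7919). It counts how many records are read in total and how many preloaded records are evicted before the key-order scan ever uses them.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; JE's real cache and evictor are more sophisticated than plain LRU.
class PreloadSimulation {

    /** reads = total record fetches; wasted = preloaded records evicted unused. */
    static final class Result {
        final int reads;
        final int wasted;
        Result(int reads, int wasted) {
            this.reads = reads;
            this.wasted = wasted;
        }
    }

    static Result preloadThenScan(int recordCount, int cacheCapacity) {
        if (cacheCapacity < 1) {
            throw new IllegalArgumentException("cacheCapacity must be at least 1");
        }
        // Access-ordered map: iteration runs from least to most recently used.
        // The value records whether the scan has used the entry yet.
        LinkedHashMap<Integer, Boolean> cache = new LinkedHashMap<>(16, 0.75f, true);
        int reads = 0;
        int wasted = 0;

        // Preload phase: read in LSN order and stop once the cache is full.
        for (int lsn = 0; lsn < recordCount && cache.size() < cacheCapacity; lsn++) {
            int key = (int) ((lsn * 7919L) % recordCount); // scrambled key for this LSN
            cache.put(key, Boolean.FALSE);
            reads++;
        }

        // Scan phase: visit every record in key order.
        for (int key = 0; key < recordCount; key++) {
            if (cache.get(key) != null) {
                cache.put(key, Boolean.TRUE); // cache hit, now used by the scan
                continue;
            }
            reads++; // cache miss: the record has to be fetched
            if (cache.size() >= cacheCapacity) {
                Iterator<Map.Entry<Integer, Boolean>> it = cache.entrySet().iterator();
                Map.Entry<Integer, Boolean> eldest = it.next();
                if (!eldest.getValue()) {
                    wasted++; // preloaded, then evicted before the scan reached it
                }
                it.remove();
            }
            cache.put(key, Boolean.TRUE);
        }
        return new Result(reads, wasted);
    }

    public static void main(String[] args) {
        Result small = preloadThenScan(1000, 600);
        Result large = preloadThenScan(1000, 1000);
        System.out.println("cache 600:  reads=" + small.reads + " wasted=" + small.wasted);
        System.out.println("cache 1000: reads=" + large.reads + " wasted=" + large.wasted);
    }
}
```

In this model a scan with no preload reads each record exactly once. With a preload into a cache that is too small, the total becomes the record count plus the wasted reads, and once the cache holds every record the waste drops to zero.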

October 7, 2009

Berkeley DB Java Edition 4.0 on Android

I've just finished checking that JE 4.0 works properly on Android 1.6 (it does). But even better is that we'll be shipping a je-android.jar file with 4.0 when it's available ("by the end of CY2009"). This means that JE/Android users will no longer need to copy the JE sources into a project directory in order to modify the sources to replace references to javax.transaction.xa.* with references to stub classes. Instead, they will just need to copy the je-android.jar file into their project libs directory and they'll be ready for action. The HOWTO-Android.html is a lot smaller now.

Further, DPL now works on Android. That sure made life a lot easier when getting the demos for OOW ready.

All of this will make the entire JE/Android experience a lot simpler.

October 6, 2009

Boston Big Data Summit

For those in the Boston area:

The Boston "Big Data Summit" will be holding its first meeting on Thursday, October 22nd 2009 at 6pm at the Emerging Enterprise Center at Foley Hoag in Waltham, MA.

The Boston area is home to a large number of companies involved in the collection, storage, analysis, data integration, data quality, master data management, and archival of "Big Data". If you are involved in any of these, then the meeting of the Boston "Big Data Summit" is something you should plan to attend. Save the date!

More info here.

October 2, 2009

Berkeley DB Java Edition High Availability Session at Oracle Open World

Here are all of the Berkeley DB presentations at Oracle Open World. I've highlighted the presentation that my colleague Sam Haradhvala will be giving on Berkeley DB Java Edition High Availability.

----------------------------------------------------------------------
Where to find Berkeley DB people & sessions at Oracle Open World
----------------------------------------------------------------------

Hello and welcome to the upcoming Oracle Open World 2009 in San Francisco. We're less than 10 days away and the excitement is palpable. If you are interested in hearing about Berkeley DB, learning more about how the Berkeley DB products can be integrated into your applications/appliances/devices, seeing exciting Berkeley DB customer use cases or speaking directly with one of the Berkeley DB product development engineers, you can find us here:

* Session ID#: S311365, Title: Oracle Berkeley DB: Lightning-Fast Key Value Storage Just Got Faster, Date: Sun, Oct. 11th, Time: 15:45 - 16:45, Venue: Hilton Hotel, Room: Golden Gate 1, Track: Oracle Develop: Database, Speaker: David Segleau (Director Product Management), including customer presentations from Lucas Vogel, Managing Partner at EndPoint Systems and Madhu Bhimaraju, Database Architecture Engineer at Verizon Wireless. We will be covering several of the new features in Berkeley DB 4.8 and discussing how Verizon Wireless uses Berkeley DB to provide services to an ever-growing customer base.

* Session ID#: S311364, Title: Oracle Berkeley DB Java Edition High Availability: Java Persistence at Network Speeds, Date: Sun, Oct 11th, Time: 14:30 - 15:30, Venue: Hilton Hotel, Room: Golden Gate 1, Track: Oracle Develop: Database, Speaker: Sam Haradhvala (Senior Engineer on the Berkeley DB Java Edition product). We will be discussing the new High Availability/Replication functionality that will soon be available in Berkeley DB Java Edition -- how it works, how it can improve your application performance, reliability and throughput, as well as common use cases and configurations.

* Berkeley DB Applications Demo/Lunch. Date: Mon, Oct. 12th, Meeting Time: 11:00 - 13:00, Venue: Hilton Hotel - Union Square Room 6, 4th Fl. Greg Burd (Senior Product Manager) and several Berkeley DB engineers will be showing several Berkeley DB application demos, discussing how they were implemented and how similar functionality can be part of your application. Join us for lunch, some interesting demos and an open question and answer session.

* The Berkeley DB Product booth in the Exhibition Hall in Moscone West. We're workstation W-035, under the Database Track, in the Embedded Database sub-area just like last year. The booth is open Monday and Tuesday from 10:30-6:30 and Wednesday from 9:15-5:15. We're always delighted to talk with existing users, potential users and anyone who is curious about our NoSQL embedded database libraries.

September 30, 2009

Berkeley DB Java Edition DPL Support on Android

In a previous blog entry I wrote that JE worked on Android but that the DPL didn't work because of a lack of support for various methods (e.g. Class.getAnnotations()) in the Dalvik JVM.
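To make the dependency concrete: an annotation-driven persistence layer has to ask each class at runtime which annotations it carries. The sketch below uses a made-up @StoredEntity annotation and made-up class names rather than the real DPL annotations, and it only shows the kind of Class.getAnnotations() call that has to work on the target VM.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-ins; these are not the DPL's own annotations or classes.
class AnnotationProbe {

    /** Marker annotation that must be retained at runtime to be visible via reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface StoredEntity {
    }

    @StoredEntity
    static class Person {
        String name;
    }

    /** Returns true if the class carries the marker, using Class.getAnnotations(). */
    static boolean isStoredEntity(Class<?> type) {
        for (Annotation a : type.getAnnotations()) {
            if (a.annotationType() == StoredEntity.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Person: " + isStoredEntity(Person.class));
        System.out.println("String: " + isStoredEntity(String.class));
    }
}
```

If the VM cannot answer that reflective query, a layer built this way has no means of discovering which classes are meant to be persistent.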

I have verified that the Android 1.5 (and presumably the recently released 1.6) SDK supports JE and DPL. JE/DPL on Android adds a major chunk of infrastructure to the Android platform in that a pure Java transactional POJO datastore is now available.

You can expect to see stepped-up support of JE on Android in the near future. For instance, we plan on making je-android.jar libraries available so that developers won't have to compile the JE source under Dalvik.

September 22, 2009

Berkeley DB Java Edition: Customer Use of BeanShell Saves the Day

Three of us were working on a customer crisis late last night. The customer had a running environment which had some sort of transient state "issues". We were able to debug this live because they had the foresight to incorporate BeanShell into their system. This allowed us to look at lots of state and get a pretty good idea of what was going on. I strongly recommend taking a look at this library for use as a debugging tool, especially for deployed systems.

September 17, 2009

Berkeley DB Java Edition jconsole Plugin

My colleague Tao Zhang has been working on a jconsole plugin which will let you monitor Berkeley DB Java Edition stats in real time. This will be especially useful for helping our customers debug performance issues. We're all really excited about this new feature which will be available in JE 4.0 (the corporate lawyers will only let me say "GA in CY 10"). I want to share some screen shots of this. Here's a picture of the plugin. Note the two new tabs ("JE Statistics" and "JE Replicated Statistics"). The plugin allows you to log statistics to a csv file at a specified interval. It also lets you graph stats in real time (screenshot: Show-graph.JPG).

September 14, 2009

BDB and BDB XML Releases

Here is the press release for the latest BDB ("core") and BDB XML releases.

eWeek also has an article about this.