Replicated Replicants

Oracle has just recently released a new version of Oracle Berkeley DB Java Edition including new High Availability features. These allow you to keep multiple database instances in sync (using a single master). Some time ago I was asked if we'd like to help evaluate a pre-release version of the code, and of course I said yes. We've been waiting for HA features to be available in the database to implement our own replication support so it was a perfect fit for evaluation. Some very insightful people had good things to say about it. Since we had a head start, I already have a working version of the AURA Data Store with replication.

The AURA Data Store has three tiers - the Data Store Head serves as the communication point to the outside world and also distributes queries to each of the many Partition Clusters. Partition Clusters each represent a single partition of the data, replicated across a cluster of Replicants. Until recently we only supported a single Replicant, but thanks to BDB-JE/HA we now have support for multiple replicas. If you're following along in the source code you'll want to update to the "replication" branch to see the work being done. Adding support was fairly straightforward once I got a handle on how each of the nodes in a replication group are connected to each other. We already had infrastructure that expected to work this way, so integration was smooth. When setting up the group, you specify how closely synchronized the replicas need to be, and when committing transactions you can specify just how far into the disk or network you want the commit to go before returning. So we commit to the master and in a few seconds we can expect to see changes in the replicas.

The only catch was that we maintain both a database and a search engine. We haven't put any support in the search engine for HA (although a single index can have multiple readers and writers if we were sharing it). So for the time being I have a thread that picks up any changed item keys from the master and informs the replicas that they should fetch those items from their respective databases and re-index them. What would be nice would be if we could get an event stream from the database in the replicas notifying us that a new or updated item just came in. Another option might be to actually store the search engine's data in the database and let it do the replication, but the nature of the inverted file doesn't really lend itself to this (at least, not with any hope of performing well).

Anyway, the real excitement here was that for the first time, we got to see our data store visualization show us multiple Replicants for each Partition Cluster:

This is a screenshot showing a very small ("two-way") data store. It is running with only one DataStoreHead, and the data is divided across two partitions. Each partitions has three Replicants. While the Replicants are drawn inside the Partition Clusters, it should be noted that they are actually separate processes running on separate machines. The grouping is just a logical view. I opened up the overall Item Stats display to show that we only have a small number of items. To make the screenshot more interesting, I'm running a 2,000 user load test that simulates the actual operations users would perform (basically, a weighted mix of getting item details, searching, and getting live recommendations) when using an application such as the Music Explaura.

As you can see in the image, we're distributing query load fairly evenly across all Replicants in each Partition Cluster. Replicants do most of the heavy lifting in terms of data crunching. In order to benefit from the greater amount of cache afforded us by the greater number of processes/memory, we distribute what queries we can based on the hash code of the item key, thereby asking for the same items from the same Replicants. The Partition Clusters are doing a little work in distributing those queries, and the Data Store Head is doing a little more work in that it needs to combine results from across Partition Clusters.

I plan to do some more posting about how BDB-JE/HA is integrated into the AURA Data Store, so stay tuned!


I'm one of the essay writers in our IT college and I found this post informational. I'll be expecting more of your blogs. thank you.

Posted by Essay Writers on November 24, 2009 at 08:31 AM EST #

Any news about how BDB-JE/HA is integrated into the AURA Data Store?

Posted by Custom Essay on January 02, 2010 at 05:49 AM EST #

Your helpful article made me look on some Oracle problems from another point of view...thanks for sharing...

Posted by Writing Research Papers on January 02, 2010 at 05:50 AM EST #

Post a Comment:
  • HTML Syntax: NOT allowed

Jeff Alexander is a member of the Information Retrieval and Machine Learning group in Oracle Labs.


« July 2016