Oracle NoSQL Database vs Berkeley DB Java Edition
By Charles Lamb on Oct 06, 2011
I've been watching the twitter-sphere for comments about Oracle NoSQL Database. There are a number of common questions and misconceptions floating around that I'll address here:
Misconception #1: "Oracle NoSQL Database is just Berkeley DB Java Edition rebranded."; "Oracle NoSQL Database sounds like it's just Berkeley DB with extra bits."
When we built NoSQL Database, we recognized that Berkeley DB Java Edition HA provided us with lots of necessary, but not sufficient, elements for a NoSQL store. For instance, JE/HA gives us:
- ACID Transactions
- High Availability
- High Throughput
- Large Capacity
- Lights out administration
And you could even argue that its key/value data model is already "NoSQL". But we believe that NoSQL means something more to most people, such as:
- Data distribution
- Dynamic partitioning (aka "sharding")
- Load balancing
- Monitoring and Administration
- Predictable latency
- Multi-node backup
So although NoSQL Database is built using BDB JE/HA as the underlying, battle-tested, storage system (why reinvent the wheel?), NoSQL Database adds a large amount of infrastructure on top of it to bring it into the NoSQL realm. As my colleague Chao Huang says, "BDB JE is like an engine. NoSQL Database is the car built with the engine."
Misconception #2: "Oracle NoSQL Database has the same API as Berkeley DB Java Edition"
I realize that at the time of this writing we have not released the software, so the reader has no way of looking at the javadoc to see the actual NoSQL Database API, but suffice it to say that the API is not the same as BDB JE. The interface is Java, and it provides CRUD, iteration, and CAS (aka "RMW") capabilities on key/value pairs. There is also a major/minor key capability. All key/value pairs with the same major key reside on the same "Rep Group" (a Rep Group is just a BDB JE HA replication group of a master and N replicas). That way, records can be clustered (e.g. put all records related to "Fred" on the same node). One other (slight) difference between the BDB JE and NoSQL Database APIs is that the former uses byte[] for keys and the latter uses Strings for keys. Both use byte[] for the data portion.
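To make the major/minor key and CAS ideas a little more concrete, here is a minimal sketch in plain Java. It is not the NoSQL Database API: the class names (KeySketch, CasStoreSketch), the hash-based placement, the "/-/" separator, and the version counter are all assumptions made up for illustration. It only shows that placement depends on the major key alone, and that a compare-and-set write succeeds only when the record has not changed since it was read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Oracle NoSQL Database API.
final class KeySketch {
    final List<String> majorPath;
    final List<String> minorPath;

    KeySketch(List<String> majorPath, List<String> minorPath) {
        this.majorPath = List.copyOf(majorPath);
        this.minorPath = List.copyOf(minorPath);
    }

    // Placement looks at the major path only, so records that share a
    // major key always map to the same Rep Group index (assumed policy: hash modulo).
    int repGroupFor(int repGroupCount) {
        return Math.floorMod(String.join("/", majorPath).hashCode(), repGroupCount);
    }

    // Full key as a String (assumed encoding: "/-/" separates major from minor parts).
    String fullPath() {
        return String.join("/", majorPath) + "/-/" + String.join("/", minorPath);
    }
}

// In-memory stand-in for a store with String keys and byte[] values.
final class CasStoreSketch {
    private static final class Versioned {
        final byte[] value;
        final long version;

        Versioned(byte[] value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> records = new HashMap<>();
    private long nextVersion = 1;

    // Unconditional write; returns the version assigned to the new value.
    synchronized long put(String key, byte[] value) {
        long version = nextVersion++;
        records.put(key, new Versioned(value.clone(), version));
        return version;
    }

    // Returns a copy of the stored bytes, or null if the key is absent.
    synchronized byte[] get(String key) {
        Versioned v = records.get(key);
        return v == null ? null : v.value.clone();
    }

    // Compare-and-set: write only if the record still carries expectedVersion.
    // Returns the new version, or -1 if the record is missing or was changed.
    synchronized long putIfVersion(String key, byte[] value, long expectedVersion) {
        Versioned current = records.get(key);
        if (current == null || current.version != expectedVersion) {
            return -1;
        }
        return put(key, value);
    }
}
```

With this sketch, keys built from the major path "Fred" with minor paths "address" and "phone" return the same value from repGroupFor, which is the clustering behavior described above. A second putIfVersion call that reuses a stale version returns -1 instead of overwriting the newer value.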
(Non-) Misconception #3: "Oracle is adding network bindings to Berkeley DB Java, branding it Oracle NoSQL. I am curious how easy setup and development will be."
Let me address the second question first (ease of setup/development). Although this isn't a misconception, it is a good question. In general it is difficult for the average developer who wants to try out a large distributed store to find sufficient hardware to get a reasonably sized cluster going. Well, maybe it's not difficult for you, but it sure is for all of us -- we have to claw and scratch for every machine we use(*). So George (one of our developers) put together what we call "kvlite", a single-process version of Oracle NoSQL Database. kvlite is really easy to start up (one simple command line invocation) and gives the user a good way of trying out the API without a lot of muss and fuss. The "server side" is in no way tuned for performance, but it lets you get things going really quickly so you can kick the tires, try out your application code, etc. while your sysadmins and IT folks scrounge the real hardware for you to use for deployment.
(*) We actually have several large clusters to do development and performance testing at our disposal.
And now the first part of the question (adding network bindings to Berkeley DB Java Edition). Hmm, that's kind of, sort of true. Let me try to reframe the statement. BDB JE HA allows a user to perform operations on either the master (for updates and reads) or the replicas (for reads). The most common objection that we encounter is that the application has to "know" which nodes are the master and which are the replicas (for routing update and read requests appropriately). There is no network layer in BDB JE/HA to handle this for you. Oracle NoSQL Database provides this capability. You link the kvclient.jar (the "driver") into your application, and presto, you can make your CRUD (or iteration) method calls on your K/V Store. The kvclient.jar figures out which node to route the request to (it knows which Rep Group holds the key/value pair and which node in that Rep Group is the master). So in that sense, it adds a network layer to BDB, but the API is different from BDB so I wouldn't exactly call it a network binding. There's a lot of infrastructure and intelligence (e.g. load balancing) built into the kvclient "driver".
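The routing decision described above can be sketched roughly as follows. This is not how kvclient.jar is implemented: RepGroupSketch, RouterSketch, the hash-based group lookup, and the round-robin read policy are all hypothetical choices made for illustration. The sketch only shows the shape of the idea: the client maps a key to a Rep Group, sends updates to that group's master, and spreads reads over the master and replicas.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of client-side routing; not the kvclient.jar implementation.
final class RepGroupSketch {
    final String master;
    final List<String> replicas;
    private final AtomicInteger nextReader = new AtomicInteger();

    RepGroupSketch(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // Updates must go to the master.
    String nodeForWrite() {
        return master;
    }

    // Reads may go to any node; rotate over the master and the replicas
    // (simple round-robin, an assumed load-balancing policy).
    String nodeForRead() {
        int slot = Math.floorMod(nextReader.getAndIncrement(), replicas.size() + 1);
        return slot == 0 ? master : replicas.get(slot - 1);
    }
}

final class RouterSketch {
    private final List<RepGroupSketch> groups;

    RouterSketch(List<RepGroupSketch> groups) {
        this.groups = List.copyOf(groups);
    }

    // Assumed placement rule: hash of the major key, modulo the number of Rep Groups.
    RepGroupSketch groupFor(String majorKey) {
        return groups.get(Math.floorMod(majorKey.hashCode(), groups.size()));
    }

    String routeWrite(String majorKey) {
        return groupFor(majorKey).nodeForWrite();
    }

    String routeRead(String majorKey) {
        return groupFor(majorKey).nodeForRead();
    }
}
```

The application never asks which node is the master; it calls routeWrite or routeRead with a key and the router decides. A real driver would also have to track master changes after a failover, which this sketch leaves out.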