By Charles Lamb on Nov 02, 2011
Here are my slides from my HPTS talk. The slides with performance figures start at slide 28.
Oracle NoSQL Database's simple K/V pair model utilizes a B+Tree on each node to index by the key of each record. Is a Key-Value store useful with only primary key indexing? Absolutely.
In an unstructured or semi-structured environment, primary-key indexing is very often sufficient. Further, consider the case of Map/Reduce post-processing of NoSQL Database data in any of the above scenarios. During the M/R steps, secondary indices, sometimes ad-hoc, are effectively generated on-the-fly.
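To make the "secondary indices generated on-the-fly" point concrete, here is a minimal sketch of the idea in plain Java. It is not Map/Reduce code and it does not use the Oracle NoSQL Database API; the class `AdHocIndex`, the method `buildIndex`, and the sample records are all hypothetical. The store is modeled as a map indexed only by primary key, and a single pass over it emits (field value, primary key) pairs and groups them, which is roughly what a map step and a reduce step would do between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Oracle NoSQL Database.
class AdHocIndex {

    // "Map": for each record, emit (value of the chosen field, primary key).
    // "Reduce": group the emitted primary keys by field value.
    static Map<String, List<String>> buildIndex(
            Map<String, Map<String, String>> store, String field) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> record : store.entrySet()) {
            String fieldValue = record.getValue().get(field);
            if (fieldValue != null) {
                index.computeIfAbsent(fieldValue, k -> new ArrayList<>())
                     .add(record.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // The "store" knows only about primary keys.
        Map<String, Map<String, String>> store = new TreeMap<>();
        store.put("user/1", Map.of("name", "Fred", "city", "Boston"));
        store.put("user/2", Map.of("name", "Wilma", "city", "Austin"));
        store.put("user/3", Map.of("name", "Barney", "city", "Boston"));

        // An ad-hoc secondary index on "city", built during post-processing.
        System.out.println(buildIndex(store, "city"));
        // prints {Austin=[user/2], Boston=[user/1, user/3]}
    }
}
```

The index exists only for the duration of the job, which is why the store itself can get away with primary-key indexing.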
The Oracle NoSQL Database doc is available on OTN at:
The Oracle NoSQL Database development team has been working closely with the Cisco UCS team. This is a great partnership: we work together on performance and scalability testing using their UCS C-Series Rack-Mount Servers and Cisco Nexus 550 Series Switching, and we have access to Cisco's large cluster to run tests at massive scale and to run proofs of concept.
I am planning to write some blog entries describing the results. Cisco has produced a solution brief about Oracle NoSQL Database on the UCS platform.
I've been watching the twitter-sphere for comments about Oracle NoSQL Database. There are a number of common questions and misconceptions floating around that I'll address here:
Misconception #1: "Oracle NoSQL Database is just Berkeley DB Java Edition rebranded."; "Oracle NoSQL Database sounds like it's just Berkeley DB with extra bits."
When we built NoSQL Database, we recognized that Berkeley DB Java Edition HA provided us with lots of necessary, but not sufficient, elements for a NoSQL store. For instance, JE/HA gives us:
And you could even argue that its key/value data model is already "NoSQL". But we believe that NoSQL means something more to most people. Like
So although NoSQL Database is built using BDB JE/HA as the underlying, battle-tested, storage system (why reinvent the wheel?), NoSQL Database adds a large amount of infrastructure on top of it to bring it into the NoSQL realm. As my colleague Chao Huang says, "BDB JE is like an engine. NoSQL Database is the car built with the engine."
Misconception #2: "Oracle NoSQL Database has the same API as Berkeley DB Java Edition"
I realize that at the time of this writing we have not released the software, so the reader has no way of looking at the javadoc to see the actual NoSQL Database API, but suffice it to say that the API is not the same as BDB JE. The interface is Java, and it provides CRUD, iteration, and CAS (aka "RMW") capabilities on key/value pairs. There is also a major/minor key capability. All key/value pairs with the same major key reside on the same "Rep Group" (a Rep Group is just a BDB JE HA replication group of a master and N replicas). That way, records can be clustered (e.g. put all records related to "Fred" on the same node). One other (slight) difference between the BDB JE and NoSQL Database APIs is that the former uses byte[] for keys and the latter uses Strings for keys. Both use byte[] for the data portion.
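Since the javadoc isn't out yet, here is a toy model of the major/minor key idea rather than the real API. Everything in it is an assumption made for illustration: the class `ToyKVStore`, its method names, and in particular the use of a hash of the major key to pick a Rep Group. The only things taken from the description above are that keys are Strings, data is byte[], and all records sharing a major key live in the same Rep Group.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model; names and placement scheme are hypothetical, not the product API.
class ToyKVStore {
    // One map per "Rep Group": major key -> (minor key -> value).
    private final List<Map<String, Map<String, byte[]>>> repGroups = new ArrayList<>();

    ToyKVStore(int numRepGroups) {
        for (int i = 0; i < numRepGroups; i++) {
            repGroups.add(new HashMap<>());
        }
    }

    // Only the major key takes part in placement (hashing is an assumption here),
    // so every minor key under the same major key lands in the same Rep Group.
    int repGroupFor(String majorKey) {
        return Math.floorMod(majorKey.hashCode(), repGroups.size());
    }

    void put(String majorKey, String minorKey, byte[] value) {
        repGroups.get(repGroupFor(majorKey))
                 .computeIfAbsent(majorKey, k -> new TreeMap<>())
                 .put(minorKey, value.clone());
    }

    // Returns null when the key is absent.
    byte[] get(String majorKey, String minorKey) {
        Map<String, byte[]> minors = repGroups.get(repGroupFor(majorKey)).get(majorKey);
        if (minors == null) {
            return null;
        }
        byte[] value = minors.get(minorKey);
        return value == null ? null : value.clone();
    }

    public static void main(String[] args) {
        ToyKVStore store = new ToyKVStore(3);
        store.put("Fred", "email", "fred@example.com".getBytes(StandardCharsets.UTF_8));
        store.put("Fred", "phone", "555-0100".getBytes(StandardCharsets.UTF_8));

        // Both records share the major key "Fred", so they share a Rep Group.
        System.out.println("Fred lives in rep group " + store.repGroupFor("Fred"));
        System.out.println(new String(store.get("Fred", "email"), StandardCharsets.UTF_8));
    }
}
```

Because placement depends only on the major key, fetching everything about "Fred" touches a single Rep Group, which is the clustering benefit described above.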
(Non-) Misconception #3: "Oracle is adding network bindings to Berkeley DB Java, branding it Oracle NoSQL. I am curious how easy setup and development will be."
Let me address the second question first (ease of setup/development). Although this isn't a misconception, it is a good question. In general it is difficult for the average developer who wants to try out a large distributed store to find sufficient hardware to get a reasonably sized cluster going. Well, maybe it's not difficult for you, but it sure is for all of us -- we have to claw and scratch for every machine we use(*). So George (one of our developers) put together what we call "kvlite", a single-process version of Oracle NoSQL Database. kvlite is really easy to start up (one simple command line invocation) and gives the user a good way of trying out the API without a lot of muss and fuss. The "server side" is in no way tuned for performance, but it lets you get things going really quickly so you can kick the tires, try out your application code, etc. while your sysadmins and IT folks scrounge the real hardware for you to use for deployment.
(*) We actually have several large clusters to do development and performance testing at our disposal.
And now the first part of the question (adding network bindings to Berkeley DB Java Edition). Hmm, that's kind of, sort of true. Let me try to reframe the statement. BDB JE HA allows a user to perform operations on either the master (for updates and reads) or the replicas (for reads). The most common objection that we encounter is that the application has to "know" which nodes are the master and the replicas (for routing updates and read requests appropriately). There is no network layer in BDB JE/HA to handle this for you. Oracle NoSQL Database provides this capability. You link the kvclient.jar (the "driver") into your application, and presto, you can make your CRUD (or iteration) method calls on your K/V Store. The kvclient.jar figures out which node to route the request to (it knows which Rep Group holds the key/value pair and which node in that Rep Group is the master). So in that sense, it adds a network layer to BDB, but the API is different from BDB so I wouldn't exactly call it a network binding. There's a lot of infrastructure and intelligence (e.g. load balancing) built into the kvclient "driver".
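The routing decision the driver makes can be sketched roughly as follows. This is a deliberately simplified model and not how kvclient.jar is implemented: `ToyRequestRouter`, `RepGroup`, the node names, the hash-based choice of Rep Group, and the round-robin read policy are all assumptions made for the example. What it does reflect from the description above is that updates must go to the master of the Rep Group holding the key, while reads may be served by the master or a replica.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of client-side routing; not the kvclient implementation.
class ToyRequestRouter {

    // A Rep Group is one master plus N replicas, identified here by name only.
    static final class RepGroup {
        final String master;
        final List<String> replicas;

        RepGroup(String master, List<String> replicas) {
            this.master = master;
            this.replicas = List.copyOf(replicas);
        }

        List<String> allNodes() {
            List<String> all = new ArrayList<>();
            all.add(master);
            all.addAll(replicas);
            return all;
        }
    }

    private final List<RepGroup> groups;
    private final AtomicInteger readCounter = new AtomicInteger();

    ToyRequestRouter(List<RepGroup> groups) {
        this.groups = List.copyOf(groups);
    }

    // Assumption: the Rep Group is chosen by hashing the major key.
    private RepGroup groupFor(String majorKey) {
        return groups.get(Math.floorMod(majorKey.hashCode(), groups.size()));
    }

    // Updates always go to the master of the owning Rep Group.
    String nodeForWrite(String majorKey) {
        return groupFor(majorKey).master;
    }

    // Reads may go to any node in the group; round-robin stands in for
    // whatever load balancing a real driver would do.
    String nodeForRead(String majorKey) {
        List<String> nodes = groupFor(majorKey).allNodes();
        return nodes.get(Math.floorMod(readCounter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        ToyRequestRouter router = new ToyRequestRouter(List.of(
                new RepGroup("rg1-master", List.of("rg1-replica1", "rg1-replica2")),
                new RepGroup("rg2-master", List.of("rg2-replica1", "rg2-replica2"))));

        System.out.println("write Fred -> " + router.nodeForWrite("Fred"));
        System.out.println("read  Fred -> " + router.nodeForRead("Fred"));
        System.out.println("read  Fred -> " + router.nodeForRead("Fred"));
    }
}
```

The application code never names a node. It hands the router a key and an operation, and the router decides where the request goes, which is the piece BDB JE/HA leaves to the application.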
Oracle NoSQL Database provides network-accessible multi-terabyte distributed key/value pair storage that offers predictable latency. That is, it services network requests to store and retrieve data which is organized into key-value pairs. It offers full Create, Read, Update and Delete (CRUD) operations, with adjustable durability guarantees. Oracle NoSQL Database is designed to be a highly available and extremely scalable system, with predictable levels of throughput and latency, while requiring minimal administrative interaction.
My colleagues and I have been working hard to bring this project to fruition and it's truly exciting for all of us to see it roll out the door (as well as to be able to finally talk about it in public). It will come in two versions, an Open Source Community Edition, and a value-add "Enterprise Edition". Initially, both Editions will have the same feature set, but in subsequent releases there will be differentiation between the two. My colleague Margo Seltzer has written a fine whitepaper which describes the system. If you have the time, it's an easy read.
In future posts to this blog I hope to talk about some of the great performance and scaling numbers we're seeing in our tests. To demonstrate the system's capabilities, we've been working with two very fine corporate partners to run tests on clusters of up to 192 nodes.
We also announced the Oracle Big Data Appliance, an "engineered system" which will run (among other things) Oracle NoSQL Database.