Pat Shuff's Blog

  • November 13, 2006

Berkeley DB - intro to concepts

So let's dive a little deeper into Berkeley DB.

Berkeley DB is a general-purpose embedded database engine. It is compiled and linked into your application and runs in the same process space, so there is no client/server round trip, which is a large part of why it is extremely fast. A database can store up to 256 terabytes of data, and individual keys and data items can each be up to 4 gigabytes.

There are four different access methods: BTree, Hash, Queue, and Recno. BTree and Hash are both for fast indexing and retrieval by key. BTree is good for data that has locality of reference, where keys that sort near each other tend to be accessed together. Hash is good for extremely large data sets. Queue stores fixed-length records and is used for fast inserts at the tail of the queue; it is good for high degrees of concurrency. Recno stores records by record number and supports databases whose permanent storage is a flat text file.
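To make the BTree versus Hash difference concrete, here is a small in-memory analogy. It is not Berkeley DB code: java.util.TreeMap and java.util.HashMap stand in for the two access methods, and the key names are made up for illustration. The point is only that an ordered structure can walk neighboring keys as a range, while a hashed one answers exact-match lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy only: TreeMap and HashMap stand in for the BTree and
// Hash access methods. None of this is Berkeley DB API.
class AccessMethodSketch {

    // Ordered store: keys that sort next to each other can be scanned as a range.
    static List<String> rangeScan(TreeMap<String, String> btreeLike, String from, String to) {
        // subMap covers the keys in [from, to) in sorted key order.
        return new ArrayList<>(btreeLike.subMap(from, to).values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> btreeLike = new TreeMap<>();
        Map<String, String> hashLike = new HashMap<>();
        // Hypothetical sample rows: {key, data}.
        String[][] rows = {
            {"vendor:0002", "Bakery"}, {"item:0001", "Bread"},
            {"vendor:0001", "Dairy"}, {"item:0002", "Milk"}
        };
        for (String[] row : rows) {
            btreeLike.put(row[0], row[1]);
            hashLike.put(row[0], row[1]);
        }
        // Both structures answer exact-match lookups.
        System.out.println(hashLike.get("item:0002"));
        // Only the ordered one can walk every "vendor:" key in key order.
        // ';' is the character right after ':', so this range covers the prefix.
        System.out.println(rangeScan(btreeLike, "vendor:", "vendor;"));
    }
}
```

If your lookups are mostly "give me everything near this key", the ordered layout is the natural fit; if they are scattered exact matches, ordering buys you little.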

Inserting data into the database works differently from the other databases offered by Oracle: there is no SQL INSERT statement. If you look at ExampleDatabaseLoad.java, which comes with the binary distribution, you can see that entries are stored in the database with a put command.

   myDbs.getVendorDB().put(null, theKey, theData);

In this command, we reference an already opened set of databases (myDbs), get the database handle known as VendorDB, and call the put function to insert data. The first argument, null, means the operation runs outside a transaction. The data is inserted as a key and data pair: the key is a single element that identifies the record, and the data holds the record's fields. The function getVendorDB returns a handle to a database that we create using the BTree construct in a file. This is done with the

   new Database("file", null, DbConfig)

constructor. The file parameter is the path (directory and file name) of the file that stores the data, and DbConfig is the configuration object for the database.
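Both theKey and theData end up as byte arrays by the time they reach the database, so it is up to the application to decide how a record is packed. Here is a minimal sketch of one way to do that using only the standard library. The record layout (a name followed by a phone number, each length-prefixed), the class name, and the sample values are all my own assumptions for illustration, not what ExampleDatabaseLoad.java does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of packing a record into bytes. The layout is hypothetical:
// [int length][name bytes][int length][phone bytes].
class RecordBytesSketch {

    // The key is a single element, encoded here as UTF-8 text.
    static byte[] encodeKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // The data carries several fields packed one after another.
    static byte[] encodeData(String name, String phone) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] p = phone.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + n.length + 4 + p.length);
        buf.putInt(n.length).put(n).putInt(p.length).put(p);
        return buf.array();
    }

    // Reverse of encodeData: read each length prefix, then that many bytes.
    static String[] decodeData(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] n = new byte[buf.getInt()];
        buf.get(n);
        byte[] p = new byte[buf.getInt()];
        buf.get(p);
        return new String[] {
            new String(n, StandardCharsets.UTF_8),
            new String(p, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        byte[] theKey = encodeKey("Acme Produce");
        byte[] theData = encodeData("Acme Produce", "555-0100");
        String[] fields = decodeData(theData);
        System.out.println(theKey.length + " key bytes; name=" + fields[0] + ", phone=" + fields[1]);
    }
}
```

In the real API these byte arrays would be wrapped before being handed to put; the sketch stops short of that so it runs without the Berkeley DB jar.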

Note that with Berkeley DB, you need to manage where everything is, how things are created, and how to add and search for elements in the repository. It does not differ much from keeping records in a file, but it gives you a good way of indexing and searching files that could potentially contain large amounts of data. Data is read using a get function, which retrieves a record much as if it were an element of a structure. The key is used to point to the right element so that the right data is accessed.

Records can also be deleted using the delete function. It is important to remember that records are not truly deleted until the cache has been written to disk. This can be done manually with a sync or a close function call.
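The delete-then-sync behavior is easier to see in a toy model. The class below is not Berkeley DB code and says nothing about how its cache is really implemented; it is a plain write-back cache where a HashMap plays the role of the file on disk, and every name in it is made up. It shows the one thing that matters here: a deleted record disappears from reads right away, but stays in the "file" until sync is called.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy write-back cache, not Berkeley DB code. "disk" stands in for the file.
class WriteBackSketch {
    private final Map<String, String> disk = new HashMap<>();
    private final Map<String, String> dirtyPuts = new HashMap<>();
    private final Set<String> pendingDeletes = new HashSet<>();

    void put(String key, String value) {
        pendingDeletes.remove(key);
        dirtyPuts.put(key, value);
    }

    void delete(String key) {
        dirtyPuts.remove(key);
        pendingDeletes.add(key);
    }

    // Reads consult the cached changes first, then fall back to "disk".
    String get(String key) {
        if (pendingDeletes.contains(key)) {
            return null;
        }
        String cached = dirtyPuts.get(key);
        return cached != null ? cached : disk.get(key);
    }

    // Lets the demo peek at what has actually been written out.
    boolean isOnDisk(String key) {
        return disk.containsKey(key);
    }

    // Flush buffered puts and deletes to "disk".
    void sync() {
        disk.putAll(dirtyPuts);
        disk.keySet().removeAll(pendingDeletes);
        dirtyPuts.clear();
        pendingDeletes.clear();
    }

    public static void main(String[] args) {
        WriteBackSketch db = new WriteBackSketch();
        db.put("k1", "v1");
        db.sync();
        db.delete("k1");
        System.out.println("visible after delete: " + (db.get("k1") != null));
        System.out.println("on disk before sync:  " + db.isOnDisk("k1"));
        db.sync();
        System.out.println("on disk after sync:   " + db.isOnDisk("k1"));
    }
}
```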

Cursors can be used to iterate over records in a database. If a database allows duplicate records for one key, then the cursor is the easiest way to access something other than the first record. Records are read using the cursor.getNext() function or the cursor.getPrev() function. Data can be written using cursors with the putNoDupData, putNoOverwrite, putKeyFirst, or putKeyLast functions. Updates are done with cursor.putCurrent and deletes are done with cursor.delete.
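Here is a rough model of why a cursor is needed once duplicates are allowed. Again, this is a standard-library stand-in and not the Berkeley DB Cursor class: a TreeMap holds a list of data items per key, get returns only the first item, and scan visits every key/data pair in the order a forward cursor would. Keeping duplicates in insertion order is an assumption of this sketch; the class and method names are invented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Model of a database with duplicate data items per key. Not Berkeley DB API.
class CursorSketch {
    // Sorted keys; each key maps to its duplicates in insertion order (an assumption).
    private final TreeMap<String, List<String>> records = new TreeMap<>();

    void put(String key, String data) {
        records.computeIfAbsent(key, k -> new ArrayList<>()).add(data);
    }

    // A plain keyed lookup sees only the first data item, or null if the key is absent.
    String get(String key) {
        List<String> dups = records.get(key);
        return dups == null ? null : dups.get(0);
    }

    // Every key/data pair, in the order repeated "get next" calls would visit them.
    List<Map.Entry<String, String>> scan() {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : records.entrySet()) {
            for (String data : e.getValue()) {
                out.add(new SimpleEntry<>(e.getKey(), data));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CursorSketch db = new CursorSketch();
        db.put("fruit", "apple");
        db.put("fruit", "pear");
        db.put("dairy", "milk");
        System.out.println("get(fruit) = " + db.get("fruit"));
        for (Map.Entry<String, String> pair : db.scan()) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

The second "fruit" record is reachable only through the scan, which is the situation the cursor functions are there to handle.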

It is important to remember that you can open multiple databases at the same time and run them separately in different threads and even join data between multiple databases. I will not go into this detail here. The intention of this blog entry is to introduce the concept of Berkeley DB and how to insert, delete, update, and search for database elements.


Comments ( 2 )
  • Fernando Espósito Saturday, December 2, 2006
    I work for a financial institution here in Buenos Aires, Argentina.
    I came to your blog through my search for info about Berkeley DB.
    I liked the way you explained how Berkeley DB works very much, thank you.
    I have a few questions that you might possibly answer, if you don't mind.
    My company is now going through a massive reengineering process. Old COBOL/mainframe apps are scheduled for replacement by new Java/J2EE/Oracle/Unix ones.
    We've been evaluating several choices, including an in-house developed framework (which we built, in fact).
    We're now tinkering with Spring Framework and Hibernate and using Oracle 9i as a "bit bucket".
    Apps are deployed to the IAS app server on Unix and used from a thin GUI on the client (Windows PCs).
    Using Oracle 9i as a "bit bucket" is something that developers do these days, as you probably know. I really don't like it; it's like using a Ferrari to carry groceries.
    So, my question is: Could Berkeley DB replace Oracle 9i for a typical multi user application in this context?
    How does it fit into an app server?
    Thank you very much.
  • Pat Shuff Sunday, December 3, 2006
    I would be wary of replacing Oracle 9i with BerkeleyDB. The knowledge set and development environments are different. If you are familiar with database layout and SQL, go with something like Oracle Standard Edition or Oracle Lite. If you are familiar with C, C++, and PL/SQL, BerkeleyDB is a viable alternative. The architecture of your applications will change, as will your interface with the "bit bucket repository".

    On a separate note, BerkeleyDB is not a multi-user enabled database. If you will have multiple applications and users attaching to the database, you will have to worry about locking, consistency, and transaction commits yourself. Yes, Enterprise Edition is a bad tool for milk runs. I would look at alternate editions that are a little lighter.

    Some questions that you need to ask:
    1) How big is your repository? More or less than 4 GB?
    2) How many users will connect to this repository at the same time?
    3) What is the knowledge base of your staff, and what are your development tools?
    4) What processor requirements are needed? One? Two? More than two?

    Good luck.
