Sunday Jan 23, 2011

The C++ Standard Template Library as a BDB Database (part 2)

In the first entry I touched on some of the features available when using Berkeley DB's C++ STL API. Now it's time to dive into more detail.

Every copy of Berkeley DB comes with this API; there is nothing extra to download or add on. Simply download the latest version, configure it to build the C++ and STL APIs, and you're ready to go.

We've worked to make dbstl very easy to use if you already have C++ STL programming experience. It is especially easy to port existing code that uses C++ STL containers to dbstl. In the following sections I will list several pieces of C++ code that use the C++ STL, and show you how to convert them to use dbstl instead.

If you want to use dbstl to manage purely in-memory data structures, very little conversion from standard STL is needed. Most of the time, however, you will choose Berkeley DB for its persistence, concurrency and transactional recovery features, and to use those you will need some of the basic features of dbstl.

I. Basic Features

1. Suppose we have the following code, which uses the C++ STL std::vector container:

int vector_test(int, char**)
{
  typedef std::vector<double> dbl_vec_t; // (1)
  dbl_vec_t v1; // Empty vector of doubles. (1)
  // The rest follows ...


Here is the same code using dbstl instead. Note that only the two lines marked (1) are modified.

int vector_test(int, char**)
{
  typedef dbstl::db_vector <double, dbstl::ElementHolder<double> > dbl_vec_t; // (1)
  dbl_vec_t v1; // Empty vector of doubles. (1)
  // The rest follows ...


The reasons for the changes are:

a. All dbstl classes and global functions are defined inside the "dbstl" namespace.

b. For all dbstl container class templates, we must add one more type parameter, ElementHolder<T>, if the element type T is a primitive data type such as int, double or char*. If T is a class type, this type parameter is not needed.

c. Here we used the default constructor for v1. In the dbstl case, this means an anonymous in-memory database is created and used only by v1. To share that underlying database, you can share the v1 vector itself or its database handle, and only within the current process.
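Why would a primitive element type need a holder? The following is one plausible reason, offered as an assumption rather than a description of dbstl's internals. A wrapper that inherits from the element type can behave exactly like a T while carrying extra state. C++ does not allow inheriting from a fundamental type such as double, so a primitive has to be stored as a data member of a holder instead. ToyRef, ToyHolder and Point are hypothetical names used only for this toy.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical toy (not dbstl code): a reference type for class-type
// elements. Deriving from T lets the wrapper be used wherever a T is
// expected, with room for extra bookkeeping about where the element lives.
template <typename T>
struct ToyRef : public T {
    static_assert(std::is_class<T>::value, "ToyRef needs a class type");
};

// Hypothetical toy for primitive elements: a double cannot be a base
// class, so the value is kept as a data member instead.
template <typename T>
struct ToyHolder {
    static_assert(std::is_arithmetic<T>::value, "ToyHolder is for primitives");
    T value;
    ToyHolder(T v = T()) : value(v) {}
    operator T() const { return value; }                    // read the value
    ToyHolder &operator=(T v) { value = v; return *this; }  // write the value
};

// An example class-type element.
struct Point { int x = 0; int y = 0; };

// ToyRef<double> would not compile: a fundamental type is not a class.
static_assert(!std::is_class<double>::value, "double is not a class type");
```

In the toy, a ToyRef<Point> binds to a Point& just like the element itself. A ToyHolder<double> relies on a conversion operator and an assignment operator to look like a double.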

Alternatively, you can explicitly create a database environment, open a database inside it, and use the opened handles to create a container, like this:

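int vector_test(int, char**)
{
  typedef dbstl::db_vector <double, dbstl::ElementHolder<double> > dbl_vec_t;
  u_int32_t flags = 0, dboflags = 0; // Placeholder values: set as needed.
  Db *pdb;
  DbEnv *penv = new DbEnv(DB_CXX_NO_EXCEPTIONS); // (2)
  penv->open(NULL, // NULL is a placeholder: pass your environment home directory.
    flags | DB_CREATE | DB_INIT_MPOOL | DB_PRIVATE, 0777); // (3)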
  pdb = dbstl::open_db(penv, "vector2.db",
    DB_RECNO, DB_CREATE | dboflags, 0); // (4)
  dbstl::register_db_env(penv); // (5)
  dbstl::register_db(pdb); // (6)
  dbl_vec_t v1(penv, pdb); // (7)
  // The rest follows ...


This snippet contains most of the code you need to add to convert your C++ STL application into a dbstl-enabled application.

At (2), we must create the DbEnv object with the "new" operator. The same requirement applies when creating a Db object. At (3), the "flags" variable can be set to open a transactional environment, a Concurrent Data Store environment, or a plain Data Store environment. In fact, any valid use of Berkeley DB through its C/C++ API works here.
Between (2) and (3), you can set various flags or callback functions to configure the environment, just as you would with the Berkeley DB C++ API. There is also a helper function, dbstl::open_env, that opens an environment in one call but offers fewer configuration options.

At (4), we used the helper function dbstl::open_db to open the database and, optionally, set various flags on it. It makes opening a database easier, though there are some things you can't do with it, such as setting callback functions. Use it if it is sufficient for your needs. Otherwise, open the database the same way you would with the Berkeley DB C++ API.

Note that different types of containers have different requirements for their database handles; see the dbstl API documentation for details. Here, db_vector requires a database of type DB_RECNO.
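A vector maps naturally onto DB_RECNO because Recno databases are keyed by logical record numbers, which in Berkeley DB start at 1. A zero-based vector index i can therefore correspond to record number i + 1. The toy below assumes that mapping and uses a std::map in place of the database. ToyRecnoVector is a hypothetical name, and dbstl's actual db_vector may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical toy, not dbstl: a std::map keyed by record number stands in
// for a DB_RECNO database, whose record numbers start at 1.
class ToyRecnoVector {
public:
    using recno_t = std::uint32_t;

    // Zero-based vector index i corresponds to record number i + 1.
    static recno_t to_recno(std::size_t i) { return static_cast<recno_t>(i + 1); }

    // Appending stores the value under the next record number.
    void push_back(double d) { table_[to_recno(table_.size())] = d; }

    // Positional access is a lookup by record number; at() throws
    // std::out_of_range for a record that does not exist.
    double &operator[](std::size_t i) { return table_.at(to_recno(i)); }

    std::size_t size() const { return table_.size(); }

private:
    std::map<recno_t, double> table_;
};
```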

At (5) and (6), we register the environment and database handles with dbstl. This must be done in each thread that uses the handles; see the documentation for the two register functions for details.
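A thread-local registry is one way to picture the per-thread registration rule. This is a toy under stated assumptions, not a description of how dbstl stores its registrations. Each thread keeps its own set of known handles, so a handle registered in one thread stays invisible to another until that thread registers it too. ToyHandle, toy_register, toy_is_registered and check_in_new_thread are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <thread>

// Hypothetical stand-in for an environment or database handle.
struct ToyHandle { int id; };

// One registry per thread: each thread sees only what it registered itself.
thread_local std::set<const ToyHandle *> registered_handles;

void toy_register(const ToyHandle *h) { registered_handles.insert(h); }

bool toy_is_registered(const ToyHandle *h) {
    return registered_handles.count(h) != 0;
}

// Runs in a fresh thread and reports whether that thread sees h as
// registered, optionally registering it there first.
bool check_in_new_thread(const ToyHandle *h, bool register_first) {
    bool seen = false;
    std::thread worker([&] {
        if (register_first)
            toy_register(h);
        seen = toy_is_registered(h);
    });
    worker.join();
    return seen;
}
```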

At (7), we pass the environment and database handles to v1, so that v1 is backed by the pdb database. Other threads of control can also open the database and access it concurrently, either through dbstl or directly through the Berkeley DB C/C++ API.

2. Apart from the code above that constructs a dbstl container, the rest of the code needs no modification. The following snippet can be appended to each of the three snippets above to form three complete functions that do essentially the same thing:

  dbl_vec_t v2; // A second, default-constructed vector.
  for (dbl_vec_t::iterator itr = v1.begin(); itr != v1.end(); ++itr)
    *itr *= 2;
  for (size_t i = 0; i < v1.size(); i++)
    v1[i] += 3;
  v1.swap(v2); // Swap the vectors' contents.
  v2 = v1; // Assign one vector to another.
  assert(v1 == v2);
  std::reverse(v1.begin(), v1.end());
  // More standard features follow ...
  for (dbl_vec_t::reverse_iterator ritr = v1.rbegin(); ritr != v1.rend(); ++ritr)
    *ritr /= 2;
  v1.insert(v1.begin(), 34);
  assert(v1.front() == 34);
  v2.assign(v1.begin(), v1.end());
  return 0;
}


In the first for loop, v1.begin() also accepts additional optional parameters in dbstl that control the behavior of the created iterator. Refer to the dbstl API documentation for more information.

After *itr *= 2 executes, the element's new value is stored in the database. The same is true of v1[i] += 3.
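The write-back on *itr *= 2 implies that dereferencing a dbstl iterator yields something smarter than a plain double&. A common C++ technique for this is a proxy reference whose assignment operators store the new value. The sketch below assumes that technique and uses a std::map in place of the database. ToyElementRef and double_all are hypothetical names, not dbstl's API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a database: record number -> value.
using ToyTable = std::map<std::size_t, double>;

// Toy proxy reference (not dbstl): reading goes to the table, and every
// assignment writes the new value straight back into the table.
class ToyElementRef {
public:
    ToyElementRef(ToyTable &t, std::size_t key) : table_(t), key_(key) {}
    operator double() const { return table_.at(key_); }                     // read
    ToyElementRef &operator=(double d) { table_[key_] = d; return *this; }  // write back
    ToyElementRef &operator*=(double d) { return *this = double(*this) * d; }
    ToyElementRef &operator+=(double d) { return *this = double(*this) + d; }

private:
    ToyTable &table_;
    std::size_t key_;
};

// Doubles every stored value through proxies, like `*itr *= 2` in the text.
void double_all(ToyTable &t) {
    for (auto &kv : t)
        ToyElementRef(t, kv.first) *= 2;
}
```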

The v1.size() call in the second loop can be configured to compute its result faster but less precisely, which is helpful when the database contains millions of key/data pairs. Several other methods in all container classes work the same way: their default parameters make them behave like C++ STL, but they can be configured to work better with Berkeley DB in special situations.
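The exact-versus-fast trade-off can be modeled with a cached count. An exact answer walks every record, as a cursor scan would, while the fast answer returns a statistic that may lag behind recent writes. This toy only illustrates the trade-off and uses hypothetical names. It does not claim to reproduce how dbstl computes size() or which parameter selects the fast path.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical toy, not dbstl: exact size walks every record, while the
// fast path returns a cached count that only an exact scan refreshes.
class ToySizedTable {
public:
    void put(std::size_t key, double d) { table_[key] = d; }

    // O(n): visit every record, as a cursor walk over the database would.
    std::size_t size_exact() {
        std::size_t n = 0;
        for (auto it = table_.begin(); it != table_.end(); ++it)
            ++n;
        cached_count_ = n;  // refresh the cached statistic
        return n;
    }

    // O(1): return the last recorded count, which may be stale.
    std::size_t size_fast() const { return cached_count_; }

private:
    std::map<std::size_t, double> table_;
    std::size_t cached_count_ = 0;
};
```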

In the v1.swap(v2) call, the key/data pairs in v2's backing database are written into v1's backing database after v1's data is truncated, and vice versa. The same holds for v2 = v1: v1's key/data pairs are written into v2's database after v2's data is truncated.
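These copy semantics can be sketched with two independent tables. Assignment truncates the target and copies each key/data pair across, and a swap can be built from three such copies. The std::map tables and the names toy_assign and toy_swap are hypothetical stand-ins for the backing databases.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for one container's backing database.
using ToyTable = std::map<std::size_t, double>;

// Assignment as described in the text: truncate the target, then copy
// every key/data pair from the source into it.
void toy_assign(ToyTable &dst, const ToyTable &src) {
    if (&dst == &src)
        return;              // self-assignment: nothing to do
    dst.clear();             // "truncate" the target database
    for (const auto &kv : src)
        dst.insert(kv);      // copy each key/data pair across
}

// A swap built from copies through a scratch table, so each container
// keeps its own backing storage.
void toy_swap(ToyTable &a, ToyTable &b) {
    ToyTable tmp;
    toy_assign(tmp, a);
    toy_assign(a, b);
    toy_assign(b, tmp);
}
```

Because the data is copied rather than the storage being shared, changing b after the assignment leaves a untouched, just as v1 and v2 keep separate backing databases.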

As the std::reverse call shows, almost all algorithm functions in the C++ STL work with dbstl. This is because dbstl provides standard iterators and containers whose default behaviors follow the C++ STL specification. The exceptions are "inplace_merge", "find_end" and "stable_sort" in the GCC compiler's STL library: these three functions don't work correctly with dbstl. Apart from them, all C++ STL algorithms can be applied to dbstl.
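Standard algorithms depend only on iterator requirements, so they work with any container that exposes conforming iterators. The toy below uses hypothetical names and a std::map standing in for a record-number-keyed database. It defines a bidirectional iterator over the stored values and passes it unmodified to std::reverse and to std::vector's range constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical toy, not dbstl: an iterator over the data items of a
// record-number-keyed table that meets the standard bidirectional
// iterator requirements, so <algorithm> functions accept it as-is.
class ToyValueIter {
public:
    using base = std::map<std::size_t, double>::iterator;
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type = double;
    using difference_type = std::ptrdiff_t;
    using pointer = double *;
    using reference = double &;

    ToyValueIter() = default;
    explicit ToyValueIter(base it) : it_(it) {}

    reference operator*() const { return it_->second; }
    pointer operator->() const { return &it_->second; }
    ToyValueIter &operator++() { ++it_; return *this; }
    ToyValueIter operator++(int) { ToyValueIter t = *this; ++it_; return t; }
    ToyValueIter &operator--() { --it_; return *this; }
    ToyValueIter operator--(int) { ToyValueIter t = *this; --it_; return t; }
    bool operator==(const ToyValueIter &o) const { return it_ == o.it_; }
    bool operator!=(const ToyValueIter &o) const { return it_ != o.it_; }

private:
    base it_;
};

// Reverses the stored values in place with std::reverse, then returns
// them in key order.
std::vector<double> reverse_values(std::map<std::size_t, double> &t) {
    std::reverse(ToyValueIter(t.begin()), ToyValueIter(t.end()));
    return std::vector<double>(ToyValueIter(t.begin()), ToyValueIter(t.end()));
}
```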

The remaining calls (the reverse-iterator loop, insert(), front() and assign()) show that dbstl containers and iterators have all the methods of their corresponding C++ STL containers, each with identical default behavior. So you can access Berkeley DB with dbstl in exactly the way you use C++ STL containers and iterators.

In our next post...
Next up, we'll dive even deeper into the more advanced features of the dbstl API. For now, if you'd like to read ahead, the code is here.

Monday Nov 15, 2010

Berkeley DB TechCast Live, Watch at 10AM/PST TODAY!

Today is a big news day for Berkeley DB. We're on, the second banner. That's because today Dave Segleau, the Director of Product Management for Oracle Berkeley DB products, will chat with Justin Kestelyn about embedded databases and about how Berkeley DB is the right technology at the right time for your edge, embedded and application needs. From cloud services to phone storage, Berkeley DB has you covered. Please join us by watching the TechCast at 10AM PST today.

Tuesday Nov 02, 2010

Berkeley DB Java Edition 4.1.6

Yesterday we released a new version of Berkeley DB Java Edition. This release includes some major speed enhancements. For writes, BDB JE has always been as fast as the underlying I/O and stable storage (disk) system allows, thanks to its write-once, append-only, log-based architecture for fully durable commits. Semi-durable commits, which go to operating system buffers rather than to stable storage, run at in-memory speeds. Until now, the issue was random reads. Now, even with modest-sized caches (512MB), you can get predictable latency for random out-of-cache reads, even on multi-TB databases.

This is a first in the pure-Java world. BDB JE is the only solution when you need large-scale, predictable ACID storage for non-relational data. Imagine configuring your heap to 2GB and BDB JE's cache to 512MB, then accessing TBs of data on disk knowing that your application still has 1.5GB of memory in the JVM to use.

Memory management and GC have always been tricky to get right when building large scale Java systems. With this release of Berkeley DB Java Edition we help take you one step closer to a predictable database in pure-Java.

Read more on Charlie Lamb's blog.

Friday Oct 15, 2010

Open SQL Camp, Boston


Friday Sep 17, 2010

Berkeley DB 11gR2 ( Released!


Information about Berkeley DB products directly from the people who build them.

