Data management in unexpected places
When you think of network switches, routers, firewall
appliances, etc., it may not be obvious that at the heart of these kinds of
solutions is an engine that can manage huge amounts of data at very high
throughput with low latencies and high availability.
Consider a network router that processes tens (or hundreds) of thousands of network packets per second. What really happens inside it? Each packet has multiple attributes, for example a destination and associated SLAs. For each packet, the router has to determine the address of the next “hop” toward the destination, and it has to decide how to prioritize the packet. A high-priority packet must be sent on its way before lower-priority packets. As a consequence, lower-priority data packets may need to be stored temporarily (held back), but still handled fairly. If there are security or privacy requirements associated with the packet, those have to be enforced. You probably also need to keep statistics about the packets processed (someone’s sure to ask). You have to do all this (and more) while preserving high availability, i.e. if one of the processors in the router goes down, you need a way to continue processing without interruption (the customer won’t be happy with a “choppy” VoIP conversation, right?). And all of this has to happen without ANY intervention from a human operator. The router is most likely in a remote location, so it must JUST CONTINUE TO WORK CORRECTLY, even when bad things happen.
How is this implemented? As soon as a packet arrives, the receiving software decodes its headers to determine the destination, the kind of packet (e.g. voice vs. data), the SLAs associated with the packet’s “owner”, and so on. It then looks up the internal database of “rules” for processing this kind of packet and handles the packet accordingly. If it’s a low-priority packet, the software might hold on to it safely for some period of time.
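The decode-then-look-up step can be sketched in a few lines. This is a minimal illustration, assuming IPv4 packets and assuming the “rules” are keyed by the DSCP field of the header (code point 46, Expedited Forwarding, is the value conventionally used for voice). The `RULES` table, its priority scale and the `classify` function are hypothetical, and a plain dictionary stands in for the router’s internal rules database; the per-owner SLA lookup is left out.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Hypothetical "rules" table: DSCP code point -> (kind of packet, priority; 0 = highest).
# DSCP 46 (Expedited Forwarding) is conventionally used for voice traffic.
RULES = {46: ("voice", 0)}
DEFAULT_RULE = ("data", 2)


@dataclass
class PacketInfo:
    src: ipaddress.IPv4Address
    dst: ipaddress.IPv4Address
    kind: str
    priority: int


def classify(raw: bytes) -> PacketInfo:
    """Decode the fixed 20-byte IPv4 header, then look the packet up in RULES."""
    if len(raw) < 20:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, tos, _length, _ident, _frag, _ttl, _proto, _checksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    dscp = tos >> 2  # DSCP is the upper six bits of the second header byte
    kind, priority = RULES.get(dscp, DEFAULT_RULE)
    return PacketInfo(ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst),
                      kind, priority)


# Example: a voice packet (DSCP 46) from 10.0.0.1 to 192.0.2.7.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 46 << 2, 20, 0, 0, 64, 17, 0,
                     ipaddress.IPv4Address("10.0.0.1").packed,
                     ipaddress.IPv4Address("192.0.2.7").packed)
info = classify(header)
```

A real router decodes many more header types (IPv6, options, tunnels), but the shape stays the same: parse a few fields, then use them as keys into a rules store.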
Ah, this sounds very much like a database problem. For each packet, you have to, at a minimum:
· Look up the most efficient next “hop” towards
the destination. The “most efficient”
next hop can change, depending on latency, availability etc.
· Look up the SLA and determine the priority of this packet (e.g. voice calls get priority over FTP data transfers).
· Look up the security information associated with this packet. Because a network packet is only a small “slice” of a session, it may be necessary to retrieve the session context for it; to make this work, the context from the session’s “header” packet needs to be stored in the router.
· If the priority of the packet is low, then
“store” the packet temporarily in the router until it is time to forward the
packet to the next hop.
· Update various statistics about the packet.
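The five steps above map naturally onto a handful of lookups plus a priority queue. The sketch below is a minimal, single-threaded illustration under stated assumptions: the `ROUTES`, `SLA_PRIORITY` and `SESSIONS` tables are hypothetical in-memory dictionaries standing in for the router’s database, next-hop selection is plain longest-prefix match, and dropping packets that have no session context is an assumed policy, not something the original describes.

```python
import heapq
import ipaddress
import itertools
from collections import Counter
from typing import Optional

# Hypothetical forwarding table: destination prefix -> next-hop address.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "10.0.0.1",       # default route
    ipaddress.ip_network("192.0.2.0/24"): "10.0.0.2",
    ipaddress.ip_network("192.0.2.128/25"): "10.0.0.3",
}
# Hypothetical SLA table: traffic kind -> priority (0 = highest).
SLA_PRIORITY = {"voice": 0, "ftp": 2}
# Hypothetical per-session security context, keyed by session id.
SESSIONS = {"s1": {"encrypt": True}, "s2": {"encrypt": False}}

held = []                    # min-heap of (priority, arrival order, hop, packet)
arrival = itertools.count()  # tie-breaker: FIFO within one priority level
stats = Counter()


def next_hop(dst: str) -> str:
    """Step 1: longest-prefix match over ROUTES."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in ROUTES if addr in net), key=lambda net: net.prefixlen)
    return ROUTES[best]


def process(packet: dict) -> Optional[str]:
    """Return the next hop if the packet is forwarded now, else None."""
    hop = next_hop(packet["dst"])                      # step 1: next hop
    priority = SLA_PRIORITY.get(packet["kind"], 2)     # step 2: SLA -> priority
    context = SESSIONS.get(packet["session"])          # step 3: security context
    if context is None:  # assumption: drop packets with no known session
        stats["dropped"] += 1
        return None
    stats[packet["kind"]] += 1                         # step 5: statistics
    if priority > 0:                                   # step 4: hold low priority
        heapq.heappush(held, (priority, next(arrival), hop, packet))
        stats["held"] += 1
        return None
    return hop


def release_next() -> Optional[tuple]:
    """Pop the highest-priority held packet as (hop, packet), or None."""
    if not held:
        return None
    _priority, _order, hop, packet = heapq.heappop(held)
    return hop, packet
```

Ordering held packets by arrival within a priority level is only a simple nod to fairness; strict priority can still starve low-priority traffic, which is why production schedulers often use schemes such as weighted fair queuing.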
In most cases, you have to do all this in the context of a
single transaction. For example, you
want to look up the forwarding address and perform the “send” in a single
transaction so that the forwarding address doesn’t change while you’re sending
the packet. So, how do you do all this?
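To make the single-transaction requirement concrete, here is a minimal sketch of the guarantee being asked for. It assumes a hypothetical `ForwardingTable` class in which one lock stands in for a database transaction: route updates and the lookup-plus-send are serialized on that lock, so the forwarding address cannot change between reading it and using it. This is an illustration of the isolation property only, not Berkeley DB’s API.

```python
import threading
from typing import Callable, Dict


class ForwardingTable:
    """Toy forwarding table; a single lock stands in for a transaction.

    Illustration only: this is not Berkeley DB's API.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._routes: Dict[str, str] = {}

    def update_route(self, dst: str, hop: str) -> None:
        with self._lock:  # waits until any in-flight lookup-and-send finishes
            self._routes[dst] = hop

    def lookup_and_send(self, dst: str, payload: bytes,
                        send: Callable[[str, bytes], None]) -> str:
        with self._lock:
            hop = self._routes[dst]  # read the route...
            send(hop, payload)       # ...and use it before anyone can change it
            return hop


# Usage: the send callback receives the hop that was valid at lookup time.
table = ForwardingTable()
table.update_route("192.0.2.7", "10.0.0.2")
sent = []
table.lookup_and_send("192.0.2.7", b"hello", lambda hop, data: sent.append((hop, data)))
```

Holding one global lock across the send is the simplest way to get this guarantee, but it blocks every route update while a packet is in flight. A transactional database aims to provide the same isolation, together with atomic commit and recovery after a crash, without serializing everything behind a single lock.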
Berkeley DB is a proven, reliable, high-performance, highly available embeddable database designed for exactly these kinds of usage scenarios, and it is already being used in them today.
First and foremost, Berkeley DB (or BDB for short) is very fast: it can process tens or hundreds of thousands of transactions per second. It can be used as a pure in-memory database or as a disk-persistent database. BDB provides high availability: if one board in the router fails, the system can automatically fail over to another board, with no manual intervention required. BDB is also self-administering, so there’s no need for manual intervention to maintain a BDB application. No need to send a technician to a remote site in the middle of nowhere on a freezing winter day to perform maintenance.
BDB has been used for the past two decades in over 200 million deployments worldwide, in mission-critical applications such as the one described here. You can spend valuable resources implementing similar functionality yourself, or you can simply embed BDB in your application and off you go! I know what I’d do: choose BDB, so I can focus on my business problem. What will you do?