
A LocusTree to search genomic loci using Berkeley DB Java Edition

Guest Author

Pierre Lindenbaum has adapted Jan Aerts' LocusTree implementation written in Ruby to work on top of Berkeley DB Java Edition. In his blog post he writes about using this combination to build a genome browser.

Genomic research is growing rapidly. There is an explosion of new data and, along with it, an explosion of new questions to be asked of and answered by that data. George Church is behind the Personal Genome Project, which is working to collect huge amounts of new data, but gathering this data is only the first step. Storing and analyzing it is where the discovery process needs technological advances. To compound the issue, today we examine base pairs within genes by understanding their proximity to other sections of the genome, while new work examines the proximity of base pairs that are near each other in the folds of a DNA strand rather than along the length of the strand. Could it be important to understand how two segments overlap? For both problems the solution is a sturdy technical foundation that lets you search quickly and find interesting features within the data.

One area where there is always a need for more work is drug discovery. The central claim of an article in The Economist is that "a toxic mix of science and economics" is preventing research and discovery. A large piece of the economics half has to do with the cost of making discoveries. New techniques that are faster and manage more data can help change that by lowering the cost of research.

In this case, searching genomic data sets, the use of Berkeley DB Java Edition makes perfect sense. The data set is non-relational: it isn't tabular at all, and is commonly organized as key/value pairs. BDB JE is a spectacular database for indexing large data sets without the overhead of other, more complex systems, and a perfect fit for Pierre's visualization project.
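
To make the key/value idea concrete, here is a minimal sketch of how genomic loci might be laid out in a sorted key/value store and queried for overlaps. It is not Pierre's LocusTree code and it does not use the Berkeley DB Java Edition API: a `java.util.TreeMap` stands in for the sorted store, and the class names, the key format, and the `overlapping` method are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for a sorted key/value store.
class LocusIndexSketch {

    /** A named interval on a chromosome (coordinates are inclusive). */
    static final class Locus {
        final String chrom;
        final int start;
        final int end;
        final String name;

        Locus(String chrom, int start, int end, String name) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    // Keys look like "chr1:0000001000:geneA". Zero-padding the start
    // position makes string order match numeric order within a chromosome.
    private final TreeMap<String, Locus> store = new TreeMap<>();

    // Length of the longest stored locus; it bounds how far back a scan must go.
    private int maxLength = 0;

    void put(Locus l) {
        store.put(String.format("%s:%010d:%s", l.chrom, l.start, l.name), l);
        maxLength = Math.max(maxLength, l.end - l.start + 1);
    }

    /** Returns loci on chrom that overlap [qStart, qEnd], ordered by start. */
    List<Locus> overlapping(String chrom, int qStart, int qEnd) {
        // An overlapping locus must start no later than qEnd and no earlier
        // than qStart - maxLength + 1, so only that key range is scanned.
        int from = Math.max(0, qStart - maxLength + 1);
        String lo = String.format("%s:%010d:", chrom, from);
        String hi = String.format("%s:%010d:\uffff", chrom, qEnd);

        List<Locus> hits = new ArrayList<>();
        for (Locus l : store.subMap(lo, true, hi, true).values()) {
            if (l.end >= qStart) {   // drop loci that end before the query begins
                hits.add(l);
            }
        }
        return hits;
    }
}
```

The range scan over sorted keys is the part that carries over to an ordered key/value database; a tree of nested bins, as LocusTree uses, is a different and more scalable way to answer the same overlap question.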
