
September 16, 2014

Using multiDelete for efficient cleanup of old data

In a recent project, one of our field engineers (Gustavo Arango) was confronted with a problem: he needed to efficiently delete millions of keys beneath a key space where he did not know the complete major key path, which ended in a timestamp. He quickly discovered a way to efficiently find these major key paths and then use them to perform high-speed multi-value deletions without causing unusual chattiness (a call for each key-value pair) on the network. I thought it would be useful to review this solution here and give you a link to a GitHub example of working code to play with and understand the behavior.

This is possible by using Oracle NoSQL's storeKeysIterator() method on your connection to the cluster. This iterator can be used to obtain a list of store keys in a specific range, in a single call and without loading the value data beneath the keys:

First, you need a partial key path:

Key key = Key.createKey("/root/pieceOfKey");

Next, you need a range:

KeyRange kr = new KeyRange("aastart", true, "zzfinish", true);

Now you can get your Key iterator (getKVStore() returns a connection handle to the cluster, an instance of KVStore):

Iterator<Key> iter = getKVStore().storeKeysIterator(Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);

So, this is nice: the storeKeysIterator will query the cluster and return ONLY the keys that start with that partial path and, optionally, fall within the range specifier. There is no need to give a range if you want all the keys.
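For example, a minimal variant that pulls back every key beneath the partial path would look like the line below (this is a sketch based on my reading of the Javadoc, which allows a null KeyRange to mean "no restriction"):

Iterator<Key> allKeys = getKVStore().storeKeysIterator(Direction.UNORDERED, 0, key, null, Depth.DESCENDANTS_ONLY);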

Now, to efficiently delete many keys with a single call to the cluster, you need a full major key path. Since you now have the whole set of keys, you can ask each one for its full major key path and use that to perform the efficient multiDelete operation. In Gustavo's case of storing and managing dense time-series data, this meant millions of keys/values being deleted with only a small number of network calls and very little data transferred across the network.

boolean enditerate = false;
int res = 0;  // running count of deleted key/value pairs

while (!enditerate) {
    if (iter.hasNext()) {
        Key iterkey = iter.next();
        // Reduce the returned key to its full major key path only.
        iterkey = Key.createKey(iterkey.getMajorPath());
        // Delete that major key and everything beneath it in a single call.
        // (durability is a Durability instance defined elsewhere.)
        int delNb = getKVStore().multiDelete(iterkey, kr, Depth.PARENT_AND_DESCENDANTS, durability, 0, null);
        res += delNb;
    } else {
        enditerate = true;
    }
}


If you want to get tricky and do this in parallel, you can wrap this in a worker thread and place the initialization of the iterator inside the while loop, effectively treating the iterator like a queue. A new query happens every time the loop iterates, but in the meantime a lot of other threads may have deleted half of that queue.
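Here is a rough sketch of that multi-threaded variant. It is only an illustration of the idea, not code from Gustavo's project: nbWorkers is a hypothetical thread count, key, kr and durability are assumed to be set up as in the earlier snippets, and the thread management uses java.util.concurrent's ExecutorService.

ExecutorService pool = Executors.newFixedThreadPool(nbWorkers);
for (int i = 0; i < nbWorkers; i++) {
    pool.submit(() -> {
        boolean done = false;
        while (!done) {
            // Re-open the iterator on every pass, so each worker only sees
            // the keys that are still left -- the iterator acts like a queue.
            Iterator<Key> workerIter = getKVStore().storeKeysIterator(
                    Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);
            if (workerIter.hasNext()) {
                Key majorKey = Key.createKey(workerIter.next().getMajorPath());
                getKVStore().multiDelete(majorKey, kr,
                        Depth.PARENT_AND_DESCENDANTS, durability, 0, null);
            } else {
                done = true;
            }
        }
    });
}
pool.shutdown();

Two workers may occasionally grab the same major key path, but that is harmless: the second multiDelete simply finds nothing left to delete and returns 0.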

In fact, there are also parallel versions of the store iterators that will query the entire cluster in parallel, using the number of threads that works best given your hardware configuration and the number of shards in your cluster. You can learn more about them in the online documentation.
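To give you a flavor of what that looks like, here is a hedged sketch using the parallel form of storeKeysIterator together with StoreIteratorConfig; check the Javadoc for the exact signature in your release. The value of 9 concurrent requests is purely illustrative, and 0 lets the client choose a sensible default.

StoreIteratorConfig config = new StoreIteratorConfig();
config.setMaxConcurrentRequests(9);   // 0 would let the client pick a default

ParallelScanIterator<Key> piter = getKVStore().storeKeysIterator(
        Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY,
        null /* consistency: use default */, 0, null, config);

while (piter.hasNext()) {
    Key k = piter.next();
    // ... process keys exactly as in the single-threaded loop above ...
}
piter.close();   // release the threads used by the parallel scan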

If you are interested in playing with the ideas in this blog, there is a GitHub example project in Eclipse that will get you started.
