
The Oracle NoSQL Database Blog covers all things Oracle NoSQL Database. On-Prem, Cloud and more.

  • February 1, 2016

Oracle NoSQL BulkPut

A Chandak
Product Manager
Our customers have often asked us, "What's the fastest and most efficient way to insert a large number of records into Oracle NoSQL Database?" Very recently, a shipping company reached out to us with a specific requirement: using Oracle NoSQL Database for their ship management application, which tracks the movements of the container ships that move cargo from port to port. The cargo ships are all fitted with GPS and other tracking devices, which relay each ship's location to the application every few seconds.
 
The application is then queried for 1) the location of all the ships, displayed on a map, and 2) a specific ship's trajectory over a given period of time, also displayed on the map. As the volume of location data grew, the company found it hard to scale the application and is now looking for a back-end system that can ingest this large data set very efficiently.
 
Historically, we have supported the option to execute a batch of operations for records that share the same shard key, which is what our large airline customer (Airbus) has done. They pre-sort the data by shard key and then perform a multi-record insert whenever the shard key changes. In other words, rather than sending and storing one record at a time, they send a large number of records in a single operation. This certainly saved network round trips, but they could only batch-insert records that shared the same shard key. With Oracle NoSQL Database release 3.5.2, we have added the ability to do a bulk insert (bulk put) of records across different shards, allowing application developers to work more effectively with very large data sets.
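To make the older pre-sort-and-batch approach concrete, here is a minimal sketch in plain Java. It does not call any Oracle NoSQL API: the Position class, its shipId field (standing in for the shard key) and the ShardKeyBatcher helper are all hypothetical names made up for illustration. It sorts the records by shard key and cuts a new batch each time the key changes, so each batch could then be submitted as one multi-record operation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type: one GPS fix for one ship.
final class Position {
    final String shipId;   // stands in for the shard key in this sketch
    final long timestamp;
    final double lat;
    final double lon;

    Position(String shipId, long timestamp, double lat, double lon) {
        this.shipId = shipId;
        this.timestamp = timestamp;
        this.lat = lat;
        this.lon = lon;
    }
}

// Hypothetical helper, not part of the Oracle NoSQL client.
final class ShardKeyBatcher {
    // Sorts by shard key, then starts a new batch whenever the key changes.
    static List<List<Position>> batchesByShardKey(List<Position> records) {
        List<Position> sorted = new ArrayList<>(records);
        // List.sort is stable, so records keep their input order within a key.
        sorted.sort(Comparator.comparing((Position p) -> p.shipId));

        List<List<Position>> batches = new ArrayList<>();
        List<Position> current = new ArrayList<>();
        for (Position p : sorted) {
            boolean keyChanged = !current.isEmpty()
                    && !current.get(0).shipId.equals(p.shipId);
            if (keyChanged) {
                batches.add(current);          // close the batch for the previous key
                current = new ArrayList<>();
            }
            current.add(p);
        }
        if (!current.isEmpty()) {
            batches.add(current);              // close the last batch
        }
        return batches;
    }
}
```

Every batch produced this way contains a single shard key, which is exactly the restriction that the cross-shard bulk put removes.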
 
The BulkPut API is available for the table as well as the key/value data model. The API provides significant performance gains over single-row inserts by reducing network round trips and by performing ordered batch inserts on internally sorted data across different shards in parallel. This feature is being released in a controlled fashion, so there are no Javadocs available for this API in this release, but we encourage you to use it and give us feedback.
 

API

KV interface: loads key/value pairs supplied by special-purpose streams into the store.

public void put(List<EntryStream<KeyValue>> streams, BulkWriteOptions bulkWriteOptions)

 

Table interface: loads rows supplied by special-purpose streams into the store.

public void put(List<EntryStream<Row>> streams, BulkWriteOptions bulkWriteOptions)

streams: the streams that supply the rows to be inserted.

bulkWriteOptions: non-default arguments controlling the behavior of the bulk write operations.

Stream interface:

public interface EntryStream<E> {
    String name();
    E getNext();
    void completed();
    void keyExists(E entry);
    void catchException(RuntimeException exception, E entry);
}
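As a rough illustration of how an application might supply such a stream, here is a minimal sketch of an implementation backed by an ordinary Java Iterator. To keep it compilable without the Oracle NoSQL client jar, it declares a local copy of the interface shown above; in a real application you would implement the library's own type instead. The class names IteratorEntryStream and StreamDemo are hypothetical, and the callback semantics are assumptions read from the method names: getNext() returning null at the end of the data, completed() being called once the stream has been consumed, keyExists() reporting an entry whose key is already in the store, and catchException() reporting a failure for one entry.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for the interface listed above, so the sketch is self-contained.
interface EntryStream<E> {
    String name();
    E getNext();
    void completed();
    void keyExists(E entry);
    void catchException(RuntimeException exception, E entry);
}

// Hypothetical stream that hands out the elements of an Iterator one by one.
final class IteratorEntryStream<E> implements EntryStream<E> {
    private final String name;
    private final Iterator<E> source;
    private final AtomicLong duplicates = new AtomicLong();
    private volatile boolean completed;

    IteratorEntryStream(String name, Iterator<E> source) {
        this.name = name;
        this.source = source;
    }

    @Override public String name() { return name; }

    // Assumption: null tells the caller that this stream is exhausted.
    @Override public E getNext() { return source.hasNext() ? source.next() : null; }

    @Override public void completed() { completed = true; }

    // Assumption: called for an entry whose key is already present; just count it.
    @Override public void keyExists(E entry) { duplicates.incrementAndGet(); }

    // One possible policy: rethrow, so a failed entry surfaces to the caller.
    @Override public void catchException(RuntimeException exception, E entry) { throw exception; }

    long duplicateCount() { return duplicates.get(); }
    boolean isCompleted() { return completed; }
}

final class StreamDemo {
    // Toy consumer that mimics how a loader might drain one stream.
    static <E> int drain(EntryStream<E> stream) {
        int count = 0;
        while (stream.getNext() != null) {
            count++;
        }
        stream.completed();
        return count;
    }

    public static void main(String[] args) {
        IteratorEntryStream<String> stream =
                new IteratorEntryStream<>("demo", Arrays.asList("a", "b", "c").iterator());
        System.out.println(stream.name() + " supplied " + drain(stream) + " entries");
    }
}
```

Since put takes a list of streams, an application would typically build several of these, for example one per input file or partition of the source data, and pass them together.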

 

Performance

We ran the YCSB benchmark with the new BulkPut API on a 3x3 NoSQL cluster (3 shards, each with 3 copies of the data) running on bare-metal servers. The run ingested 50M records per shard, or 150M records across the datastore, using 3 parallel threads per shard (9 in total, 3x3, for the store) and 6 parallel input streams per SN (54 in total, 6 x 9, across the store). The results of the benchmark run are shown in the graph below.

 

The above graph compares the throughput (ops/sec) of the Bulk Put and simple Put APIs on a NoSQL store with 1000 partitions, with durability settings of None and Simple Majority. As the chart shows, throughput increases by more than 100% with either durability setting.

Sample Example

A sample program has been uploaded to the GitHub repository; it demonstrates how to use the BulkPut API in your application code. Refer to the README file for details on running the program.

 


Summary

If you are looking to bulk load data into Oracle NoSQL Database, the latest Bulk Put API provides the fastest and most efficient way (as demonstrated by the YCSB benchmark) to ingest large amounts of data. Check it out now and download the latest version of Oracle NoSQL Database at www.oracle.com/nosql.

I'd like to thank my colleague Jin Zhao for his input on the performance numbers.

 

 

 

 

 

Join the discussion

Comments ( 2 )
  • Vibhas Saturday, February 6, 2016

    I eagerly await your posts; they are timely and excellent. Thank you. This post and its sample code were much awaited. I have a question: can the BulkPut API be used to load a CSV file or the result of "select * from DB" into the NoSQL database? If yes, could you kindly suggest how to create such a streaming input for this API?

    Thanks and Regards,


  • Anand Monday, February 8, 2016

    Vibhas, thanks for writing to me. As I described above, the BulkPut API takes an EntryStream as input. So you can implement your own version of the EntryStream interface to read from a CSV file or from a database and load the data into Oracle NoSQL Database. As such, the API is agnostic of what the source of the data is; that is driven by the users or consumers of the API.
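    A minimal sketch of what such a CSV-backed stream might look like follows. It is not taken from the sample program: CsvEntryStream is a hypothetical class, the interface is re-declared locally so the code compiles without the client jar, and it assumes that getNext() returns null at the end of the input. For simplicity it emits each line as a String[] of fields using a naive comma split (no support for quoted commas); a real implementation would convert those fields into the store's Row or KeyValue type before returning them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Local stand-in for the EntryStream interface shown in the post.
interface EntryStream<E> {
    String name();
    E getNext();
    void completed();
    void keyExists(E entry);
    void catchException(RuntimeException exception, E entry);
}

// Hypothetical stream that turns each non-blank CSV line into an array of fields.
final class CsvEntryStream implements EntryStream<String[]> {
    private final String name;
    private final BufferedReader reader;
    private int existingKeys;

    CsvEntryStream(String name, Reader source) {
        this.name = name;
        this.reader = new BufferedReader(source);
    }

    @Override public String name() { return name; }

    // Assumption: null signals that the file has been fully read.
    @Override public String[] getNext() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    // Naive split; the -1 limit keeps trailing empty fields.
                    return line.split(",", -1);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Release the underlying reader once the stream has been consumed.
    @Override public void completed() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Count entries reported as already present instead of failing the load.
    @Override public void keyExists(String[] entry) { existingKeys++; }

    // One possible policy: rethrow so the failure is visible to the caller.
    @Override public void catchException(RuntimeException exception, String[] entry) {
        throw exception;
    }

    int existingKeyCount() { return existingKeys; }
}
```

    The same shape should work for a database source: replace the BufferedReader with a JDBC ResultSet and return null once the result set has no more rows.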

