X

The Oracle NoSQL Database Blog covers all things Oracle NoSQL Database. On-Prem, Cloud and more.

  • January 4, 2016

Oracle NoSQL BulkGet API

A Chandak
Product Manager
As seen from the timing graph (a)  above, getting 30M rows using simple get API  would have taken us  420 seconds which reduces to 149ms with 72 executor threads (Plotted on X-Axis) using the bulk get API. This is almost a 3X improvement! And as seen on the graph (b) the throughput went to 200K ops/s with 72 executor threads from 68k ops/sec using simple get operation. That is again a 3X improvement!

In the above charts, the bulk-X is the maximum number of concurrent request that specifies the maximum degree of parallelism (in effect the maximum number of NoSQL Client side threads) to be used when running an iteration. To achieve the good performance we want to keep the Replication Nodes working as much as possible, so as a starting point we suggest using 3 * available RN (Replication Nodes). For e.g., if you are running a 3x3 store then you could start with 3*9 ( 3 shards each with 3 copies of data) = 27 concurrent requests. Of course, the optimal value varies based on the nature of the application requirements - some may want to be unobtrusive and use minimal resources (but efficiently) with elapsed time being a lower priority for e.g running analytic on secondary zones and on the other hand you may want strong a real-time latency running multi-get on Primary Zones.

Please refer the java documentation for the API. We support both the key/value and table interfaces for the API

Performance 

In our internal Yahoo Server Scalability Benchmark runs we found that we could retrieve 30M rows in 149 sec with 72 executor threads, running on 3x3 shards, with 90 reader threads 

and each record size is 100 bytes.  Refer to the below chart  for details of the benchmark runs

                                        (a)                                                                                                      (b)

As seen from the timing graph (a)  above, getting 30M rows using simple get api  would have taken us 420 seconds which reduces to 149ms with 72 executor threads (Plotted on X-Axis) using the bulk get API. This is almost a 3X improvement ! And as seen on the graph (b) the throughput went to 200K ops/s with 72 executor threads from 68k ops/sec using simple get operation. That is again a 3X improvement!

In the above charts the bulk-X  is the maximum number of concurrent request that specifies the maximum degree of parallelism (in effect the maximum number of NoSQL Client side threads) to be used when running an iteration. To achieve good performance we want to keep the Replication Nodes working as much as possible, so as a starting point we suggest to use 3 * available RN (Replication Nodes). For e.g.  if you are running a 3x3 store then you could start with 3*9 ( 3 shards each with 3 copies of data) = 27 concurrent requests. Off course, the optimal value varies based on the nature of the application requirements - some may want to be unobtrusive and use minimal resources (but efficiently) with elapsed time being a lower priority for e.g. running analytic on secondary zones and on the other hand you may want strong a real-time latency running multi-get on Primary Zones.

 

Table Definition and Sample BulkGet Example :

Below, is some sample code to demonstrate how to use this API

 

PhoneTable (manufacturer string,
               price double , 
               …
               primary key(shard(manufacturer), price));

 

import java.util.ArrayList;

import java.util.List;

import oracle.kv.Direction;

import oracle.kv.KVStore;

import oracle.kv.KVStoreConfig;

import oracle.kv.KVStoreFactory;

import oracle.kv.StoreIteratorException;

import oracle.kv.table.FieldRange;

import oracle.kv.table.MultiRowOptions;

import oracle.kv.table.PrimaryKey;

import oracle.kv.table.Row;

import oracle.kv.table.Table;

import oracle.kv.table.TableAPI;

import oracle.kv.table.TableIterator;

import oracle.kv.table.TableIteratorOptions;

 

public class BulkGetExample {

 

 // hard-coded connection parameters in the example, this could be taken as input to the program or can be read from configuration file

 

    private final String storeName = "kvstore";

    private final String hostName = "localhost";

    private final int port = 5000;

    private final String tableName = "phones";

    final KVStore store;

    public static void main(final String args[]) {

        try {

            BulkGetExample runTest = new BulkGetExample();

            runTest.run();

        } catch (Exception e) {

            System.err.print("BulkGetExample run failed: " + e.getMessage());

        }

    }

 

    BulkGetExample() {

        store = KVStoreFactory.getStore

            (new KVStoreConfig(storeName, hostName + ":" + port));

    }

 

    void run() {

        final String[] manufacturers = {"Nokia", "Apple", "Samsung", "Motorola"};

        final List<PrimaryKey> keys =

            new ArrayList<PrimaryKey>(manufacturers.length);

        final TableAPI tableAPI = store.getTableAPI();

        final Table phoneTable = tableAPI.getTable("PhoneTable");

        if (phoneTable == null) {

            throw new IllegalArgumentException("Table not found: " + tableName);

        }

        for (String manufacturer : manufacturers) {

            final PrimaryKey pk = phoneTable.createPrimaryKey();

            pk.put("manufacturer", manufacturer);

            keys.add(pk);

        }

 

       /* Initialize multiRowOption: price range in [200, 500].  */

        final FieldRange range = phoneTable.createFieldRange("price");

        range.setStart(200d, true).setEnd(500d, true);

        final MultiRowOptions mro = new MultiRowOptions(range, null, null);

       /* Initialize TableIteratorOptions */

//  Setting batch size parameter as 200, this number indicates the maximum number of results batches that can be held in the No-SQL database client before its processed by the replication node

        final int batchResultsSize = 200;

//  Setting the maximum number of concurrent threads of executor threads to 9, I ran this on 3x3 shard

        final int parallelism = 9;

 

        final TableIteratorOptions tio =

           new TableIteratorOptions(Direction.UNORDERED /* Direction */,

                                     null /* Consistency */,

                                     0 /* RequestTimeOut */,

                                     null /*TimeUnit*/,

                                     parallalism,

                                     batchResultsSize);

        TableIterator<Row> itr = null;

        int count = 0;

        try {

 

            itr = tableAPI.tableIterator(keys.iterator(), mro, tio);

            while (itr.hasNext()) {

                final Row phone = itr.next();

                System.out.println(phone.toJsonString(false));

                count++;

                /* ... */

            }

            System.out.println(count + " rows returned.");

        } catch (StoreIteratorException sie) {

            /* Handle exception.. */

        } finally {

            if (itr != null) {

                itr.close();

            }

        }

    }

}

The above example returns an iterator over the keys matching the manufacturers within the price range [200-500] supplied by the supplied by
iterator, if along with other details it also desired to retrieve the  images of all the phones then the images can be modeled as child table (for efficiency reasons) and the same can be retrieved in the single API call.

Summary

Oracle NoSQL
BulkGet API provides the most effective and per-format way to fetch the data in
bulk from Oracle NoSQL database. As demonstrated by the  YCSB runs, using this API you can expect between
a 2 and 3 times performance improvement for retrieving data in bulk

 

 

 

 

Join the discussion

Comments ( 12 )
  • guest Tuesday, January 5, 2016

    Excellent writeup!


  • Swati Wednesday, January 6, 2016

    Very informative. Thanks! . Is there anything to bulk inserts ?


  • Anand Chandak Wednesday, January 6, 2016

    @Swati, Yes, infact in the latest release we added the capability to do bulk inserts. I will be posting another blog very shortly. Are you looking for anything particular, please let me know.


  • guest Friday, February 10, 2017

    Hi,

    What if you want to retrieve a range say Date > and Date <.Date being the primary key of the table.

    Any pointers would be helpful.

    Thanks,

    Rajesh


  • guest Friday, February 10, 2017

    Hi,

    If I want to retrieve the date > and date < range of rows.Date being the primary key of the table.

    Thanks,

    Rajesh


  • Rajesh Friday, February 10, 2017

    Hi,

    If I want to retrieve date < and date > range of data(rows).Date being the primary key of the table.

    Thanks,

    Rajesh


  • Anand Friday, February 10, 2017

    @rajesh, yes you can use multi-row options to do range query on the primary key. read the java doc http://docs.oracle.com/cd/NOSQL/html/javadoc/index.html


  • Rajesh Monday, February 27, 2017

    Thanks Anand.


  • Rajesh Monday, February 27, 2017

    Hi,

    I am facing some issues when inserting the child record.

    I am having a Parent Table and its corresponding child table.

    After inserting to the parent table.Parent table has on primary key.

    I am then trying to insert the child table using the primary key along with the childs primary key.

    I am keeping it very simple for testing.

    I am just testing using java. I am using API getTableAPI().putIfAbsent(row, null, null);

    It is throwing NullPointer.

    Is it something related to the version? I am using kv-ee-4.3.10.

    Any help would be appreciated.

    Thanks & Regards,

    Rajesh


  • Rajesh Monday, February 27, 2017

    Hi,

    Do we need to override equals() and hashcode() for the models.If we have more than one primary keys?

    And we are using getTableAPI().putIfAbsent

    Thanks,

    Rajesh


  • Rajesh Tuesday, February 28, 2017

    Hi,

    While getting the record/data, can I use a non primary field along with the primary to get the data?

    In case of FieldRange , can I add a field along with the range keys?

    Thanks,

    Rajesh


  • Rajesh Tuesday, February 28, 2017

    Hi All,

    Observed a strange behaviour.Even though the primary key ,of the parent table ,value is not present in the primary table.Data is getting inserted in the child table.

    Note: I am using the primary key of the parent while inserting the data in child table.

    Just that the value I am inserting is not present in the parent table.

    It is allowing to insert the record.I believe it shouldn't allow to insert , in the child table if the primary key value is not there in the parent table.

    Appreciate your help on this.

    Thanks,

    Rajesh


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.