Thursday Mar 12, 2015

When does "free" challenge that old adage, "You get what you pay for"?

This post generated a lot of attention and follow-on discussion about the "numbers" in the post and the related YCSB benchmark results.

Some readers commented, "Only 26,000 tps on 3 machines? I can get 50,000+ on one."   and

"Only 10,000 YCSB Workload B tps on 6 nodes? I can get more than that on one."

Thought it was worth stating the obvious, because sometimes what is perfectly clear to one person is completely opaque to another. Numbers like "1 million tps" are meaningless without context. A dead-simple example to illustrate the point: I might be able to do 50K inserts of an account balance (50K txns) in half a second on a given server machine. Take that same server and try to insert 50K fingerprint images (50K txns); if you can get that done in half a second, call me, because magic of that nature is priceless and we should talk.

So for clarity,

[Read More]

Tuesday Dec 02, 2014

Using Nosql Tables with Spark

The goal of this post is to explain how to use NoSQL tables and how to put their content into a file on HDFS using the Java API for Spark. On HDFS, the table content will be written in a comma-separated (CSV) style.

The latest Oracle Big Data Appliance, "X4-2", offers Cloudera Enterprise Technology software including Cloudera CDH, and Oracle NoSQL Database including tables.

The Cloudera part offers several ways of integrating with Spark (see Using Nosql and Spark): standalone or via YARN (see Running Spark Applications).

The NoSQL part allows the use of tables. Tables can be defined within the NoSQL admin console, which is started by issuing the following command:

java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar runadmin -host <host> -port <store port> -store <store name>

There are two parts to defining and creating a table. First, define the table: give the table name, the table fields, the primary key and the shard key (a "prefix" of the primary key), and end with the keyword "exit":

table create -name flightTestExtract

add-field -name param -type STRING

add-field -name flight -type STRING

add-field -name timeref -type LONG

add-field -name value -type INTEGER

primary-key -field timeref -field param -field flight 

shard-key -field timeref

exit

Second, run the plans that create the table and that define and create its indexes:

plan add-table -wait -name flightTestExtract

plan add-index -wait -name flightIndex -table  flightTestExtract -field flight -field param -field timeref

plan add-index -wait -name paramIndex -table  flightTestExtract -field param -field flight -field timeref

Rows can be inserted into the table with the put command:

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":1764248535}"

put table -name flightTestExtract -json "{\"param\":\"12\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":-1936513330}"

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"6\",\"timeref\":61000000000013,\"value\":1600130521}"

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000013,\"value\":478674806}"
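The same rows can also be written from Java instead of the CLI, through the Table API. Below is a minimal sketch; the store name and helper host:port are placeholders for your own deployment:

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.table.Row;
    import oracle.kv.table.Table;
    import oracle.kv.table.TableAPI;

    public class PutFlightRow {
        public static void main(String[] args) {
            // Connect to the store (placeholders: store name and host:port).
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("<store name>", "<host>:<store port>"));
            TableAPI tableAPI = store.getTableAPI();
            Table table = tableAPI.getTable("flightTestExtract");

            // Build one row matching the table definition above.
            Row row = table.createRow();
            row.put("param", "11");
            row.put("flight", "8");
            row.put("timeref", 61000000000002L);
            row.put("value", 1764248535);

            // Write with default durability and timeout.
            tableAPI.put(row, null, null);
            store.close();
        }
    }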

The latest NoSQL patch, 3.1.7, includes some new Java classes that can be used to get table data into Hadoop. The class oracle.kv.hadoop.table.TableInputFormat can be used to build a Spark JavaPairRDD:

JavaPairRDD<PrimaryKey, Row> jrdd = sc.newAPIHadoopRDD(hconf, TableInputFormat.class, PrimaryKey.class, Row.class);

The oracle.kv.table.PrimaryKey class corresponds to the fields of the primary key of the table, for example in JSON style:

{"timeref":61000000000013, "param":"11","flight":"8"}

The oracle.kv.table.Row class corresponds to the fields of a table row, for example in JSON style:

{"param":"11","flight":"8","timeref":61000000000013,"value":478674806}

If we want to save the content of the table on HDFS in CSV style, we have to:

  • apply a flatMap on the rows of the RDD (flatMap(func): each input item can be mapped to 0 or more output items, so func should return a Seq rather than a single item)
  • save the result on HDFS

The following inner class defines the map:

    static class FlatMapRow_Str implements FlatMapFunction<Row, String> {

        @Override
        public Iterable<String> call(Row s) {
            // Build one comma-separated line from all the fields of the row.
            List<String> lstr = s.getFields();
            StringBuilder csv = new StringBuilder();
            for (String field : lstr) {
                if (csv.length() > 0) csv.append(",");
                csv.append(s.get(field));
            }
            // Each input row maps to exactly one output line.
            return Arrays.asList(csv.toString());
        }
    }

The code to do the job is: 

// Obtain the Row RDD
JavaRDD<Row> rddvalues = jrdd.values();

// Obtain the csv style form of the RDD
JavaRDD<String> csvStr = rddvalues.flatMap(new FlatMapRow_Str());

// Save the results on hdfs
csvStr.saveAsTextFile(pathPrefix + "/" + tableName + "csvStr");
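These snippets assume a Hadoop Configuration object (hconf) that already describes the store. Putting the pieces together, here is a hedged sketch of a complete driver; the TableInputFormat static setters used below (setKVStoreName, setKVHelperHosts, setTableName) are an assumption about this kvclient release, so check the javadoc for the exact configuration mechanism:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import oracle.kv.hadoop.table.TableInputFormat;
    import oracle.kv.table.PrimaryKey;
    import oracle.kv.table.Row;

    public class SparkNosqlTable2Hadoop {
        public static void main(String[] args) {
            String storeName  = args[0];   // <nosql store name>
            String storeUrl   = args[1];   // <store host>:<store port>
            String tableName  = args[2];
            String pathPrefix = args[3];

            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("NosqlTable2Csv"));
            Configuration hconf = new Configuration();

            // Assumption: these static setters tell the InputFormat which store
            // and table to read; the mechanism may differ between releases.
            TableInputFormat.setKVStoreName(storeName);
            TableInputFormat.setKVHelperHosts(new String[] { storeUrl });
            TableInputFormat.setTableName(tableName);

            JavaPairRDD<PrimaryKey, Row> jrdd = sc.newAPIHadoopRDD(
                    hconf, TableInputFormat.class, PrimaryKey.class, Row.class);
            JavaRDD<String> csvStr = jrdd.values().flatMap(new FlatMapRow_Str());
            csvStr.saveAsTextFile(pathPrefix + "/" + tableName + "csvStr");
            sc.stop();
        }
    }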

The last step is to run the job using YARN:

spark-submit --master yarn --jars /u01/nosql/kv-ee/lib/kvclient.jar --class table.SparkNosqlTable2HadoopBlog /u01/nosql/kv-ee/examples/table/deploy/sparktables.jar <nosql store name> <nosql store url> <table name> <path prefix>

where <nosql store url> is <store host>:<store port>.

You can get the Java source code here.

Tuesday Oct 21, 2014

Loading into Nosql using Hive

The main purpose of this post is to show how tightly we can tie NoSQL and Hive; the focus will be on uploading data into NoSQL from Hive.

A previous post (here) discussed the use of Hive external tables to select data from Oracle NoSQL, using a HiveStorageHandler implementation. We have reworked this implementation to load data from HDFS or a local file system via Hive into NoSQL. Only uploading of text data is currently supported.

Two kinds of data files can be uploaded:

Case 1: Files containing plain text data like the following comma separated lines:

  • 10,5,001,545973390
  • 10,5,010,1424802007
  • 10,5,011,164988888 

Case 2: Files containing a JSON field corresponding to a given AVRO schema like the following tab separated lines:

  •  10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}
  •  10 5 174 {"samples": [{"delay": 0, "value": -254460852}, {"delay": 1, "value": -478216539}, {"delay": 2, "value": -1735664690}, {"delay": 3, "value": -1997506933}, {"delay": 4, "value": -1062624313}]}

How to do it?

1. Define the external table

2. Create and load a native Hive table

3. Insert into the external table a selection from the native Hive table

Case 1:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_PI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)

      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'

      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "PI/10/5", "kv.value.type" = "string", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore","kv.key.ismajor" = "true");

2. Create and load a native Hive table

CREATE TABLE kv_pi_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/oracle/hivepath/pi_10_5' OVERWRITE INTO TABLE kv_pi_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_pi_10_5_table SELECT * from kv_pi_10_5_load;

The external table definition specifies the major key and all of its key components; this definition is used when inserting. The flight and sensor values of the data are therefore ignored (the fixed key prefix supplies them), and the timeref elements are loaded using the NoSQL operation API, which batches the insertions (a sketch of this batching idea follows).
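The storage handler code itself is not reproduced here, but the batching it relies on is the standard KVStore operation API: all operations submitted in a single execute() call must share the same major key path, which is exactly what the fixed "kv.key.prefix" guarantees. A hedged sketch of the idea (the keys and values below are illustrative, not the handler's actual code):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import oracle.kv.KVStore;
    import oracle.kv.Key;
    import oracle.kv.Operation;
    import oracle.kv.OperationFactory;
    import oracle.kv.Value;

    // Batch several minor-key puts under one major key path in a single call.
    static void batchPut(KVStore store) throws Exception {
        OperationFactory factory = store.getOperationFactory();
        List<Operation> ops = new ArrayList<Operation>();

        List<String> majorPath = Arrays.asList("PI", "10", "5");  // the "kv.key.prefix"
        String[][] rows = { { "001", "545973390" },
                            { "010", "1424802007" },
                            { "011", "164988888" } };
        for (String[] r : rows) {
            Key key = Key.createKey(majorPath, Arrays.asList(r[0]));   // minor key = timeref
            ops.add(factory.createPut(key, Value.createValue(r[1].getBytes("UTF-8"))));
        }
        store.execute(ops);   // one network round trip for the whole batch
    }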

Case 2:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_RI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)

      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'

      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "RI/10/5", "kv.value.type" = "avro", "kv.key.range" = "","kv.key.ismajor" = "true", "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet","kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

When creating the external table used for uploading into NoSQL, a new parameter is used: "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet".

It is the NoSQL name of an Avro schema: in terms of the Avro schema definition, it is the schema namespace, a ".", and the schema name.

 2. Create and load a native Hive table

 CREATE TABLE kv_ri_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/oracle/hivepath/ri_10_5' OVERWRITE INTO TABLE kv_ri_10_5_load;

 3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_ri_10_5_table SELECT * FROM kv_ri_10_5_load;

How to verify the upload?

Two possibilities:

  • a select query on Hive
  • a get on the kvstore

Let's do it on the Nosql client command line

Case 1: Verify a random line existence

 kv-> get kv  -key /PI/10/5/-/010 -all

/PI/10/5/-/010

1424802007

1 Record returned

Case 2: Verify a random line existence

kv-> get kv  -key /RI/10/5/-/173 -all
/RI/10/5/-/173
{
  "samples" : [ {
    "delay" : 0,
    "value" : -914351026
  }, {
    "delay" : 1,
    "value" : 1842307749
  }, {
    "delay" : 2,
    "value" : -723989379
  }, {
    "delay" : 3,
    "value" : -1665788954
  }, {
    "delay" : 4,
    "value" : 91277214
  }, {
    "delay" : 5,
    "value" : 1569414562
  }, {
    "delay" : 6,
    "value" : -877947100
  }, {
    "delay" : 7,
    "value" : 498879656
  }, {
    "delay" : 8,
    "value" : -1245756571
  }, {
    "delay" : 9,
    "value" : 812356097
  }

 ]

}

1 Record returned

Let's do it on the hive command line

Case 1: Verify a random line existence

select *  from MY_KV_PI_10_5_TABLE where timeref = "010";

OK

10 5 010 1424802007

Case 2: Verify a random line existence

hive> select *  from MY_KV_RI_10_5_TABLE where timeref = "173";

... 

OK

10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}

You can get a Jdeveloper 12c project here

We have done a round trip between NoSQL and Hive:

  1. Key-value subsets of a NoSQL database can be viewed using Hive's select query language
  2. Data from Hive tables can be uploaded into NoSQL key-value pairs

 

Monday Sep 15, 2014

Using multiDelete for efficient cleanup of old data

In a recent project, one of our field engineers (Gustavo Arango) was confronted by a problem: he needed to efficiently delete millions of keys beneath a key space where he did not know the complete major key path, which ended in a timestamp. He quickly discovered a way to efficiently find these major key paths and then use them to perform high-speed multi-value deletions without causing unusual chattiness (a call for each key-value pair) on the network. I thought it would be useful to review this solution here and give you a link to a GitHub example of working code to play with and understand the behavior.

This is possible using Oracle NoSQL's storeKeysIterator() method on your connection to the cluster. This iterator can be used to obtain the store keys in a specific range, in a single call and without loading any of the value data beneath the keys.

First, you need a partial key path:

Key key = Key.createKey( "/root/pieceOfKey" );

Next you need a range: 

KeyRange kr = new KeyRange( "aastart" ,true, "zzfinish", true);

Now you can get your Key iterator (getKVStore() returns a connection handle to the cluster, an instance of KVStore):

Iterator<Key> iter = getKVStore().storeKeysIterator(Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);

So, this is nice: storeKeysIterator will query the cluster and return ONLY the keys that start with that partial path, optionally restricted by the range specifier; there is no need to give a range if you want all the keys.

Now, to efficiently delete many keys with a single call to the cluster, you need a full major key path. So, now that you have the whole set of keys, you can ask each one for its full major key path and then use that information to do the efficient multiDelete operation. In Gustavo's case of storing and managing dense time series data, that meant millions of keys/values being deleted with only a small number of actual network calls and very little data transferred across the network.

boolean enditerate = false;
int res = 0;    // running count of deleted key/value pairs
while (!enditerate) {
    if (iter.hasNext()) {
        // Keep only the major key path of the returned key.
        Key iterkey = Key.createKey(iter.next().getMajorPath());
        // Delete the major key and everything beneath it in one call.
        int delNb = getKVStore().multiDelete(iterkey, kr,
                Depth.PARENT_AND_DESCENDANTS, durability, 0, null);
        res += delNb;
    } else {
        enditerate = true;
    }
}

If you want to get tricky and do this in parallel, you can wrap this in a worker thread and place the initialization of the iterator inside the while loop, effectively treating the iterator like a queue. It would cause a new query to happen every time the loop iterates, but in the meantime, a lot of other threads may have deleted half of that queue; a sketch of this follows below.
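Here is a hedged sketch of that idea, reusing only the calls already shown; key, kr, durability and getKVStore() are the same objects as above, and error handling and thread-count tuning are left out:

    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import oracle.kv.Depth;
    import oracle.kv.Direction;
    import oracle.kv.Durability;
    import oracle.kv.Key;
    import oracle.kv.KeyRange;

    static void parallelDelete(final Key key, final KeyRange kr,
                               final Durability durability) throws InterruptedException {
        final AtomicLong deleted = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    boolean more = true;
                    while (more) {
                        // Re-query each pass: other workers may already have trimmed the key space.
                        Iterator<Key> it = getKVStore().storeKeysIterator(
                                Direction.UNORDERED, 0, key, kr, Depth.DESCENDANTS_ONLY);
                        if (it.hasNext()) {
                            Key majorKey = Key.createKey(it.next().getMajorPath());
                            deleted.addAndGet(getKVStore().multiDelete(majorKey, kr,
                                    Depth.PARENT_AND_DESCENDANTS, durability, 0, null));
                        } else {
                            more = false;
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("deleted " + deleted.get() + " key/value pairs");
    }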

In fact, there are also parallel versions of the store iterator that will query the entire cluster in parallel, using the number of threads that works best given your hardware configuration and the number of shards in your cluster. You can learn more about this in the online documentation.

If you are interested in playing with the ideas in this blog, there is a Github example project in Eclipse that will get you started. 

Monday May 12, 2014

Article on NoSQL Database by James Anthony (CTO - e-DBA)

http://www.otechmag.com/2014/otech-magazine-spring-2014/

James Anthony recently published this article about the latest release of Oracle NoSQL Database.  James does an excellent job describing the basic NoSQL concepts such as CAP theorem and ACID transactions.  His insights into how to use database systems and NoSQL systems are based on extensive experience building large production applications.

 Definitely worth a read.

Monday Dec 09, 2013

Look I shrunk the keys

Whether we use a relational or a non-relational database to durably persist our data on disk, we are all aware of how major a role indexes play in accessing the data in real time. There is one aspect most of us tend to overlook while designing the index key: how to size it efficiently.

Not that I want to discount the fact, but in traditional databases, where we used to store from hundreds of thousands to a few million records, sizing the index key didn't come up (that often) as a very high priority; in a NoSQL database, where you are going to persist a few billion to trillions of records, every byte saved in the key goes a long way.

That is exactly what came up this week while working on one of our POCs, and I thought I should share the best practices and tricks of the trade that you can also use in developing your application. So here is my numero uno recommendation for designing index keys:

  • Keep it Small.

Now there is nothing new there that you didn't know already, right? Right, but I just wanted to highlight it: if there is one thing you remember from this post, it should be this one bullet item.

All right, here is what I was dealing with this week: a couple billion records of telematics/spatial data that we needed to capture and query based on the timestamp (of the feed that was received) and the x and y coordinates of the system. To run the kind of queries we wanted to run (on spatial data), we came up with this as an index key:

/S/{timestamp}{x-coordinate}{y-coordinate}

How we used the above key structure to run spatial queries is another blog post, but for this one I will just say that when we plugged values into the variables, our key became 24 bytes (1+13+5+5) long. Here's how:

Table Prefix => type char = 1 byte (eg. S)

Timestamp => type long = 13 bytes (eg.1386286913165)

X co-ordinate => type string = 5 bytes (eg. 120.78 degree, 31.87 degree etc)

Y co-ordinate => type string = 5 bytes (eg. 132.78 degree, 33.75 degree etc)

With the amount of hardware we had available (for the POC) we could create only a 4-shard cluster. So to store two billion records we needed to store (2B records / 4 shards) 500 million records on each of the four shards. Using the DbCacheSize utility, we calculated we would need about 32 GB of JE cache on each Replication Node (RN).

$java -d64 -XX:+UseCompressedOops -jar $KVHOME/lib/je.jar DbCacheSize -records 500000000 

-key 24 

=== Database Cache Size ===
 Minimum Bytes        Maximum Bytes          Description
---------------       ---------------        -----------
 29,110,826,240   32,019,917,056         Internal nodes only

But we knew that if we could shrink the key size (without losing information) we could save a lot of memory and improve query time (as search is a function of the number of records and the size of each record) as well. So we built a simple encoding program that uses a range of 62 ASCII characters (0-9, a-z, A-Z) to encode any numeric value. You can find the program here or build your own, but what is important to note is that we were able to represent the same information with fewer bytes:

13 Byte Timestamp (e.g. 1386286913165) became 7 byte (e.g. opc2BTn)

5 byte X/Y co-ordinate (e.g. 132.78) became 3 byte each (e.g. a9X/6dF)

i.e. a 14-byte encoded key (1 + 7 + 3 + 3 bytes). So what's the fuss about shrinking our keys (it's just a 10-byte saving), you ask? Well, we plugged the numbers into the DbCacheSize utility again, and this time the verdict was that we needed only 20 GB of JE cache to store the same half a billion records on each RN. That's a 40% improvement (12 GB saved per Replication Node) and definitely an impressive start.

$java -d64 -XX:+UseCompressedOops -jar $KVHOME/lib/je.jar DbCacheSize -records 500000000 
-key 14 

=== Database Cache Size ===
 Minimum Bytes        Maximum Bytes          Description
---------------       ---------------        -----------
 16,929,008,448       19,838,099,264         Internal nodes only
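The encoder we used is linked above; as a rough illustration of the idea, here is a minimal base-62 sketch (digits 0-9, a-z, A-Z). The exact output depends on the alphabet ordering, so it will not reproduce the sample values above byte for byte, but a 13-digit millisecond timestamp does come out as 7 characters:

    public class Base62 {
        private static final String ALPHABET =
                "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

        // Encode a non-negative number in base 62.
        static String encode(long n) {
            if (n == 0) return "0";
            StringBuilder sb = new StringBuilder();
            while (n > 0) {
                sb.append(ALPHABET.charAt((int) (n % 62)));
                n /= 62;
            }
            return sb.reverse().toString();
        }

        // Decode back to a long.
        static long decode(String s) {
            long n = 0;
            for (int i = 0; i < s.length(); i++) {
                n = n * 62 + ALPHABET.indexOf(s.charAt(i));
            }
            return n;
        }

        public static void main(String[] args) {
            long ts = 1386286913165L;                     // 13-digit millisecond timestamp
            System.out.println(encode(ts));               // 7 base-62 characters
            System.out.println(decode(encode(ts)) == ts); // true: no information lost
        }
    }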

To conclude: you have just seen how a simple encoding technique can save you big time when you are dealing with billions of records. Next time you design an index key, just think a little harder about how you can shrink it down!

Thursday Oct 31, 2013

Oracle Social Analytics with the Big Data Appliance

Found an awesome demo put together by one of the Oracle NoSQL Database partners, eDBA, on using the Big Data Appliance to do social analytics.

In this video, James Anthony is showing off the BDA, Hadoop, the Oracle Big Data Connectors and how they can be used and integrated with the Oracle Database to do an end-to-end sentiment analysis leveraging twitter data.   A really great demo worth the view. 

Friday Oct 11, 2013

Accolades - Oracle NoSQL customers speak out with praise

For all of those participating in the Oracle NoSQL Database community and following the product evolution, there have been a number of changes emerging on Oracle OTN for the NoSQL Database.

In particular, on the main page Dave Segleau's NoSQL Now presentation on Enterprise NoSQL is prominently displayed. This is a great discussion on the trends involved in NoSQL adoption, which highlights the most important aspects of NoSQL technology selection and what Oracle in particular is bringing to the movement. Many of you know that, for Oracle, getting companies to speak up publicly about their use of our technology is much harder than it is for pure open-source startups. So, I am particularly pleased with the accolades starting to emerge from the users of Oracle NoSQL. Plus, there is new content getting published every day to help our growing community champion NoSQL technology adoption within their teams and organizations.

Starting to grow: I've noticed that our Meetup group is also gaining a lot of momentum. We are now over 400 members strong and growing aggressively. There is an awesome Meetup coming next week (Oct 15th at Elance, 441 Logue Avenue, Mountain View, CA) where Mike Olson, co-founder and Chief Strategy Officer of Cloudera, will be talking about the virtues of NoSQL key-value stores. There are already 88 people signed up for this event, so hurry up and join now or you may end up on a wait-list.

Spread the word, tell your friends, an Enterprise backed NoSQL is on the move!!

Friday Oct 04, 2013

Flexible schema management - NoSQL sweetspot

I attended a few colleagues' sessions at Oracle Open World focusing on NoSQL Database use cases. Dave Segleau from the Oracle NoSQL Database team did some work on the challenges associated with web-scale personalization. The main point he was emphasizing is that these kinds of personalization applications have very simple data lookup semantics, but that the data itself is quite volatile in nature and comes in all shapes and sizes, making it difficult to store in traditional relational database technology. The other challenges then follow, which are common to most NoSQL-based applications: dealing with this variety of data at scale and in near real time. Here are some references to those sessions, which are worth a review:

http://www.slideshare.net/slideshow/embed_code/26877748

http://www.slideshare.net/slideshow/embed_code/26876561 

Then the other day, I stumbled upon this story about how airlines are planning to provide a more personalized shopping experience in the travel process. I could not help but see the parallels between the requirements found in the online shopping world and those found in ticketing, as the airline industry plans to roll out new personalized services to travelers. Clearly, this is a great application area in which to consider the use of NoSQL Database technology. Data variety, scale, responsiveness: all the ingredients that make for an ideal use case for employing NoSQL technology in the solution.

Monday Jul 15, 2013

High-Five for Rolling Upgrade in Oracle NoSQL Database

In today's world of e-commerce, where businesses operate 24/7, the equation for revenue made or lost is sometimes expressed in terms of latency, i.e. the faster you serve your customers, the more likely they are to do business with you. Imagine, in this scenario (where every millisecond counts), your online business becoming inaccessible for a few minutes to a couple of hours because you needed to apply an important patch or an upgrade.

I think you get the idea of how important it is to stay online and available even during any planned hardware or software upgrade. The Oracle NoSQL Database 12c R1 (12.1.2.1.8) release puts you on a track where you can upgrade your NoSQL cluster with no disruption to your business services. It makes this possible by providing smart administration tools that calculate the safest combination of storage nodes that can be brought down in parallel and upgraded, keeping all the shards in the database available for reads and writes at all times.

Let's take a look at a real-world example. Say I have deployed a 9x3 database cluster, i.e. 9 shards with 3 replicas each (a total of 27 replication nodes) on 9 physical nodes. I have a highly available cluster (thanks to the intelligent topology feature shipped in 11gR2.2.0.23), with the replicas of each shard spread across three physical nodes so that there is no single point of failure. All right, here is how my topology looks:

[user@host09 kv-2.1.1]$ java -jar lib/kvstore.jar runadmin -port 5000  -host host01

kv-> ping
Pinging components of store mystore based upon topology sequence #136
mystore comprises 90 partitions and 9 Storage Nodes
Storage Node [sn1] on host01:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg2-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5012
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5013
        Rep Node [rg1-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn2] on host02:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg1-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg3-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg2-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn3] on host03:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg2-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
        Rep Node [rg3-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg1-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
Storage Node [sn4] on host04:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg6-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5012
        Rep Node [rg4-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg5-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn5] on host05:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg6-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg4-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg5-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
Storage Node [sn6] on host06:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg5-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
        Rep Node [rg4-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg6-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
Storage Node [sn7] on host07:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg9-rn1]      Status: RUNNING,MASTER at sequence number: 41 haPort: 5012
        Rep Node [rg7-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
        Rep Node [rg8-rn1]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
Storage Node [sn8] on host08:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg8-rn2]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5011
        Rep Node [rg9-rn2]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012
        Rep Node [rg7-rn2]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5010
Storage Node [sn9] on host09:5000    Datacenter: Boston [dc1]    Status: RUNNING   Ver: 12cR1.2.1.1
        Rep Node [rg7-rn3]      Status: RUNNING,MASTER at sequence number: 45 haPort: 5010
        Rep Node [rg8-rn3]      Status: RUNNING,REPLICA at sequence number: 45 haPort: 5011
        Rep Node [rg9-rn3]      Status: RUNNING,REPLICA at sequence number: 41 haPort: 5012


Notice that each storage node (sn1-sn9) is hosting one MASTER node and two REPLICA nodes from entirely different shards. Now, if you would like to upgrade the active cluster to the latest version of Oracle NoSQL Database without downtime, all you need to do is grab the latest binaries from OTN and lay down the bits (NEW_KVHOME) on each of the 9 nodes (only once if you have a shared drive accessible from all the nodes). From the administration command-line interface (CLI), simply run 'show upgrade':

[user@host09 kv-2.1.8]$ java -jar lib/kvstore.jar runadmin -port 5000  -host host01

kv-> show upgrade
Calculating upgrade order, target version: 12.1.2.1.8, prerequisite: 11.2.2.0.23
sn3 sn4 sn7
sn1 sn8 sn5
sn2 sn6 sn9


The SNs in each horizontal row represent the storage nodes that can be patched/upgraded in parallel, and the rows represent the sequential order, i.e. you can upgrade sn3, sn4 & sn7 in parallel, and once all three are done, you can move to the next row (sn1, sn8 & sn5), and so on. You may be asking: what if you have a fairly large cluster and don't want to upgrade the nodes manually, can't you automate this process with a script? Well, we have already done that for you as well. An example script is available for you to try out and can be found at:
<KVROOT>/examples/upgrade/onlineUpgrade


I hope you find this feature as useful as I do. If you need more details on this topic, I recommend that you visit Upgrading an Existing Oracle NoSQL Database Deployment in the Administrator's Guide. If you are new to Oracle NoSQL Database, get the complete product documentation from here and learn about the product from self-paced web tutorials with some hands-on exercises as well.

Wednesday Jul 10, 2013

EclipseLink JPA and Oracle NoSQL Database (ONDB)

Back in 2005, I was the Project Lead for JSR220-ORM tooling in Eclipse. Sun's JSR220 project was the early POJO persistence standard for Java that became the EJB 3.0 spec, the predecessor of the JPA interface found today in every Java download.

In the same timeframe, Oracle announced it was joining the Eclipse Foundation and in that process launching a competing JSR220 tooling project called Dali. Needless to say, it did not take long for the JSR220-ORM and Dali project teams to merge into a single project. Eventually, the Oracle team took lead on Dali and marched into the future.

Here I am now 8 years later at Oracle helping drive the standardization of NoSQL technology. JPA presents a great abstraction layer for dealing with database persistence, allowing users of Java to persist their application objects with literally the push of a button. Plus, when using JPA with a NoSQL database, it allows the developer to use a soft-schema approach to application development, where the data model is driven from the application space rather than the database design, and evolution of the application can occur much more rapidly. In fact, in 2005, when I was V.P. Technology for a NoSQL Database company, one of the things we did was create a JPA interface for standards-based access to our NoSQL store, which is the reason we launched that JSR220 tooling project in Eclipse. So, I thought I would poke around a little bit with the Oracle NoSQL Database (ONDB) and JPA interfaces. To my surprise, I found that some folks had already made a great start down that path... pretty cool. There is an EclipseLink plugin that supports NoSQL databases, including ONDB.
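To give a flavor of what that looks like, here is a hedged sketch of an entity mapped through EclipseLink's NoSQL annotations; the entity and its fields are made up for illustration, and the persistence.xml connection properties for ONDB vary by release, so treat this as a shape rather than a recipe:

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.eclipse.persistence.nosql.annotations.DataFormatType;
    import org.eclipse.persistence.nosql.annotations.NoSql;

    // The object model lives in the application; there is no table DDL to keep in
    // sync, which is the "soft schema" approach described above.
    @Entity
    @NoSql(dataFormat = DataFormatType.MAPPED)
    public class Customer {

        @Id
        @GeneratedValue
        private String id;

        private String name;
        private String email;

        public String getId() { return id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getEmail() { return email; }
        public void setEmail(String email) { this.email = email; }
    }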

[Read More]
About

This blog is about everything NoSQL. An open place to express thoughts on this exciting topic and exchange ideas with other enthusiasts learning and exploring about what the coming generation of data management will look like in the face of social digital modernization. A collective dialog to invigorate the imagination and drive innovation straight into the heart of our efforts to better our existence thru technological excellence.
