Monday Feb 01, 2016

Oracle NoSQL BulkPut

Introduction

Our customers have often asked us, "What's the fastest and most efficient way to insert a large number of records into Oracle NoSQL Database?" Very recently, a shipping company reached out to us with a specific requirement: using Oracle NoSQL Database for their ship management application, which tracks the movements of the container ships that move their cargo from port to port. The ships are all fitted with GPS and other tracking devices, which relay each ship's location to the application every few seconds. The application is then queried for 1) the current location of all ships, displayed on a map, and 2) a specific ship's trajectory over a given period of time, also displayed on a map. As the volume of location data grew, the company found it hard to scale the application and is now looking for a back-end system that can ingest this large data set very efficiently.

Historically, we have supported the option to execute a batch of operations for records that share the same shard key, which is what our large airline customer (Airbus) has done. They pre-sort the data by shard key and perform a multi-record insert whenever the shard key changes. Basically, rather than sending and storing one record at a time, they can send a large number of records in a single operation. This certainly saves network round trips, but they could only batch-insert records that shared the same shard key. With Oracle NoSQL Database release 3.5.2, we have added the ability to do a bulk insert, or bulk put, of records across different shards, allowing application developers to work more effectively with very large data sets.

The BulkPut API is available for both the table and the key/value data models. It provides significant performance gains over single-row inserts by reducing network round trips and by performing ordered, batched inserts of internally sorted data across different shards in parallel. This feature is being released in a controlled fashion, so there are no Javadocs available for the API in this release, but we encourage you to use it and give us feedback.

API

KV interface: loads Key/Value pairs supplied by special purpose streams into the store.

public void put(List<EntryStream<KeyValue>> streams, BulkWriteOptions bulkWriteOptions)

Table interface: loads rows supplied by special purpose streams into the store.

public void put(List<EntryStream<Row>> streams, BulkWriteOptions bulkWriteOptions)
  • streams: the streams that supply the rows to be inserted
  • bulkWriteOptions: non-default arguments controlling the behavior of the bulk write operations

Stream Interface :

public interface EntryStream<E> {
String name();
E getNext();
void completed();
void keyExists(E entry);
void catchException(RuntimeException exception, E entry);
}
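To give a feel for how the pieces fit together, here is a minimal sketch of a table-mode bulk load. The EntryStream implementation simply wraps an iterator over pre-built Row objects. Since the Javadocs for this API are not yet published, treat the details as assumptions: the package location of EntryStream, the store name, host/port, the ShipLocation table with shipId and location columns, and the use of a null (default) BulkWriteOptions are all placeholders for illustration.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import oracle.kv.BulkWriteOptions;
import oracle.kv.EntryStream;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

public class BulkPutSketch {

    /** A trivial EntryStream that feeds rows from an in-memory iterator. */
    static class RowStream implements EntryStream<Row> {
        private final String name;
        private final Iterator<Row> it;

        RowStream(String name, Iterator<Row> it) {
            this.name = name;
            this.it = it;
        }

        public String name() { return name; }

        public Row getNext() {
            return it.hasNext() ? it.next() : null;   // null signals end of stream
        }

        public void completed() {
            System.out.println(name + " completed");
        }

        public void keyExists(Row entry) {
            // Called when a row with the same primary key already exists; skip it here.
        }

        public void catchException(RuntimeException e, Row entry) {
            System.err.println(name + ": failed to insert " + entry + ": " + e);
            throw e;                                   // rethrow to stop the load
        }
    }

    public static void main(String[] args) {
        // Store name, host/port, table and column names are placeholders for this sketch.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000"));
        TableAPI tableAPI = store.getTableAPI();
        Table table = tableAPI.getTable("ShipLocation");

        // Build a couple of rows to load (real code would stream millions).
        Row r1 = table.createRow();
        r1.put("shipId", 1);
        r1.put("location", "48.8566,2.3522");
        Row r2 = table.createRow();
        r2.put("shipId", 2);
        r2.put("location", "51.5074,-0.1278");

        List<EntryStream<Row>> streams = new ArrayList<EntryStream<Row>>();
        streams.add(new RowStream("stream-1", Arrays.asList(r1, r2).iterator()));

        // Passing null to take the default BulkWriteOptions is an assumption here.
        BulkWriteOptions options = null;
        tableAPI.put(streams, options);
        store.close();
    }
}

In a real load you would supply several streams (one per input source or file) so the store can process them in parallel, which is where the performance numbers below come from.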


Performance

We ran the YCSB benchmark with the new Bulk Put API on a 3x3 (3 shards, each with 3 copies of the data) NoSQL cluster running on bare-metal servers, ingesting 50M records per shard, or 150M records across the store, using 3 parallel threads per shard (9 in total for the store) and 6 parallel input streams per storage node (54 in total across the store). The results of the benchmark run are shown in the graph below.


The graph above compares the throughput (ops/sec) of the Bulk Put and simple Put APIs against a NoSQL store with 1000 partitions, using durability settings of None and Simple Majority. As the charts show, there is over a 100% increase in throughput with either durability setting.

Sample Example

Here's a link to a sample program uploaded to the GitHub repository; it demonstrates how to use the BulkPut API in your application code. Refer to the README file for details on running the program.

Summary

If you are looking at bulk loading data into Oracle NoSQL Database, the new Bulk Put API provides the most efficient and fastest way (as demonstrated by YCSB) to ingest large amounts of data. Check it out now and download the latest version of Oracle NoSQL Database at: www.oracle.com/nosql.

I'd like to thank my colleague Jin Zhao for his input on the performance numbers.


Monday Jan 04, 2016

Oracle NoSQL BulkGet API

Recently, we have been getting queries from our developer community and customers who want to know the most efficient way to retrieve large amounts of data in a single operation using Oracle NoSQL Database. An example of such a request would be on an eCommerce website, where a potential customer wants to retrieve all the phones in the price range $200 to $500 from Apple, Samsung, Nokia, Motorola, and a host of other manufacturers, returning all the details including the images of the products.


Tuesday Nov 17, 2015

Turn Big Data into Fast Data Using Oracle NoSQL and Fusion ioMemory

Prasad Venkatachar at SanDisk has been working with the Oracle NoSQL Database and Fusion ioMemory to see how using flash affects the performance of various NoSQL workloads. The Fusion ioMemory application accelerators deliver extreme performance across a variety of workloads. For the Oracle NoSQL Database specifically, the results show a remarkable increase in performance compared to hard disk drives. The YCSB benchmark was run using different configurations, and the results are explained for a write-heavy workload (50% write / 50% read), a read-heavy workload (5% write / 95% read) and a read-only workload (100% read). Latencies and throughput remained consistent even as the amount of data grew. Read more details about using Fusion ioMemory by clicking here.

Friday Nov 06, 2015

Oracle NoSQL Database demonstrates record performance with SPARC M7

Oracle's SPARC T7-4 server containing the SPARC M7 processor delivered 1.9 million ops/sec on 1.6 billion records for the Yahoo Cloud Serving Benchmark (YCSB) 95% read/5% update workload. Oracle NoSQL Database was used in these tests. NoSQL is important for Big Data Analysis and for Cloud Computing. The details can be found at the Performance blog.

Friday Oct 30, 2015

Oracle NoSQL Database Mentioned in Press Release

See the Oracle press release from October 27, 2015, regarding the Oracle Cloud Platform for Big Data.  Read more here.

Friday Oct 09, 2015

Announcement - Oracle NoSQL Database 12.1.3.4.7

Oracle announced Oracle NoSQL Database 3.4.7 in October 2015. This release offers new features including: new commands to perform failover and switchover operations when there is a loss of quorum; a Bulk Get API that takes a list of primary keys as input and returns all the records matching those keys; an off-heap cache that allows users to allocate and use additional memory outside the limits of the Java heap, reducing the impact of Java garbage collection; and support for Big Data SQL and Apache Hive integration. Please read the full announcement at http://bit.ly/1qBUMuP. To download, visit our download page (http://bit.ly/1mpL2f9) for the latest release.

Check the release presentation here

Thursday Aug 20, 2015

Invoking OracleNoSQL based Java application from PL/SQL

Recently, we ran into an interesting use case with one of our large supermarket customers, who wanted to take the output of a PL/SQL process and store it in Oracle NoSQL Database, to be consumed later by one of their retail applications, very quickly and in a way that can scale to support the high volume of data they are expecting. Oracle NoSQL DB is the obvious choice here because it provides high-throughput, low-latency read/write operations and can scale to support large volumes of data.

Coming to the integration, one of the highlights of Oracle NoSQL Database is that it integrates very well with the rest of the Oracle technology stack. The simplest way to write to Oracle NoSQL DB from a PL/SQL procedure is to call a Java stored procedure that uses the native NoSQL DB API to insert data into the database, and the simplest way to read from Oracle NoSQL DB in a stored procedure is to use an External Table in the query, so that data from Oracle NoSQL DB can be passed to the Oracle Database query processor. Another possible option is to use GoldenGate to move data from the Oracle Database to NoSQL DB. We have already blogged about the GoldenGate integration, so in this post I am going to focus on the Java stored procedure approach.

In case you are not familiar with Java stored procedures: a Java stored procedure essentially contains Java public static methods that are published to PL/SQL and stored in an Oracle database for general use. This allows a Java stored procedure to be executed from an application as if it were a PL/SQL stored procedure. When called by client applications, a Java stored procedure can accept arguments, reference Java classes, and return Java result values.

So, to help our customer, we created a POC that showcases this integration. Here are the steps involved:

  1. First, create the NoSQL DB tables that will store the data from the Oracle Database.
  2. Create a Java application that uses the native NoSQL driver to perform CRUD operations on NoSQL DB (a minimal sketch of such a class follows this list).
  3. Load the Java application classes created in step 2 into the Oracle Database using the loadjava utility.
  4. Create a Java stored procedure that takes the data from PL/SQL and updates the NoSQL Database.
  5. Next, publish the Java stored procedure in the Oracle data dictionary. To do that, you write call specs, which map Java method names, parameter types, and return types to their SQL counterparts.
  6. Finally, call the Java stored procedure from a PL/SQL block to perform the updates.
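For illustration, the Java class in steps 2-4 might look like the sketch below. The class name, table name, column names and connection details are all hypothetical; the point is simply that a public static method is what loadjava loads and a call spec later publishes to PL/SQL.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

public class RetailNoSqlWriter {

    /**
     * Public static method intended to be published to PL/SQL through a call spec.
     * Inserts one record produced by the PL/SQL process into a NoSQL table.
     */
    public static void saveItem(String itemId, String itemJsonDetails) {
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000")); // hypothetical store/host
        try {
            TableAPI tableAPI = store.getTableAPI();
            Table table = tableAPI.getTable("RETAIL_ITEMS");     // hypothetical table
            Row row = table.createRow();
            row.put("itemId", itemId);
            row.put("details", itemJsonDetails);
            tableAPI.put(row, null, null);                       // unconditional put
        } finally {
            store.close();
        }
    }
}

The corresponding call spec (step 5) would then declare a PL/SQL procedure with two VARCHAR2 parameters whose body is of the form LANGUAGE JAVA NAME 'RetailNoSqlWriter.saveItem(java.lang.String, java.lang.String)'.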

The POC is available for download in a zip file from our OTN page (refer: The PL/SQL Integration in the Demo/Sample Program). The READ-ME file bundled in the zip has all the detailed steps and files needed for this integration.

With this approach, the NoSQL access is transparent to the Oracle DB application. NoSQL DB is an excellent choice here, and using this Java stored procedure approach, the customer can exploit the advantages of BOTH repositories effectively and with better TCO.



Wednesday Aug 12, 2015

Migrating/Importing MongoDB Documents into Nosql Tables

Summary 

This post presents how to migrate documents from MongoDB collections into tables and child tables in Oracle NoSQL. The idea is to take a relatively complex document as an example, define a mapping file that maps the basic fields of the document into a table, and map the embedded collections of the document into child tables. The Java class that we provide generates the NoSQL table structures and inserts the data. The set of components of each element of a collection is inserted into the store in the same operation.

A Json example 

 Let's use an example of a family item from a MongoDB collection:

{ "_id" : ObjectId("55c4c6576e4ae64b5997d39e"),

"firstname" : "lena",

"lastname" : "clark",

"gender" : "W",

"childrens" : [

{ "name" : "bob",

"schools" : [ {"street" : "90, pine street","id" : "Saint volume"},

{"street" : "134, mice street","id" : "Saint appearance"}

],

"hobbies" : ["soccer","photo"]

},

{ "name" : "joseph",

"schools" : [ {"street" : "168, merely street","id" : "Saint slipped"} ],

"hobbies" : ["tennis","piano"]

},

{ "name" : "sandy",

"schools" : [{"street" : "227, thread street","id" : "Saint discovery"}],

"hobbies" : ["football","guitar"]

}

]

}

In this case the main document has the following fields: '_id', 'firstname', 'lastname', 'gender' and 'childrens'. 'childrens' is an embedded collection containing 'name', 'schools' and 'hobbies'. 'schools' is again a nested collection with 'street' and 'id' fields, and 'hobbies' is a list. We can map them into several nested tables:

  • the main table represents FAMILY,
  • FAMILY.CHILDREN  gets 'childrens' items and
  • FAMILY.CHILDREN.SCHOOLS and FAMILY.CHILDREN.HOBBIES store schools and hobbies information.

The mapping file 

The mapping file is a properties file; it also contains the connection information to access the MongoDB database and the NoSQL store:

  • the name of the Nosql store: Nosql.Store=kvstore
  • the host and port of the nosql store: Nosql.URL=bigdatalite:5000
  • the mongodb host: MongoDB.host=localhost
  • the mongodb port: MongoDB.port=27017
  • the mongodb database: MongoDB.DB=gadb

Mapping principles

Define the main collection, its fields and its main table mapping

For each field define its type and its mapping value. Note that this can be a recursive step.

For each table define the primary key index components. 

Mapping extracts

Mapping collection and table with its primary keys

  • mongo.collection=family
  • mongo.collection.map=FAMILY
  • FAMILY.indexcols=LASTNAME,FIRSTNAME
indexcols is the keyword that introduces the comma-separated list of columns making up the key; order is important. The prefix before indexcols is a NoSQL table name.

Family fields

  • family.fields=lastname,firstname,gender,childrens
  • family.firstname.type=string
  • family.firstname.map=FIRSTNAME
  • family.childrens.type=collection
  • family.childrens.map=CHILDREN
fields is the keyword that introduces the comma-separated list of fields of a collection. For each field, type corresponds to the type of a column in a NoSQL table (string, integer, long, float, double or boolean are accepted). Two other values are used: array or collection. array is for lists of basic types; collection is for more complex collections. When type is a basic type, map indicates a column of the mapped table; when the type is array or collection, map introduces a new table.

Children mappings

  • CHILDREN.indexcols=NAME
  • childrens.fields=name,schools,hobbies
  • childrens.name.type=string
  • childrens.name.map=NAME
  • childrens.schools.type=collection
  • childrens.schools.map=SCHOOLS
  • childrens.hobbies.type=array
  • childrens.hobbies.map=HOBBIES

School mappings 

  • schools.fields=street,id
  • schools.indexcols=ID
street and id are basic string fields, their type and map are not shown.

Hobbies mappings

  • hobbies.fields=hobbies
  • hobbies.hobbies.type=string
  • hobbies.hobbies.map=HOBBY
  • HOBBIES.indexcols=HOBBY

childrens.hobbies is an array of strings mapped to the child table HOBBIES. There is no name in the main collection for this field, so I've chosen to use hobbies (the name of the collection) as the field name in order to be able to define a mapping.
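Assembled from the extracts above, the full mapping file for this example would look roughly like the following. The lastname, gender, street and id entries were not spelled out in the extracts and simply follow the same pattern as the others, so treat those lines as illustrative:

Nosql.Store=kvstore
Nosql.URL=bigdatalite:5000
MongoDB.host=localhost
MongoDB.port=27017
MongoDB.DB=gadb

mongo.collection=family
mongo.collection.map=FAMILY
FAMILY.indexcols=LASTNAME,FIRSTNAME

family.fields=lastname,firstname,gender,childrens
family.lastname.type=string
family.lastname.map=LASTNAME
family.firstname.type=string
family.firstname.map=FIRSTNAME
family.gender.type=string
family.gender.map=GENDER
family.childrens.type=collection
family.childrens.map=CHILDREN

CHILDREN.indexcols=NAME
childrens.fields=name,schools,hobbies
childrens.name.type=string
childrens.name.map=NAME
childrens.schools.type=collection
childrens.schools.map=SCHOOLS
childrens.hobbies.type=array
childrens.hobbies.map=HOBBIES

schools.indexcols=ID
schools.fields=street,id
schools.street.type=string
schools.street.map=STREET
schools.id.type=string
schools.id.map=ID

HOBBIES.indexcols=HOBBY
hobbies.fields=hobbies
hobbies.hobbies.type=string
hobbies.hobbies.map=HOBBY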

Tables generated

Get child tables from FAMILY   

kv-> show tables -parent FAMILY

Tables: 

FAMILY.CHILDREN

 FAMILY.CHILDREN.HOBBIES

 FAMILY.CHILDREN.SCHOOLS

Get table indexes

kv-> show indexes -table FAMILY

Indexes on table FAMILY

FAMILYIndex (LASTNAME, FIRSTNAME)

kv-> show indexes -table FAMILY.CHILDREN

Indexes on table FAMILY.CHILDREN

CHILDRENIndex (NAME)

kv-> show indexes -table FAMILY.CHILDREN.SCHOOLS

Indexes on table FAMILY.CHILDREN.SCHOOLS

SCHOOLSIndex (ID)

kv-> show indexes -table FAMILY.CHILDREN.HOBBIES

Indexes on table FAMILY.CHILDREN.HOBBIES

HOBBIESIndex (HOBBY) 

Getting data from tables

Get our example family

kv-> get table -name FAMILY -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"

{"FIRSTNAME":"lena","LASTNAME":"clark","GENDER":"W"}

Get our family children

kv-> get table -name FAMILY.CHILDREN -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy"}

Get our family children schools

kv-> get table -name FAMILY.CHILDREN.SCHOOLS -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"134, mice street","ID":"Saint appearance"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","STREET":"90, pine street","ID":"Saint volume"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","STREET":"168, merely street","ID":"Saint slipped"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","STREET":"227, thread street","ID":"Saint discovery"} 

Get our family children hobbies

kv-> get table -name FAMILY.CHILDREN.HOBBIES -field LASTNAME -value "clark" -field FIRSTNAME -value "lena"

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"photo"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"bob","HOBBY":"soccer"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"piano"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"joseph","HOBBY":"tennis"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"football"}

{"LASTNAME":"clark","FIRSTNAME":"lena","NAME":"sandy","HOBBY":"guitar"}

Running the example

jar files needed

MongoJavaDriver: mongo-java-driver-3.0.0.jar (this is the version we have used)

Nosql client: kvclient.jar (it should be a version that supports tables, 12.3+; it has been tested with 3.3.4)

main java class : mongoloadnosql.MongoDB2Nosql (java source code is here)

Parameters

The tool has 5 parameters:

  • -limit <integer>, number of documents to load, 0 is for all the documents
  • -skip <integer>, offset of the first document to load, similar to the skip function in MongoDB, 0 means no skip of documents
  • -mapfile <file>, properties file to load
  • -create [true|<anything other than true>], if true, the Nosql API calls to create tables and indexes are issued
  • -insert [true|<anything other than true>], if true, the Nosql API calls to insert the data are issued

Launch command

This command creates the Nosql tables and indexes if they do not exist, and inserts all the collection items, using the properties file mappingfam.properties:

java -classpath <tool_dir>/classes:<KVHOME>/lib/kvclient.jar:<MONGODB_CLASSPATH>/mongo-java-driver-3.0.0.jar mongoloadnosql.Mongo2Nosql -limit 0 -skip 0 -mapfile mappingfam.properties -create true -insert true

Caveats and warnings

Currently there is no way to map MongoDB references (neither referenced relationships nor DBRefs).

Fields should be presented in the order defined by the primary key, i.e. (lastname,firstname) rather than (firstname,lastname).

The Java code attached is just meant to illustrate how to import/migrate MongoDB data into Nosql tables in an efficient and consistent way; it has not been tested in all kinds of situations and it is not intended to be free of bugs.

Bonus

The following mapping file allows you to map MongoDB documents having the structure of a post.

In this case there is an embedded object "origine", defined as {"owner" : "gus","site" : "recent_safety.com"}, which is not a collection.

There is no primary key other than the MongoDB '_id' field.

Enjoy trying this example also.

Sunday Jun 14, 2015

Uploading NoSQL tables from Golden Gate User Exits

Golden Gate and NoSQL Integration

The aim of this post is to illustrate how to use Golden Gate to stream relational transactions to Oracle NoSQL 12.3.4. We follow the structure of the post which illustrated how to use Golden Gate to stream data into HBase. 

As shown in the diagram below, integrating the database with NoSQL is accomplished by developing a custom handler using Oracle GoldenGate's and NoSQL's Java APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions into the NoSQL table.

You can find the Java code for the handler at this Github repository in folder StreamFromGG.

The steps to generate and test an example are:

  1. Prepare the database to stream data from a table
  2. Define a NoSQL table
  3. Define the pump extract parameter file from the data base
  4. Define  the extract parameter file and the adapter properties file for NoSQL
  5. Register the extract process
  6. Start the GoldenGate extract processes
  7. Do some data manipulation on the data base and verify the content on the NoSQL table

Let's take an example

Prepare the database to stream data from a table

This part is not detailed here; let's just say that a database user is defined on Oracle 12c to allow GoldenGate transactional streaming, and that the database parameters are set to log transactional SQL commands in the way required by GoldenGate.

We will focus on the database table T2 from the gha schema, whose definition is:

CREATE TABLE "GHA"."T2" 

   ( "ID" NUMBER, 

"CREATED" TIMESTAMP (6), 

"NOM" VARCHAR2(32 BYTE), 

"VILLE" VARCHAR2(128 BYTE), 

CONSTRAINT "PK_T2" PRIMARY KEY ("ID", "CREATED")

Define a NoSQL table

 After connecting to the NoSQL store the following commands create the table T2:

table create -name T2

# Add table fields

add-field -name ID -type STRING

add-field -name NOM -type STRING

add-field -name CREATED -type STRING

add-field -name VILLE -type STRING

# Assign a field as primary key

primary-key -field ID -field CREATED

shard-key -field ID

exit

# Add table to the database

plan add-table -wait -name T2

Define the pump extract parameter file from the data base

The extract for the database first requires the use of the defgen utility to create what is called a data definition file, which contains the definition of the source table.

The content of the extract parameter's file is:

EXTRACT E_ghat2

TARGETDEFS ./dirsql/t2.sql

SETENV (ORACLE_SID=cdb)

userid c##ogg, password ogg

exttrail /u01/ogg/dirdat/T2

GETUPDATEBEFORES

table orcl.gha.t2  TARGET gha.t2;

The extract name is E_ghat2, the table definition file is t2.sql, the Oracle user for the transactional streaming is c##ogg, the trail files generated are prefixed with T2, and the container of the schema gha is orcl.

Define  the extract parameter file  and the adapter properties file for NoSQL

When using GoldenGate Java adapters, there are two files: one defines the extract parameters, the other gives the Java-specific properties for the adapter (the default name for this file is <extract_name>.properties; if a different name is used, it should be given in the extract parameters). Our extract name is nosqlt2. Part of the content of nosqlt2.properties is:

jvm.bootoptions= -Xms64m -Xmx512M -Dlog4j.configuration=log4j.properties -Djava.class.path=dirprm:/u01/nosql/kv-ee/lib/jackson-core-asl.jar:/u01/nosql/kv-ee/lib/jackson-mapper-asl.jar:/u01/nosql/kv-ee/lib/avro.jar:/u01/ogg/ggjava/oggnosql.jar:/u01/nosql/kv-ee/lib/kvclient.jar:/u01/ogg/ggjava/ggjava.jar:/usr/lib/hadoop/client/commons-configuration-1.6.jar:/etc/hadoop/conf:/usr/lib/hadoop/client/commons-cli.jar

#Nosql Handler.

gg.handlerlist=nosqlhandler

gg.handler.nosqlhandler.type=com.goldengate.delivery.handler.nosql.NosqlHandler

gg.handler.nosqlhandler.NosqlStore=kvstore

gg.handler.nosqlhandler.NosqlUrl=bigdatalite:5000

gg.handler.nosqlhandler.NosqlTable=T2

gg.handler.nosqlhandler.NosqlCols=ID,CREATED,NOM,VILLE

gg.handler.nosqlhandler.NosqlPKCols=ID,CREATED

gg.handler.nosqlhandler.NosqlShardCols=ID

gg.handler.nosqlhandler.NosqlMappings=ID,ID;CREATED,CREATED;NOM,NOM;VILLE,VILLE

The meaning of these properties is: 

  • jvm.bootoptions, gives the path for the nosql java classes including json data managing and the jar for nosql adapter
  • gg.handlerlist, gives the list of handlers; in this case nosqlhandler will be used to identify the properties
  • gg.handler.nosqlhandler.type, gives the class used as adapter
  • gg.handler.nosqlhandler.NosqlStore, gives the name of the  Nosql store to connect to
  • gg.handler.nosqlhandler.NosqlUrl, gives the nosql store url (hostname:port)
  • gg.handler.nosqlhandler.NosqlTable, gives the name of the table
  • gg.handler.nosqlhandler.NosqlCols, gives a comma separated list of the Nosql Table columns
  • gg.handler.nosqlhandler.NosqlPKCols, gives a comma separated list of the Nosql Table primary key columns
  • gg.handler.nosqlhandler.NosqlShardCols, gives a comma separated list of the Nosql Table shard columns (should be a non void subset of the primary key columns)
  • gg.handler.nosqlhandler.NosqlMappings, gives a semicolon-separated list of mapping pairs (source column,target column)

The adapter implementation of the NoSQL data manipulation (delete, update, create) uses the shard column values to batch operations into the NoSQL database. The batched operations are executed only when the shard key value changes.
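Roughly, the batching idea looks like the sketch below. This is illustrative only, not the actual handler code from the StreamFromGG repository; the method and its parameters are hypothetical, and it only shows puts (the real handler also handles updates and deletes).

import java.util.ArrayList;
import java.util.List;
import oracle.kv.table.Row;
import oracle.kv.table.TableAPI;
import oracle.kv.table.TableOperation;

public class ShardBatchingSketch {

    /**
     * Accumulates put operations and executes them as one batch each time the
     * shard key value changes (rows in a single batch must share the same shard key).
     */
    public static void writeBatchedByShardKey(TableAPI tableAPI, List<Row> orderedRows,
                                              String shardColumn) throws Exception {
        List<TableOperation> batch = new ArrayList<TableOperation>();
        String currentShardKey = null;
        for (Row row : orderedRows) {
            String shardKey = row.get(shardColumn).toString();
            if (currentShardKey != null && !currentShardKey.equals(shardKey)) {
                tableAPI.execute(batch, null);   // flush the previous shard key's batch
                batch.clear();
            }
            currentShardKey = shardKey;
            batch.add(tableAPI.getTableOperationFactory().createPut(row, null, false));
        }
        if (!batch.isEmpty()) {
            tableAPI.execute(batch, null);       // flush whatever is left
        }
    }
}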

Register the extract process

 Use the ggsci utility to issue the following commands (replacing <OGG_HOME> with its real value):

add extract E_GHAT2 ,integrated tranlog, begin now

add exttrail  <OGG_HOME>/dirdat/T2 , extract E_GHAT2 , megabytes 10

register extract E_GHAT2  database container (orcl)

add extract NOSQLT2, exttrailsource <OGG_HOME>/dirdat/T2 

 Start the GoldenGate extract processes

 Use ggsci utility to start e_ghat2 and nosqlt2

 Verify that the processes are running:

GGSCI (bigdatalite.localdomain) 1> info all


Program     Status      Group       Lag at Chkpt  Time Since Chkpt


MANAGER     RUNNING                                           

EXTRACT     RUNNING     E_GHAT2     00:00:02      00:01:07    

EXTRACT     RUNNING     NOSQLT2     49:22:05      00:01:56    

Data manipulation and verifications between Oracle 12c and NoSQL

Get the table count on NoSQL

kv-> aggregate table -name t2  -count

Row count: 4329

Delete data from t2 on Oracle 12c

delete t2 where id = 135

commit 

2 rows deleted 

Recompute the table count on NoSQL

kv-> aggregate table -name t2  -count

Row count: 4327

Note that the last batch of NoSQL operations is flushed when the extract nosqlt2 is stopped.

Thursday Mar 19, 2015

Forrester Wave places NoSQL Database among the leaders

We are very pleased that Oracle NoSQL Database has been recognized as one of the leaders in the key-value NoSQL product category by Forrester Research.  Please see http://www.oracle.com/us/corporate/analystreports/forrester-wave-nosql-2348063.pdf for the full report. 

In the past few years, we’ve witnessed growing adoption of NoSQL technologies to address specific data management problems.   In many of the early adopter scenarios, the NoSQL applications were developed and managed as self-contained, standalone repositories of semi-structured data.

In recent months, it has become clear that such data silos are very expensive to implement and maintain.  Big data and NoSQL users now understand that well integrated NoSQL and SQL systems are the key to effective data management in today’s world.  An integrated set of products for managing NoSQL and relational data is critical for delivering business value in a cost effective manner. Oracle NoSQL Database is fully integrated with Oracle Database and related technologies, thus making it an excellent choice for enterprise-grade, mission-critical NoSQL applications.  As mentioned in the Forrester Wave report, “Many Oracle customers use Oracle NoSQL to balance the need for scale-out workloads of simpler key-value data, with the rich set of relational data management capabilities needed in their core business systems, or when supporting new applications that have frequently changing key-value data, such as profiles for fraud, personalization, and sensor data management”.

Thursday Mar 12, 2015

When does "free" challenge that old adage, "You get what you pay for"?

This post generated a lot of attention and follow on discussion about the "numbers" seen in the post and related YCSB benchmark results.

Some readers commented, "Only 26,000 tps on 3 machines? I can get 50,000+ on one."   and    

"Only 10,000 YCSB Workload B tps on 6 nodes? I can get more than that on one."

Thought it was worth stating the obvious, because sometimes what is perfectly clear to one person is completely opaque to another. Numbers like "1 million tps" are meaningless without context. A dead simple example to illustrate the point: I might be able to do 50K inserts of an account balance (50K txns) in half a second on a given server machine, but take that same server and try to insert 50K fingerprint images (50K txns), and if you can get that done in half a second, call me, because magic of that nature is priceless and we should talk.

So for clarity,


Monday Jan 12, 2015

Big Data SQL for Oracle NoSQL Database (part 1)

Introduction

We recently announced Oracle Big Data SQL support for Oracle NoSQL Database. This feature allows Oracle Database users to connect to external data repositories like Oracle NoSQL Database and/or Hadoop, and enables them to fetch data from any or all of the repositories at once through a single SQL query. We believe this is a significant step forward, as it streamlines the adoption of big data technologies alongside the traditional technology stack.

If you are following the big data space, then you would agree that everyone likes the scalability, availability and performance-at-cost aspects of it, but almost everyone struggles to modify their enterprise applications to work with big data technologies, because these don't talk the same language as traditional database systems do.

For innovation to happen fast, what is required is that we lower the adoption barrier for the developers and data scientists who have been at the core of this data revolution. By providing the same SQL declarative language for relational and non-relational data, we have made it simpler for everyone to adopt these big data technologies.

SQL on Oracle NoSQL Database

Having an access layer using a declarative language is definitely a great thing, but a common trend these days is that every NoSQL vendor is writing a custom SQL-like layer. Now if you want to use your existing application to access relational data as well as non-relational data stored in the NoSQL database, you cannot write a single SQL query to access these two separate stores, which again makes things complicated from an application development and manageability point of view.

Oracle Big Data SQL for Oracle NoSQL Database fills this gap by providing exactly the same SQL access layer to both stores, so the queries look exactly the same no matter which store you access. To achieve this, Oracle Big Data SQL requires that Oracle NoSQL Database support access to its data via Hive. Thus, plugging Oracle NoSQL Database into Oracle Database is as simple as three steps: 1) creating a NoSQL store and a table, 2) configuring Hive to access the NoSQL table, and 3) configuring Oracle Database to talk to the HCatalog.

Please note that Big Data SQL provides much better query performance than running a query directly on generic external tables, because with the former approach the computation (for selecting and formatting data) is moved to where the data is, as opposed to moving all the data to the Oracle Database node first and then applying the predicates. The same proprietary cell technology (developed for Exadata) is used to apply the query predicates in parallel across all the distributed NoSQL partitions, making the overall query run faster.

In the rest of the article I will give you the technical details of the first two steps, so you can get hands-on experience with this feature.



Figure 1: Flow diagram showing steps to access data in Oracle NoSQL Database using hive queries.


Setup



Step 1: Create a TABLE in Oracle NoSQL Database and load some data that we would like to access from HIVE later on.


  • For this demonstration we will be using KVLITE, but you can very well deploy a real cluster by following the steps covered in this webinar. Please remember to set KVHOME as the environment variable that points to the directory where you unzipped the latest Oracle NoSQL Database binaries (download here), and KVROOT (e.g. /u01/kvroot) to the directory where you would like the metadata files to be stored.

$java -jar $KVHOME/lib/kvstore.jar kvlite -root $KVROOT -host localhost &


  • Once you have an Oracle NoSQL Database cluster deployed (in our case it is a single instance), connect to the administration shell using the runadmin command, and then from the KV prompt connect to the store (kvstore in my case).

$java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host localhost

kv-> CONNECT STORE -name kvstore 

  • Create a MOVIE table with a few columns (to capture movie-specific information)

kv->EXEC "CREATE TABLE MOVIE( \
             id INTEGER, \
             original_title STRING,\
             overview STRING,\
             poster_path STRING,\
             release_date STRING,\
             vote_count INTEGER,\
             runtime INTEGER,\
             popularity DOUBLE,\
             genres STRING,\
             PRIMARY KEY (id) )" 

  • SHOW TABLES to confirm it was created successfully, and DESCRIBE to make sure all the column types are appropriate as well.

kv-> EXEC 'SHOW TABLES' 

kv-> EXEC 'DESC AS JSON TABLE MOVIE'


  • Next we will load some MOVIE data into the table. You can download a movie.dat file with details of about 10 movies, each represented as a JSON string. We will load these strings directly into the MOVIE table.

kv-> PUT TABLE -name MOVIE -file /tmp/movie.dat
Inserted 10 rows in movie table

  • You can now COUNT the rows and GET the rows to make sure data is indeed there.

kv-> AGGREGATE TABLE -name MOVIE -count
Row count: 10

 

KV-> GET TABLE -name MOVIE


Step 2: Create Hive External Table using Storage Handler for Oracle NoSQL Database (that got shipped in release 3.2.5) .

  • Before we begin doing anything with Hive we first need to install Hadoop on our system.
  • After Hadoop is installed and tested, download the Hive archive from here. I am using apache-hive-0.14.0 at the time of this article, but you can download any later version. Unzip the archive and create the environment variable HIVE_HOME pointing to this directory.
  • Here are some of the environment variables that are set in my ~/.bashrc file:

export HADOOP_HOME=/u01/home/hadoop/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export KVHOME=/u01/nosql/kvhome

export HIVE_HOME=/u01/home/hadoop/apache-hive-0.14.0
export HIVE_AUX_JARS_PATH=$KVHOME/lib/kvclient.jar

export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:\
        $HADOOP_HOME/bin:$HADOOP_HOME
 

  • NOTICE that we have HIVE_AUX_JARS_PATH set to kvclient.jar. We need to set this variable so that Hive can find the Storage Handler for Oracle NoSQL Database from the client driver itself.
  • Now the only thing left is creating an external table to mirror the table that exists in Oracle NoSQL Database. Let's connect to Hive first and then create the table using this DDL:

$hive

hive-> CREATE EXTERNAL TABLE IF NOT EXISTS 
         MOVIE( id INT, 
                original_title STRING,
                overview STRING, 
                poster_path STRING,
                release_date STRING, 
                vote_count INT, 
                runtime INT, 
                popularity DOUBLE, 
                genres STRING) 
         STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
         TBLPROPERTIES  ( "oracle.kv.kvstore"="kvstore",
                          "oracle.kv.hosts"="localhost:5000",
                          "oracle.kv.hadoop.hosts"="localhost",
                          "oracle.kv.tableName"="MOVIE");

Time taken: 0.174 seconds

          Note: The STORED BY clause is where we define the StorageHandler class for Oracle NoSQL Database, and TBLPROPERTIES is where we define the name of the store, the host name and port where the storage node agent is listening (in our case we have only one node), and the name of the table in NoSQL DB. We have MOVIE as the table there, so that is what we use for the oracle.kv.tableName field.

  • That is all one needs to do to configure Hive to work with Oracle NoSQL Database. Pretty simple, right?


Step 2b: Now that we have the NoSQL DB and Hive setup done, we just need to test running Hive queries.

  • Here are some of the simple queries to fetch MOVIE data using Hive Query Language (HQL):

hive> SELECT id, original_title, release_date, genres 
        FROM MOVIE;

hive> SELECT id, original_title, release_date, genres 
        FROM MOVIE 
        WHERE release_date >1990;

hive> SELECT id, original_title, release_date, genres 
        FROM MOVIE 
        WHERE genres LIKE '%Comedy%'; 

  • Some other queries that would require M/R jobs to be spun off to get the results:

hive> SELECT count(*) 
        FROM MOVIE 
        WHERE genres LIKE '%Drama%'; 

hive> SELECT id, original_title, release_date, genres 
        FROM MOVIE 
        WHERE genres LIKE '%Drama%' 
        ORDER BY release_date;

hive> SELECT MAX(release_date) 
        FROM MOVIE;



Summary

Using the Hive StorageHandler from the latest release, one can access NoSQL data as Hive external tables. The metadata of these external tables is stored in the HCatalog, which is later referenced by Oracle Big Data SQL to decide where to dispatch the queries in parallel. The process of configuring Oracle NoSQL Database has become so simple (using the out-of-the-box Hive StorageHandler) that it takes only a couple of steps to complete.

In part 2 of this blog I will talk about how the Hive metastore can be configured with Oracle Database to finally run SQL queries on NoSQL data.

Tuesday Jan 06, 2015

Announcement: Oracle NoSQL Database 3.2.5

Oracle announced Oracle NoSQL Database 3.2.5 on December 8, 2014. This release offers Big Data SQL support, a RESTful API, a C Table Driver, SQL-like DDL, Apache Hive support and much more. Please read the full announcement to learn more, and to download the latest release click here.

Tuesday Dec 02, 2014

Using Nosql Tables with Spark

The goal of this post is to explain how to use Nosql tables and how to put their content into a file on HDFS using the Java API for Spark. In HDFS, the table content will be presented in a comma-separated (CSV) style.

Oracle's latest Big Data Appliance, "X4-2", offers Cloudera Enterprise Technology software including Cloudera CDH, and Oracle NoSql database including tables.

The Cloudera part offers several ways of integrating with Spark (see Using Nosql and Spark): standalone or via Yarn (see Running Spark Applications).

The Nosql part allows the use of tables. Tables can be defined within the Nosql console by issuing the following command:

java -Xmx256m -Xms256m -jar $KVHOME/lib/kvstore.jar runadmin -host <host> -port <store port> -store <store name>

There are two parts to defining and creating a table. Define, which includes the table name, table fields, primary key, and shard key (a "prefix" of the primary key), and ends with the keyword "exit":

table create -name flightTestExtract

add-field -name param -type STRING

add-field -name flight -type STRING

add-field -name timeref -type LONG

add-field -name value -type INTEGER

primary-key -field timeref -field param -field flight 

shard-key -field timeref

exit

Plan, which performs the table creation and the index definition and creation:

plan add-table -wait -name flightTestExtract

plan add-index -wait -name flightIndex -table  flightTestExtract -field flight -field param -field timeref

plan add-index -wait -name paramIndex -table  flightTestExtract -field param -field flight -field timeref

Inserting into the table can be done with the put command:

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":1764248535}"

put table -name flightTestExtract -json "{\"param\":\"12\",\"flight\":\"8\",\"timeref\":61000000000002,\"value\":-1936513330}"

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"6\",\"timeref\":61000000000013,\"value\":1600130521}"

put table -name flightTestExtract -json "{\"param\":\"11\",\"flight\":\"8\",\"timeref\":61000000000013,\"value\":478674806}"

The latest patch of Nosql, 3.1.7, has some new Java classes that can be used to get table data into Hadoop. The class oracle.kv.hadoop.table.TableInputFormat can be used to build a Spark JavaRDD:

JavaPairRDD<PrimaryKey, Row> jrdd = sc.newAPIHadoopRDD(hconf, TableInputFormat.class, PrimaryKey.class, Row.class);

The oracle.kv.table.PrimaryKey class corresponds to the fields of the primary key of the table, for example in JSON style:

{"timeref":61000000000013, "param":"11","flight":"8"}

The oracle.kv.table.Row class corresponds to the fields of a table row, for example in JSON style:

{"param":"11","flight":"8","timeref":61000000000013,"value":478674806}

If we want to save the content of the table on hdfs in a csv style we have to:

  • apply a flatMap on the rows of the RDD 
    flatMap(func) each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item). 
  • save the result on hdfs

The following inner class defines the map:

     static class FlatMapRow_Str implements FlatMapFunction<Row, String> {

        @Override
        public Iterable<String> call(Row s) {
            // Walk the row's field names and build one comma-separated line per row.
            List<String> lstr = s.getFields();
            String tabedValues = "";
            for (String field : lstr)
                tabedValues += s.get(field) + ",";
            return Arrays.asList(tabedValues);
        }
    }

The code to do the job is: 

//Obtain the Row RDD       

JavaRDD<Row> rddvalues = jrdd.values();

//Obtain the csv style form of the RDD 

JavaRDD<String> csvStr = rddvalues .flatMap(new FlatMapRow_Str());

//Save the results on hdfs 

csvStr.saveAsTextFile(pathPrefix + "/" + tableName + "csvStr");

The last step is to test using Yarn:

spark-submit --master yarn --jars /u01/nosql/kv-ee/lib/kvclient.jar --class table.SparkNosqlTable2HadoopBlog /u01/nosql/kv-ee/examples/table/deploy/sparktables.jar <nosql store name> <nosql store url> <table name> <path prefix>

<nosql store url> is <store host>:<store port> 

You can get the java source code here

Monday Nov 10, 2014

Is your NoSQL ready for SSD - benchmark results from SanDisk

It’s well known that the killer NoSQL use case is scalability for large data sets under high concurrency.   However, what is less talked about is the fact that performance is often the key driver. Given a high performance profile, NoSQL then provides the ability to scale that performance out linearly by adding more servers.   If not for the high performance profile to begin with, often times an alternative technology like a distributed cache would be used instead, fronting a more traditional database technology.

Given that performance is often a key driver, it’s no surprise to find that flash based storage devices are increasingly common for NoSQL deployments.  In fact, popular cloud providers of NoSQL services (Amazon and Google) provide those services exclusively on SSD backed infrastructure.  The SSD form factor gives not only the best performance, but also more manageable service level agreements, by taking disk I/O bottlenecks out of the equation.  At Oracle, we are also seeing customers deploy the Oracle NoSQL Database on servers backed by SSD storage.

[Read More]
About

This blog is about everything NoSQL. An open place to express thoughts on this exciting topic and exchange ideas with other enthusiasts learning and exploring about what the coming generation of data management will look like in the face of social digital modernization. A collective dialog to invigorate the imagination and drive innovation straight into the heart of our efforts to better our existence thru technological excellence.
