The Oracle NoSQL Database Blog covers all things Oracle NoSQL Database. On-Prem, Cloud and more.

  • January 13, 2015

Big Data SQL for Oracle NoSQL Database (part 1)


We recently announced
Oracle Big Data SQL support for Oracle NoSQL Database. This feature will allow Oracle Database users to connect to external data repositories like Oracle NoSQL Database, and/or Hadoop, and will enable them to fetch data from any or all of the repositories (at once) through single SQL query. We believe this a significant step forward as it
streamlines the adoption of big data technologies
with traditional technology stack.

If you are following big data space then you would agree that everyone likes the scalability, availability, performance at cost aspect of it but almost everyone struggles to modify their enterprise application to work with big data technologies because they don't talk the same language as traditional database systems do.

For innovations to happen fast what is required is that we lower down the adoption barrier for the developers & the data scientists who have been at the core of this data revolution. And by providing same SQL declarative language for relational and non-relational data we have made it simpler for everyone to adopt these big data technologies.

SQL on Oracle NoSQL Database

Having an access layer using a declarative language is definitely a great thing but a common trend these days is that every NoSQL vendor is writing a custom SQLike layer. Now if you want to use your existing application to access relational data as well as non-relational data stored in the NoSQL database then you can not write a single SQL query to access these two separate stores, which again makes it complicated from application development and manageability point of view.

Oracle Big Data SQL for Oracle NoSQL Database fills this gap by providing exactly the same SQL access layer to both the stores and therefore the queries would look exactly the same no matter which store you access. To achieve this, Oracle Big Data SQL requires that Oracle NoSQL
Database support the ability to access its data via Hive. Thus, plugging Oracle NoSQL Database to Oracle Database is as simple as performing 1 (Creating a NoSQL Store and a Table), 2  (Configuring Hive to access NoSQL Table) & 3 (Configuring Oracle Database to talk to HCatalog) steps.

Please note that Big Data SQL provides much better query performance over running a query directly on a generic external tables because with former approach, computation (for selecting, & formatting data) is moved where the data is as apposed to moving all the data to the Oracle Database node first and then applying the predicates. The same proprietary cell technology (developed for Exadata) is used for applying the query predicates in parallel to all the distributed NoSQL partitions, making the overall query to run faster.

In rest of the article I will give you the technical details on how to perform first two steps so you can get the hands-on experience of this feature.

Figure 1: Flow diagram showing steps to access data in Oracle NoSQL Database using hive queries.


Step 1: Create a TABLE in Oracle NoSQL Database and load some data that we would like to access from HIVE later on.

  • For this demonstration we will be using KVLITE but you can very well deploy a real cluster by following steps covered in this webinar. Please remember to set KVHOME as the environment variable that will point to the directory where you unzipped the latest Oracle NoSQL Database binaries (download here) and KVROOT (eg. /u01/kvroot) where you would like metadata files to be stored.

$java -jar $KVHOME/lib/kvstore.jar kvlite -root $KVROOT -host localhost &

  • Once you have Oracle NoSQL Database cluster deployed (in our case it is single instance), connect to administration shell using runadmin command and then from the KV prompt connect to the store (kvstore in my case).

$java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host localhost
kv-> CONNECT STORE -name kvstore

  • Create MOVIE table with few columns (to capture movie specific information)

original_title STRING,\
overview STRING,\
poster_path STRING,\
release_date STRING,\
vote_count INTEGER,\
runtime INTEGER,\
popularity DOUBLE,\
genres STRING,\

  • SHOW table to confirm it is created successfully & DESCRIBE to make sure all the columns types are appropriate as well.



  • Next we will load some MOVIE data into the table. You can download a movie.dat file with details of about 10 movies represented as JSON string. We will directly load these strings into MOVIE table.

kv-> PUT TABLE -name MOVIE -file /tmp/movie.dat
Inserted 10 rows in movie table

  • You can now COUNT the rows and GET the rows to make sure data is indeed there.

kv-> AGGREGATE TABLE -name MOVIE -count
Row count: 10


Step 2: Create Hive External Table using Storage Handler for Oracle NoSQL Database (that got shipped in release 3.2.5) .

  • Before we begin doing anything with Hive we first need to install Hadoop on our system.
  • After Hadoop being installed and tested, download the Hive archive from here. I am using apache-hive-0.14.0 at the time of this article but you can download any latest version. Unzip the archive and create environment variable HIVE_HOME to point to this directory.
  • Here are some of the environment variables that are set in my ~/.bashrc file:

export HADOOP_HOME=/u01/home/hadoop/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export KVHOME=/u01/nosql/kvhome
export HIVE_HOME=/u01/home/hadoop/apache-hive-0.14.0
export HIVE_AUX_JARS_PATH=$KVHOME/lib/kvclient.jar
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:\

  •  NOTICE that we have HIVE_AUX_JARS_PATH set to kvclient.jar. We need to set this variable to make sure Hive can find Storage Handler for Oracle NoSQL Database from the client driver itself.
  • Now the only thing that is left is creating an external table to mimic the table exist in Oracle NoSQL Database. Let's connect to hive first and then create the table using this DDL:

original_title STRING,
overview STRING,
poster_path STRING,
release_date STRING,
vote_count INT,
runtime INT,
popularity DOUBLE,
genres STRING)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ( "oracle.kv.kvstore"="kvstore",
Time taken: 0.174 seconds

          Note: STORE BY field is where we define the StorageHandler class for Oracle NoSQL Database and TBLPROPERTIES is where we define name of the store, host name and port where storage node agent is listening at (in our case we have only node), and the name of the table in NoSQL DB. We have MOVIE as the table there so that is what we use as oracle.kv.tableName  field.

  • That is all one need to do to configure Hive to work with Oracle NoSQL Database. Pretty simple, right ? 

Step 2b: Now that we have NoSQL DB & Hive setup done, we just need to test running hive queries.

  • Here are some of the simple queries to fetch MOVIE data using Hive Query Language (HQL):

hive> SELECT id, original_title, release_date, genres 
hive> SELECT id, original_title, release_date, genres
WHERE release_date >1990;
hive> SELECT id, original_title, release_date, genres
WHERE genres LIKE '%Comedy%';

  • Some other queries that would require M/R jobs to be spun off to get the results:

hive> SELECT count(*) 
WHERE genres LIKE '%Drama%';
hive> SELECT id, original_title, release_date, genres
WHERE genres LIKE '%Drama%'
ORDER BY release_date;
hive> SELECT MAX(release_date)


Using the Hive StorageHandler from latest release, one can access NoSQL Data as hive external tables. The metadata of these external tables are stored in the HCatalog, which is later referenced by Oracle Big Data SQL to decide where to dispatch the queries in parallel. The process of configuring Oracle NoSQL Database has become very simple (using the out of the box hive StorageHandler) that it takes only couple of steps to complete.

In the part 2 of the blog I will talk about how hive metastore can be configured with Oracle Database to finally run SQL queries on NoSQL Data.

Join the discussion

Comments ( 2 )
  • guest Tuesday, March 22, 2016

    Nice tutorial. However, with Hadoop 2.7.2 and Hive 1.2.1 during the CREATE EXTERNAL TABLE command, Hive prints the following error message:

    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initSerdeParams(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe$SerDeParameters;

  • Jac Saturday, January 21, 2017

    Part 2 not yet available??

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.