
October 21, 2014

Loading into NoSQL using Hive

The main purpose of this post is to show how tightly NoSQL and Hive can be tied together; the focus is on uploading data into NoSQL from Hive.

An earlier post (here) discussed using Hive external tables to select data from Oracle NoSQL, through a HiveStorageHandler implementation. We have reworked that implementation to load data from HDFS or a local file system into NoSQL via Hive. Only text data is currently supported for upload.

Two kinds of data files can be uploaded:

Case 1: Files containing plain text data, like the following comma-separated lines:

  • 10,5,001,545973390
  • 10,5,010,1424802007
  • 10,5,011,164988888 

Case 2: Files containing a JSON field corresponding to a given Avro schema, like the following tab-separated lines:

  •  10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}
  •  10 5 174 {"samples": [{"delay": 0, "value": -254460852}, {"delay": 1, "value": -478216539}, {"delay": 2, "value": -1735664690}, {"delay": 3, "value": -1997506933}, {"delay": 4, "value": -1062624313}]}

How to do it?

1. Define the external table

2. Create and load a native Hive table

3. Insert into the external table a selection from the native Hive table

Case 1:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_PI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)

      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'

      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "PI/10/5", "kv.value.type" = "string", "kv.key.range" = "", "kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore","kv.key.ismajor" = "true");
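Reading these SERDEPROPERTIES: "kv.major.keys.mapping" names the Hive columns that form the major key, "kv.minor.keys.mapping" the column that forms the minor key, "kv.key.prefix" fixes the leading key components (here PI/10/5), "kv.value.type" tells the handler to store the remaining column as a plain string, and "kv.host.port" and "kv.name" identify the target store.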

2. Create and load a native Hive table

CREATE TABLE kv_pi_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/oracle/hivepath/pi_10_5' OVERWRITE INTO TABLE kv_pi_10_5_load;
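Note that '\054' is simply the octal escape for the comma used as the field delimiter (likewise, '\011' in case 2 below is the tab). Before pushing anything to the store, a quick sanity check on the native table is cheap, for example:

select * from kv_pi_10_5_load limit 3;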

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_pi_10_5_table SELECT * from kv_pi_10_5_load;

The external table definition fixes the major key and its complete key components, and this definition takes precedence when inserting: the flight and sensor values of the data are ignored (they are already encoded in the "kv.key.prefix"), the timeref elements become the minor keys, and the rows are loaded through the NoSQL operation API, which batches the insertions.
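For example, the line 10,5,010,1424802007 from case 1 ends up under the key /PI/10/5/-/010 with the string value 1424802007, as the verification steps below will show.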

Case 2:

1. Define the external table

CREATE EXTERNAL TABLE MY_KV_RI_10_5_TABLE (flight string, sensor string, timeref string, stuff string)

      STORED BY 'nosql.example.oracle.com.NosqlStorageHandler'

      WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "flight,sensor", "kv.minor.metadata" = "false", "kv.minor.keys.mapping" = "timeref", "kv.key.prefix" = "RI/10/5", "kv.value.type" = "avro", "kv.key.range" = "","kv.key.ismajor" = "true", "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet","kv.host.port" = "bigdatalite:5000", "kv.name" = "kvstore");

When creating the external table used for uploading into NoSQL, one new parameter appears: "kv.avro.schema" = "com.airbus.zihb.avro.SampleIntSet".

This is the NoSQL name of the Avro schema; in terms of the Avro schema definition, it is the schema namespace, followed by a dot, followed by the schema name.
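The schema itself is not shown here, but from the JSON samples of case 2 one can sketch what com.airbus.zihb.avro.SampleIntSet plausibly looks like (the inner record name SampleInt is our assumption):

{
  "type" : "record",
  "namespace" : "com.airbus.zihb.avro",
  "name" : "SampleIntSet",
  "fields" : [ {
    "name" : "samples",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "SampleInt",
        "fields" : [
          { "name" : "delay", "type" : "int" },
          { "name" : "value", "type" : "int" }
        ]
      }
    }
  } ]
}

The schema must be registered in the store before loading; on the Oracle NoSQL Database releases of that time this was done from the admin CLI with the ddl add-schema command (check the documentation for your release).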

 2. Create and load a native Hive table

 CREATE TABLE kv_ri_10_5_load (flight string, sensor string, timeref string, stuff string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/oracle/hivepath/ri_10_5' OVERWRITE INTO TABLE kv_ri_10_5_load;

3. Insert into the external table a selection from the native Hive table

INSERT INTO TABLE my_kv_ri_10_5_table SELECT * from kv_ri_10_5_load;

How to verify the upload?

Two possibilities:

  • a select query on Hive
  • a get on the kvstore

Let's do it on the NoSQL client command line:

Case 1: Verify that a random line exists

 kv-> get kv  -key /PI/10/5/-/010 -all

/PI/10/5/-/010

1424802007

1 Record returned

Case 2: Verify that a random line exists

kv-> get kv  -key /RI/10/5/-/173 -all
/RI/10/5/-/173
{
  "samples" : [ {
    "delay" : 0,
    "value" : -914351026
  }, {
    "delay" : 1,
    "value" : 1842307749
  }, {
    "delay" : 2,
    "value" : -723989379
  }, {
    "delay" : 3,
    "value" : -1665788954
  }, {
    "delay" : 4,
    "value" : 91277214
  }, {
    "delay" : 5,
    "value" : 1569414562
  }, {
    "delay" : 6,
    "value" : -877947100
  }, {
    "delay" : 7,
    "value" : 498879656
  }, {
    "delay" : 8,
    "value" : -1245756571
  }, {
    "delay" : 9,
    "value" : 812356097
  } ]
}

1 Record returned

Let's do it on the Hive command line:

Case 1: Verify that a random line exists

hive> select * from MY_KV_PI_10_5_TABLE where timeref = "010";

OK

10 5 010 1424802007

Case 2: Verify that a random line exists

hive> select *  from MY_KV_RI_10_5_TABLE where timeref = "173";

... 

OK

10 5 173 {"samples": [{"delay": 0, "value": -914351026}, {"delay": 1, "value": 1842307749}, {"delay": 2, "value": -723989379}, {"delay": 3, "value": -1665788954}, {"delay": 4, "value": 91277214}, {"delay": 5, "value": 1569414562}, {"delay": 6, "value": -877947100}, {"delay": 7, "value": 498879656}, {"delay": 8, "value": -1245756571}, {"delay": 9, "value": 812356097}]}

You can get a JDeveloper 12c project here.

We have made a round trip between NoSQL and Hive:

  1. Key-value subsets of a NoSQL database can be viewed using Hive's select query language
  2. Data from Hive tables can be uploaded into NoSQL key-value pairs
