

Big Data SQL Quick Start. Adding a SerDe to the Big Data SQL Classpath - Part 17.

Alexey Filanovskiy
Product Manager

Today I'm going to write about how to add a custom SerDe to Big Data SQL. The SerDe mechanism is one of the most powerful features of Hadoop, and of Big Data SQL in particular: it lets you read any kind of data as structured rows, as long as you tell it how to parse the input.

Let's imagine that we have a JSON file:

{"wr_returned_date_sk":38352,"wr_returned_time_sk":46728,"wr_item_sk":561506,"wr_refunded_customer_sk":1131210}
{"wr_returned_date_sk":38380,"wr_returned_time_sk":78937,"wr_item_sk":10003,"wr_refunded_customer_sk":1131211}

and want to process it with a custom SerDe, for example org.openx.data.jsonserde.JsonSerDe. Based on its guide, I'm trying to create the external table:

hive> CREATE EXTERNAL TABLE json_openx(
       wr_returned_date_sk bigint,
       wr_returned_time_sk bigint,
       wr_item_sk bigint,
       wr_refunded_customer_sk bigint)
     ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
     LOCATION 'hdfs://scaj43-ns/user/hive/warehouse/json_string';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.openx.data.jsonserde.JsonSerDe

In the end I get an error telling me that Hive cannot find this jar. Fair enough, I have to add it. I can add the jar for the current session with the ADD JAR command and create the table again:

hive> add jar hdfs://scaj43-ns/tmp/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
converting to local hdfs://scaj43-ns/tmp/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
Added [/tmp/f0317b31-2df6-4a24-ab8d-66136f9c26e6_resources/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar] to class path
Added resources: [hdfs://scaj43-ns/tmp/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar]
hive> CREATE EXTERNAL TABLE json_openx(
            wr_returned_date_sk bigint,
            wr_returned_time_sk bigint,
            wr_item_sk bigint,
            wr_refunded_customer_sk bigint)
          ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
          LOCATION 'hdfs://scaj43-ns/user/hive/warehouse/json_string';
OK
Time taken: 0.047 seconds
hive> select * from json_openx limit 1;
OK
38352   46728   561506  1131210

Now everything seems fine, but the jar was added only for the current session: if I log off and log in again, the query no longer works.

hive> select * from json_openx limit 1;

FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)
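
The jar added with ADD JAR lives only in the session that added it. A per-user workaround (not the approach I use below, and only a hedged sketch reusing the path from the example above) is to put the statement into the Hive CLI's ~/.hiverc file, which the CLI executes at the start of every session:

-- hypothetical ~/.hiverc: the Hive CLI runs this file automatically when a new session starts
add jar hdfs://scaj43-ns/tmp/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;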

Well, I have to add this JAR to the Hive configuration permanently, and the most convenient way to do that is through Cloudera Manager.

Before that, I have to copy the jar to each machine in the cluster: into a directory that will be configured as the Hive auxiliary JARs path, and also into the Big Data SQL cell jlib directory:

[Linux]# dcli -C "mkdir /home/oracle/serde/"
[Linux]# dcli -C -f /root/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar -d /home/oracle/serde/
[Linux]# dcli -C -f /root/json/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar -d /opt/oracle/bigdatasql/bdcell-12.1/jlib
[Linux]# dcli -C "ls /home/oracle/serde"
192.168.42.92: json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
192.168.42.93: json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
192.168.42.94: json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
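
Optionally, a quick sanity check (a hedged one-liner reusing the same dcli utility as above) that the jar also landed in the cell jlib directory on every node:

[Linux]# dcli -C "ls /opt/oracle/bigdatasql/bdcell-12.1/jlib | grep json-serde"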

After the files are propagated to each node, click on the Hive service in Cloudera Manager:

then Configuration:

type "hive-env" into the search bar and add the path to the jar:
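
For reference, a hedged sketch of what this setting typically looks like when added through the hive-env.sh safety valve (the exact field name and mechanism vary by CDH version, so treat this as an assumption rather than the only option):

# assumed hive-env.sh advanced configuration snippet: append the SerDe jar to Hive's auxiliary classpath
AUX_CLASSPATH=${AUX_CLASSPATH}:/home/oracle/serde/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar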

After this, restart the Hive service and redeploy the client configuration:

Check that Hive works:

hive> select * from json_openx limit 1;
OK
38352   46728   561506  1131210

Now we are ready to create an external table in Oracle (I'm using the dbms_hadoop PL/SQL package to generate the DDL for me):

SQL> set serveroutput on
SQL> DECLARE
       DDLout VARCHAR2(4000);
     BEGIN
       dbms_hadoop.create_extddl_for_hive(
         CLUSTER_ID      => 'scaj43',
         DB_NAME         => 'default',
         HIVE_TABLE_NAME => 'json_openx',
         HIVE_PARTITION  => FALSE,
         TABLE_NAME      => 'json_openx',
         PERFORM_DDL     => FALSE,
         TEXT_OF_DDL     => DDLout);
       dbms_output.put_line(DDLout);
     END;
     /

SQL> CREATE TABLE BDS.json_openx 
(wr_returned_date_sk NUMBER,  
 wr_returned_time_sk NUMBER,  
 wr_item_sk NUMBER,  
 wr_refunded_customer_sk NUMBER)  
ORGANIZATION EXTERNAL
 (TYPE ORACLE_HIVE
 DEFAULT DIRECTORY DEFAULT_DIR 
 ACCESS PARAMETERS (
com.oracle.bigdata.cluster=scaj43
com.oracle.bigdata.tablename=default.json_openx)
) 
PARALLEL 2 
REJECT LIMIT UNLIMITED;

SQL> SELECT * FROM BDS.json_openx;
SELECT * FROM BDS.json_openx
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11504: error from external driver: oracle.hadoop.sql.JXADException: error
parsing "com.oracle.bigdata.colmap" field name "wr_returned_date_sk" not found 

Well... we got an error because this jar is not in the Big Data SQL classpath. Let's add it. On the cell side, in the bigdata.properties file (in my environment it's /opt/oracle/bigdatasql/bdcell-12.1/bigdata.properties), add this jar to the java.classpath.hadoop variable:

java.classpath.hadoop=/opt/oracle/bigdatasql/bdcell-hadoopconf/*:/opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:/home/oracle/serde/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar

On the database side, add the same path to the same java.classpath.hadoop variable in the $ORACLE_HOME/bigdatasql/bigdata_config/bigdata.properties file.
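
As an illustration, a minimal sketch of making that edit on the database node (the sed approach is my assumption; editing the file by hand works just as well, and the cells can be updated the same way with dcli):

[Linux]# sed -i 's|^java.classpath.hadoop=.*|&:/home/oracle/serde/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar|' $ORACLE_HOME/bigdatasql/bigdata_config/bigdata.properties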

After this, you need to restart Big Data SQL on the cell side:

and restart the extproc agent on the database side:

[Linux]# $GRID_HOME/bin/crsctl stop resource bds_DBINSTANCE_HADOOPCLUSTER
[Linux]# $GRID_HOME/bin/crsctl start resource bds_DBINSTANCE_HADOOPCLUSTER
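
To confirm that the resource came back online, a quick hedged check (using the same resource name as in the stop/start commands above):

[Linux]# $GRID_HOME/bin/crsctl status resource bds_DBINSTANCE_HADOOPCLUSTER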

And finally, check that everything works properly:

SQL> SELECT * FROM BDS.json_openx;
38352               46728      561506                1131210
38380               78937      10003                 1131211
