ODI - Hive External Tables, reverse engineering

Here we see Hive external tables on JSON data files in action and being reversed engineered from Hive into ODI. Carrying on from my earlier post on Hive reverse engineering here we will tackle external tables with primitive data types. Just like Oracle external tables for accessing data on the filesystem, Hive external tables can access big data files in HDFS. This example uses a JSON SerDe for Hive which was downloaded from here. The external tables in Hive are quite nice! You just specify the HDFS directory and all files in that directory are projected through the external table definition, also you can specify sophisticated SerDe classes that comply with the Hive SerDe interfaces - these classes can reverse engineer complex types also which make them very useful.

Before launching ODI or the agent for this I added the SerDe JAR file to this environment variable used by ODI, here is what I did in my system;

  • export ODI_HIVE_SESSION_JARS=$ODI_HIVE_SESSION_JARS:/home/oracle/json/hive-json-serde-0.2.jar

I have 2 JSON files with data in my HDFS under /user/oracle/json I will specify this directory in the external table definition, along with the JSON SerDe. Within my Hive shell I can define the external table, remembering to add the JSON SerDe file (using the ADD JAR command) just before I create the external table.

  1. ADD JAR /home/oracle/json/hive-json-serde-0.2.jar;
  3.           field1 string, field2 int, field3 string, field4 double
  4.        )
  5.        ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
  6.        LOCATION '/user/oracle/json';

 The data in the first JSON file looks like the following, this was just an example from the google code snippets;

  • {"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
  • {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
  • {"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
  • {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

I have defined these fields in the external table. Back in ODI, in my model below I am using the customized reverse with the RKM Hive mentioned above, also for this reverse I just want one table imported, so I defined the mask just to be my_table. Clicking the 'Reverse Engineer' button on the top left of the panel starts the reverse engineering.

If there are errors, then check the ODI Operator, that's where you will see anything that goes wrong. The table in ODI has just some rudimentary information about the number of columns, all of the external table definition for this is defined in Hive, this is the same as when we reverse engineer an Oracle external table. Once the table is captured in ODI I can view the data;

The custom Hive RKM I used is here - this has some changes to the one shipped with ODI to include external table reverse (I included EXTERNAL_TABLE in the getTables API call), plus support for custom SerDe JARs loaded via the environment variable ODI_HIVE_SESSION_JARS (just like the ODI IKMs load the JARs, the RKM also needs to).

Anyway, that's a quick peek at external tables in Hive and reverse engineering them into ODI. Once the table is captured in ODI you can use it in mappings just like other tables!


Hi, I'm trying to replicate your solution but I have always the same error in ODI Studio. This is the error:

ODI-1217: Session HiveDefault (77) fails with return code 7000.
ODI-1226: Step HiveDefault fails after 1 attempt(s).
ODI-1229: An error occurred while performing a Reverse operation on model code HIVEDEFAULT.
ODI-1590: The execution of the script failed.
Caused By: org.apache.bsf.BSFException: exception from Groovy: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.ClassNotFoundException Class com.cloudera.hive.serde.JSONSerDe not found)
at org.codehaus.groovy.bsf.GroovyEngine.exec(GroovyEngine.java:111)
at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.execInBSFEngine(SnpScriptingInterpretor.java:360)
at com.sunopsis.dwg.codeinterpretor.SnpScriptingInterpretor.exec(SnpScriptingInterpretor.java:175)
at oracle.odi.runtime.agent.execution.interpreter.SessionTaskScriptingInterpretor.scripting(SessionTaskScriptingInterpretor.java:173)
at oracle.odi.runtime.agent.execution.SessionTask.scripting(SessionTask.java:99)
at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:47)
at oracle.odi.runtime.agent.execution.cmd.ScriptingExecutor.execute(ScriptingExecutor.java:19)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:189)
at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:111)
at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:864)
at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2024)
at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:562)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.ClassNotFoundException Class com.cloudera.hive.serde.JSONSerDe not found)
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:167)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:155)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:210)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:333)
at java_sql_Statement$executeQuery.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at RkmHive.getFlexFields(Create_Java_Classes:487)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:272)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:49)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141)
at RkmHive.getTables(Create_Java_Classes:292)
at RkmHive.startReverse(Create_Java_Classes:206)
at RkmHive$startReverse.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
at Start_the_Reverse.run(Start_the_Reverse:17)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:518)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:556)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:537)
at org.codehaus.groovy.bsf.GroovyEngine.exec(GroovyEngine.java:109)
... 21 more


Posted by guest on January 30, 2015 at 08:55 AM PST #

Your error is ClassNotFoundException Class com.cloudera.hive.serde.JSONSerDe not found. Did you download the serde and set the environment variable before starting ODI as in my post?


Posted by David on January 30, 2015 at 09:18 AM PST #

Thanks for your prompt response.
Yes, but I'm using hive-serdes-1.0-SNAPSHOT.jar lib. Is it the problem?
There is a way to load the lib in another way?

Posted by Donato on January 30, 2015 at 09:49 AM PST #

How are you setting the variable? How are you launching ODI? It needs to be an environment variable in scope of the ODI studio when launched. If the variable is not defined or not in scope then you will get class not found.

Are you using the custom RKM also? The custom RKM I posted registers the JAR files for use by the reverse engineer. If you do not use that you will get class not found.


Posted by David on January 30, 2015 at 10:17 AM PST #

Yes, I'm using a Big Data Oracle VM. I set the variable like you, "export ODI_HIVE_SESSION_JARS=$ODI_HIVE_SESSION_JARS:/home/oracle/Downloads/hive-serdes-1.0-SNAPSHOT.jar" and run the odi studio with command "odi", enviroment variable. But the error is always the same, class not found.

Posted by guest on February 05, 2015 at 01:54 AM PST #

Yes, I use the custom RKM Hive.

Posted by Donato on February 05, 2015 at 01:55 AM PST #

Post a Comment:
  • HTML Syntax: NOT allowed

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality


« May 2016