Tuesday Jul 09, 2013

ODI - Loading MongoDB (API as Target)

In this post I will show how to load documents into a MongoDB collection. The interface design looks just like any other ODI interface, but behind the scenes the KM configured in the physical design uses the MongoDB Java driver (see MongoDB SDK here) to insert the documents. The target datastore below represents a MongoDB document: each row is inserted as a document, each column name becomes a key, and the column value becomes that key's value. The ENAME value below is shown as a complex JSON value.
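The row-to-document mapping can be sketched in plain Java. This is a minimal illustration under stated assumptions: a LinkedHashMap stands in for the driver's BasicDBObject, and the names RowToDocument and toDocument are hypothetical, not part of ODI or the MongoDB driver.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: LinkedHashMap stands in for the driver's BasicDBObject.
public class RowToDocument {

    // One row becomes one document: column name -> key, column value -> value.
    static Map<String, Object> toDocument(List<String> columns, List<String> row) {
        if (columns.size() != row.size()) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        Map<String, Object> doc = new LinkedHashMap<>(); // keeps column order
        for (int i = 0; i < columns.size(); i++) {
            doc.put(columns.get(i), row.get(i));
        }
        return doc;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("EMPNO", "JOB", "DEPTNO");
        List<String> row = List.of("7369", "CLERK", "20");
        System.out.println(toDocument(cols, row)); // {EMPNO=7369, JOB=CLERK, DEPTNO=20}
    }
}
```

Values are kept as strings here, which mirrors the IKM: every generated doc.put call passes a quoted value.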

The IKM I have used is a multi-connection IKM: the source is a SQL data source and the target uses a MongoDB Groovy command. The heart of the IKM, which inserts the documents into the collection, has a SQL select as the source command and the following Groovy code as the target command:

    import com.mongodb.*
    MongoClient mongoClient = new MongoClient("<%=odiRef.getOption("MONGO_SERVER")%>", <%=odiRef.getOption("MONGO_PORT")%>);
    DB db = mongoClient.getDB("<%=odiRef.getOption("MONGODB")%>");
    DBCollection coll = db.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")
    BasicDBObject doc = new BasicDBObject();

    <%=odiRef.getColList(" ", "doc.put(\u0022[COL_NAME]\u0022, \u0022#[CX_COL_NAME]\u0022);", " \n ", "", "((INS and !TRG) and REW)")%>

    coll.insert(doc);

The odiRef.getColList method call above generates code for every target column; that code calls doc.put to add each key-value pair to the document. For example, this is the code generated and executed for the interface design discussed above:

    import com.mongodb.*
    MongoClient mongoClient = new MongoClient("DALLAN-SVR", 27017);
    DB db = mongoClient.getDB("test");
    DBCollection coll = db.getCollection("testCollection")
    BasicDBObject doc = new BasicDBObject();

    doc.put("EMPNO", "#EMPNO");
    doc.put("ENAME", "#ENAME");
    doc.put("JOB", "#JOB");
    doc.put("MGR", "#MGR");
    doc.put("HIREDATE", "#HIREDATE");
    doc.put("SAL", "#SAL");
    doc.put("COMM", "#COMM");
    doc.put("DEPTNO", "#DEPTNO");

    coll.insert(doc);
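To make the template expansion concrete, here is a rough Java model of what the getColList pattern produces. It is only a sketch of the text substitution, not ODI's implementation: the class and method names are hypothetical, and it assumes [CX_COL_NAME] resolves to the same name as [COL_NAME], which matches the generated code for this interface but may not hold for every interface.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the getColList expansion; not the odiRef implementation.
public class ColListExpansion {

    // The pattern used in the IKM, with its two placeholders.
    static final String PATTERN = "doc.put(\"[COL_NAME]\", \"#[CX_COL_NAME]\");";

    // Substitute each column into the pattern and join the results with the separator.
    static String expand(List<String> columns, String separator) {
        return columns.stream()
                .map(c -> PATTERN.replace("[CX_COL_NAME]", c).replace("[COL_NAME]", c))
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        // Prints one doc.put(...) line per column, e.g. doc.put("EMPNO", "#EMPNO");
        System.out.println(expand(List.of("EMPNO", "ENAME", "DEPTNO"), "\n"));
    }
}
```

At run time ODI then replaces each #COLUMN token with the value from the source SQL row, so one document is inserted per source row.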

This is a simple illustration of how to load documents into MongoDB. To check the result, we can open the MongoDB command line and run db.testCollection.find() to list all documents in the collection; below is a preview of its output:

  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a6"), "EMPNO" : "7369", "ENAME" : "{ NAME : SMITH, DESCR : 22 }", "JOB" : "CLERK", "MGR" : "7902", "HIREDATE" : "1980-12-17 00:00:00.0", "SAL" : "801", "COMM" : "", "DEPTNO" : "20" }
  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a7"), "EMPNO" : "7499", "ENAME" : "{ NAME : ALLEN, DESCR : 22 }", "JOB" : "SALESMAN", "MGR" : "7698", "HIREDATE" : "1981-02-20 00:00:00.0", "SAL" : "1601", "COMM" : "300", "DEPTNO" : "30" }

You can see the key:value pairs in our document. MongoDB gurus will notice that the 'complex' ENAME value is really a string and not a MongoDB complex object; that discussion is for another day.

This post is not just about MongoDB; it is also a useful illustration of how to integrate APIs as a target in a data flow.

Wednesday Jan 02, 2013

ODI - Hive and NoSQL, the code

This post includes the Java client demonstration code used in the Hive and NoSQL post illustrated here. The BasicBigData.java code is a NoSQL client that populates a key-value store which can be queried through the Hive external table from that post. It didn't take long to code; a few peeks at the NoSQL javadoc were enough to get it going. You can take this Java code, compile it and run it (the compile instructions are similar to the verification demo here; it is very easy).

The Java code uses the NoSQL major/minor path constructor to describe the Key; below is a snippet that defines the birthdate for Bob Smith:

    // major key path: /Smith/Bob
    ArrayList<String> mjc1 = new ArrayList<String>();
    mjc1.add("Smith");
    mjc1.add("Bob");
    ...
    // minor key path: birthdate
    ArrayList<String> mnrb = new ArrayList<String>();
    mnrb.add("birthdate");
    ...
    // store the value under the key /Smith/Bob/-/birthdate
    store.put(Key.createKey(mjc1,mnrb),Value.createValue("05/02/1975".getBytes()));
    ...
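The major/minor split is what produces the key listing used in the Hive and NoSQL post, such as /Smith/Bob/-/birthdate. The helper below is hypothetical (it is not part of the Oracle NoSQL API) and renders major and minor components in that path form as a sketch; unlike a real key encoder, it does not escape special characters.

```java
import java.util.List;

// Hypothetical helper, not part of the Oracle NoSQL API: renders a
// major/minor key in the path form /major1/major2/-/minor1.
// No escaping of special characters is done.
public class KeyPathSketch {

    static String keyPath(List<String> major, List<String> minor) {
        StringBuilder sb = new StringBuilder();
        for (String m : major) {
            sb.append('/').append(m);      // major components come first
        }
        if (!minor.isEmpty()) {
            sb.append("/-");               // "-" separates the major and minor paths
            for (String m : minor) {
                sb.append('/').append(m);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keyPath(List.of("Smith", "Bob"), List.of("birthdate")));
    }
}
```

All records sharing the major path /Smith/Bob belong to the same person, which is why the Hive side can group on lastname and firstname.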

In the referenced post, to aggregate the key values we used the Hive collect_set aggregation function (see here for Hive aggregation functions). collect_set returns a set of objects with duplicates eliminated. To get aggregation-function behavior in ODI, with the correct GROUP BY, we must tell ODI about the Hive aggregation function. We can define a new language element for collect_set in the Topology tree, mark the element as a group function, and define the expression for Hive under the Implementation tab.
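As a rough model of what collect_set computes for one group, the sketch below collects the group's non-NULL values into a set, dropping duplicates. The class name is hypothetical, and first-seen ordering is a choice of this sketch; Hive does not promise a particular element order, so indexing the result with [0], as the earlier post does, simply picks one element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of collect_set over the values of one group.
public class CollectSetSketch {

    // NULLs are skipped and duplicates removed; first-seen order is kept here.
    static List<String> collectSet(List<String> values) {
        Set<String> seen = new LinkedHashSet<>();
        for (String v : values) {
            if (v != null) {
                seen.add(v);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> phone = Arrays.asList(null, "1111-1111", null, "1111-1111");
        System.out.println(collectSet(phone)); // [1111-1111]
    }
}
```

Skipping NULLs is what makes collect_set useful for pivoting sparse rows: each group ends up with just the real values.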

We can then define expressions that reference this aggregation function and produce exactly the syntax from the earlier post. Below we see the Hive expressions using collect_set.

With this design and the aggregation function defined in ODI, when the interface is executed you can see that the generated Hive QL has the correct columns in the GROUP BY clause.

The target Hive datastore in the interface I defined has been loaded with the key values from the NoSQL key-value store, cool!

Those are a few of the missing pieces that let you query NoSQL through Hive external tables; hopefully they are useful pointers.

Monday Dec 31, 2012

ODI - Hive and NoSQL

The Hive external table lets us do lots of cool stuff, including processing data from NoSQL. We have seen how custom SerDes are used; Hive storage handlers also provide some cool capabilities. Using the Hive storage handler defined here, an external table can be defined to project data from a NoSQL key-value store. The external table can then be used as a source in ODI, very simple.

The illustration on github has the following data stored in an Oracle NoSQL Database (the major key is lastname/firstname and the minor key is the attribute name):

  • /Smith/Bob/-/birthdate: 05/02/1975
  • /Smith/Bob/-/phonenumber: 1111-1111
  • /Smith/Bob/-/userid: 1
  • /Smith/Patricia/-/birthdate: 10/25/1967
  • /Smith/Patricia/-/phonenumber: 2222-2222
  • /Smith/Patricia/-/userid: 2
  • /Wong/Bill/-/birthdate: 03/10/1982
  • /Wong/Bill/-/phonenumber: 3333-3333
  • /Wong/Bill/-/userid: 3

Using the Hive external table and the custom storage handler for a key value store, we define a mask to project the data through the external table. 

    ADD JAR /home/oracle/kv/HiveKVStorageHandler.jar;
    CREATE EXTERNAL TABLE MY_KV_TABLE (lastname string, firstname string, birthdate string, phonenumber string, userid string)
          STORED BY 'org.vilcek.hive.kv.KVHiveStorageHandler'
          WITH SERDEPROPERTIES ("kv.major.keys.mapping" = "lastname,firstname", "kv.minor.keys.mapping" = "birthdate,phonenumber,userID")
          TBLPROPERTIES ("kv.host.port" = "localhost:5000", "kv.name" = "kvstore");

There are a few interesting properties here:

  • we specify the key-value store using TBLPROPERTIES, identifying the host/port and the store name (kvstore).
  • the SerDe properties contain the mapping of the keys to column names; you will get a row for each value of birthdate, phonenumber and userID.
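The projection this mapping performs can be modeled simply: each stored key/value pair becomes one sparse row, with the major key components filling lastname and firstname and the value landing in the column named by the minor key. The sketch below is hypothetical (it is not the KVHiveStorageHandler code) and assumes the column names match the stored minor keys exactly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the key-to-column projection; not the storage handler's code.
public class KvProjectionSketch {

    // Minor keys in column order, assumed to match the stored minor keys.
    static final List<String> MINOR_COLS = List.of("birthdate", "phonenumber", "userid");

    // Turns one record "/lastname/firstname/-/minor" plus its value into a sparse row.
    static String[] project(String keyPath, String value) {
        String[] parts = keyPath.split("/-/");                // [major path, minor key]
        String[] major = parts[0].substring(1).split("/");   // drop the leading "/"
        String[] row = new String[2 + MINOR_COLS.size()];    // unset cells stay null (NULL)
        row[0] = major[0];                                   // lastname
        row[1] = major[1];                                   // firstname
        int idx = MINOR_COLS.indexOf(parts[1]);
        if (idx < 0) {
            throw new IllegalArgumentException("unmapped minor key: " + parts[1]);
        }
        row[2 + idx] = value;
        return row;
    }

    public static void main(String[] args) {
        // Prints [Smith, Bob, null, 1111-1111, null]
        System.out.println(Arrays.toString(project("/Smith/Bob/-/phonenumber", "1111-1111")));
    }
}
```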

Fairly straightforward. We can then reverse engineer this table into ODI using the same mechanism I described in previous posts here (setting ODI_HIVE_SESSION_JARS and so forth). The projected data looks like this:

    OK
    Smith   Patricia   10/25/1967   NULL        NULL
    Smith   Patricia   NULL         2222-2222   NULL
    Smith   Patricia   NULL         NULL        2
    Smith   Bob        05/02/1975   NULL        NULL
    Smith   Bob        NULL         1111-1111   NULL
    Smith   Bob        NULL         NULL        1
    Wong    Bill       03/10/1982   NULL        NULL
    Wong    Bill       NULL         3333-3333   NULL
    Wong    Bill       NULL         NULL        3

In ODI, by defining the Hive collect_set function as an aggregation function, we can aggregate and pivot the data so that each person becomes a single row:

    SELECT lastname, firstname, collect_set(birthdate)[0], collect_set(phonenumber)[0], collect_set(userid)[0]
          FROM MY_KV_TABLE
          GROUP BY lastname, firstname;
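The GROUP BY plus collect_set(...)[0] pivot can be modeled in memory to see why it collapses the sparse rows. This is a sketch only: PivotSketch is a hypothetical name, an empty set is returned as null in this model, and a real Hive job runs distributed with no ordering guarantees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of:
//   SELECT lastname, firstname, collect_set(c1)[0], ... GROUP BY lastname, firstname
public class PivotSketch {

    static List<String[]> pivot(List<String[]> rows, int keyCols) {
        // group key -> one set of collected values per non-key column
        Map<List<String>, List<Set<String>>> groups = new LinkedHashMap<>();
        for (String[] r : rows) {
            List<String> key = new ArrayList<>(Arrays.asList(r).subList(0, keyCols));
            List<Set<String>> sets = groups.computeIfAbsent(key, k -> {
                List<Set<String>> s = new ArrayList<>();
                for (int i = keyCols; i < r.length; i++) {
                    s.add(new LinkedHashSet<>());
                }
                return s;
            });
            for (int i = keyCols; i < r.length; i++) {
                if (r[i] != null) {
                    sets.get(i - keyCols).add(r[i]); // like collect_set: skip NULLs
                }
            }
        }
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<List<String>, List<Set<String>>> e : groups.entrySet()) {
            List<String> row = new ArrayList<>(e.getKey());
            for (Set<String> s : e.getValue()) {
                // "[0]": take one element; an empty set becomes null in this sketch
                row.add(s.isEmpty() ? null : s.iterator().next());
            }
            out.add(row.toArray(new String[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"Smith", "Bob", "05/02/1975", null, null},
                new String[] {"Smith", "Bob", null, "1111-1111", null},
                new String[] {"Smith", "Bob", null, null, "1"},
                new String[] {"Wong", "Bill", "03/10/1982", null, null},
                new String[] {"Wong", "Bill", null, "3333-3333", null},
                new String[] {"Wong", "Bill", null, null, "3"});
        for (String[] r : pivot(rows, 2)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Running main prints one row per person, e.g. [Smith, Bob, 05/02/1975, 1111-1111, 1].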

So this is another interesting illustration of external tables in Hive and what they can provide.


Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

