ODI - Loading MongoDB (API as Target)

In this post I will show how to load documents into a MongoDB collection. The interface design looks just like all other ODI interfaces, but behind the scenes the KM configured in the physical design uses the MongoDB SDK (see MongoDB SDK here) to insert the documents. The target datastore below represents a MongoDB document, the columns are the keys in the document. Each row is inserted as a document, and each column is a key, the column value is the value. The ENAME value below is shown as a complex JSON value.

The IKM I have used is a multi-connect IKM, the source is a SQL data source and the target uses a MongoDB groovy command. The heart of the IKM to insert the documents into the collection has a SQL select as the source command and the following groovy code for the target command;

  1. import com.mongodb.*
  2. MongoClient mongoClient = new MongoClient(" <%=odiRef.getOption("MONGO_SERVER")%> ", <%=odiRef.getOption("MONGO_PORT")%> );
  3. DB db = mongoClient.getDB("<%=odiRef.getOption("MONGODB")%>");
  4. DBCollection coll = db.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")
  5. BasicDBObject doc = new BasicDBObject();

  6. <%=odiRef.getColList(" ", "doc.put(\u0022[COL_NAME]\u0022, \u0022#[CX_COL_NAME]\u0022);", " \n ", "", "((INS and !TRG) and REW)")%>

  7. coll.insert(doc);

The odiRef.getColList method call above generates code for every target column, the code performs a doc.put invocation to add the key-value pairs into the document. For example this is the code generated and executed based on the interface design discussed above;

  1. import com.mongodb.*
  2. MongoClient mongoClient = new MongoClient("DALLAN-SVR", 27017);
  3. DB db = mongoClient.getDB("test");
  4. DBCollection coll = db.getCollection("testCollection")
  5. BasicDBObject doc = new BasicDBObject();

  6.  doc.put("EMPNO", "#EMPNO"); 
  7.  doc.put("ENAME", "#ENAME"); 
  8.  doc.put("JOB", "#JOB"); 
  9.  doc.put("MGR", "#MGR"); 
  10.  doc.put("HIREDATE", "#HIREDATE"); 
  11.  doc.put("SAL", "#SAL"); 
  12.  doc.put("COMM", "#COMM"); 
  13.  doc.put("DEPTNO", "#DEPTNO");

  14. coll.insert(doc);

 This is a simple illustration of how to load documents into MongoDB. We can go into the MongoDB command line and execute the command to see all objects in the collection and get the list of documents, below you can see a preview of executing db.testCollection.find()

  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a6"), "EMPNO" : "7369", "ENAME" : "{ NAME : SMITH, DESCR : 22 }", "JOB" : "CLERK", "MGR" : "7902", "HIREDATE" : "1980-12-17 00:00:00.0", "SAL" : "801", "COMM" : "", "DEPTNO" : "20" }
  • { "_id" : ObjectId("51dc3ded6c4b9a5bd07d68a7"), "EMPNO" : "7499", "ENAME" : "{ NAME : ALLEN, DESCR : 22 }", "JOB" : "SALESMAN", "MGR" : "7698", "HIREDATE" : "1981-02-20 00:00:00.0", "SAL" : "1601", "COMM" : "300", "DEPTNO" : "30" }

 You can see the key:value pairs in our document. For those MongoDB gurus, you'll notice in the 'complex' data illustration, this is really a string and not a MongoDB complex object - that discussion is for another day.

This post is not just about MongoDB, but also a useful post on how to integrate APIs as a target in a data flow. 

Comments:

Dear David,

thank you very much for sharing this article.

Do i get this right - first of all i'd have to install the mongo sdk on the machine where i have the odi running - vice versa - the agents?

my best wishes

/J

Posted by Jan Christian Leonhard on January 07, 2014 at 04:18 AM PST #

Hi Jan

Yes, the MongoDB API (or any other API you might use from KMs, procedures etc) would have to be available for the agent executing the code.

Cheers
David

Posted by David on January 07, 2014 at 08:28 AM PST #

Dear David,
I have some unclear points:
- when you create a target datastore which is the technology applied to?
- you use a multi-connect IKM?
- if i create a new multi-connect IKM which are source and target technologies that i have to specify?
- which task/command on target and source did you create on IKM?
- can you share us your IKM with a link?

Thanks and Regards
Vincenzo

Posted by Vincenzo on January 13, 2015 at 01:13 AM PST #

Hi Vincenzo

The KM is on the java.net share site below;
https://java.net/projects/oracledi/downloads/download/ODI/Knowledge%20Modules/KM_IKM_SQL_to_Mongo_DB.xml

For the target model, I simply made it a File technology model. Then I can define the shape of the datastore. The IKM uses a groovy task - none of the target model technology is really used so you could actually make the target model any technology (the IKM has target tech as undefined). Specifying technology on the IKM helps with default primarily - I left both source and target as undefined and it is a multi connect IKM, I change the name to IKM SQL to Mongo DB to make that a little clearer.

Cheers
David

Posted by David on January 13, 2015 at 09:00 AM PST #

Hi David,
thanks for your reply.

We added some lines to authenticate with mongo:

ServerAddress server=new ServerAddress(" <%=odiRef.getOption("MONGO_SERVER")%> ", <%=odiRef.getOption("MONGO_PORT")%>);
MongoCredential credential=MongoCredential.createMongoCRCredential("usr","<%=odiRef.getOption("MONGODB")%>","pwd".toCharArray());
MongoClient mongoClient = new MongoClient(server,Arrays.asList(credential));

Cheers
Vincenzo

Posted by Vincenzo on January 14, 2015 at 03:45 AM PST #

Hi Vincenzo

That's great! You could use the username and password field on the dataserver and then make the KM more secure and generic. The password would not be shown in the KM or the execution logs.

<%=odiRef.getInfo("DEST_USER_NAME")%>
<%=odiRef.getInfo("DEST_PASS")%>

Then your code could become;

ServerAddress server=new ServerAddress("<%=odiRef.getOption("MONGO_SERVER")%>", <%=odiRef.getOption("MONGO_PORT")%>);
MongoCredential credential=MongoCredential.createMongoCRCredential("<%=odiRef.getInfo("DEST_USER_NAME")%>","<%=odiRef.getOption("MONGODB")%>","<%=odiRef.getInfo("DEST_PASS")%>".toCharArray());
MongoClient mongoClient = new MongoClient(server,Arrays.asList(credential));

Cheers
David

Posted by David on January 14, 2015 at 07:50 AM PST #

Hi David,
thank for your reply.
I made some tests and I realized that for every line I write on mongo are made more connections to the server. I think this depends on how IKM is structured. Could you suggested me some trick to avoid this? Is there a way to initialize the connection once and reuse it?
I have add mongoClient.close, in this way at least the connections are opened and closed.
If you had some links where deepening the use of IKM task would be welcome.

Cheers,
Vincenzo

Posted by guest on January 20, 2015 at 01:19 AM PST #

Hi Vincenzo

Yes, you can create a connection in a task and reuse across many subsequent tasks.

For example is this is the 1st task and it's groovy....
import com.mongodb.MongoClient;
mongoClient = new MongoClient("DALLAN-LAP", 27017)
adb = mongoClient.getDB("<%=odiRef.getOption("MONGODB")%>")
coll = adb.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")

Then in the 2nd task which is also groovy you can reference mongoClient and adb for example.... in the next task I may do this and reference adb;

coll = adb.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")
coll.drop()
coll = adb.getCollection("<%=odiRef.getOption("MONGOCOLL")%>")

Cheers
David

Posted by David on January 20, 2015 at 01:14 PM PST #

Great,
I had some problems with groovy syntax :P

I think that the connections to a new db as mongo could be better structured in ODI defining a new technology in the physical architecture panel (ODI 12), but to do some testing this approach is fine.

look then your new article ... ;)

Thanks a lot!
Vincenzo

Posted by Vincenzo on January 21, 2015 at 01:28 AM PST #

Hi Vincenzo

Something like this a little better for defining the data server plus schema? I have defined the collection name in the ODI physical schema and the MongoDB server credentials etc in the ODI data server;
https://blogs.oracle.com/dataintegration/resource/odi_11g/mongodb_in_odi.jpg
I tried to draw a parallel with Couchbase also (Bucket is in ODI Physical Schema);
https://blogs.oracle.com/dataintegration/resource/odi_11g/couchbase_in_odi.jpg

Cheers
David

Posted by David on January 22, 2015 at 10:38 AM PST #

Very well,
thanks for your screenshot. Gives an idea of what I meant.

Cheers
Vincenzo

Posted by Vincenzo on January 26, 2015 at 03:33 AM PST #

Hi David,

I use this KM but when i run the interface, It get this error, I think I add mongodb library wrong, plz give me the instruction how to add mongodb SDK to use in KM, and what version of this tutorial.

org.apache.bsf.BSFException: exception from Groovy: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Initialize: 2: unable to resolve class MongoClient
@ line 2, column 13.
MongoClient mongoClient = new MongoClient("DALLAN-LAP", 27017);

Thanks and Best Regards,

Posted by Kenstep on January 26, 2015 at 08:35 AM PST #

Hi Kenstep

If you are on windows copy the Java MongoDB driver into the userlib directory for ODI (on Linux this is under ~/.odi/oracledi/userlib) and restart your ODI client or agent if using agent;
C:\Users\your_user\AppData\Roaming\odi\oracledi\userlib

You need to download the Java driver from the web and copy to your host running the ODI studio or agent;
http://docs.mongodb.org/ecosystem/drivers/java/

The class MongoClient is provided in the MongoDB java driver download.
Cheers
David

Posted by David on January 26, 2015 at 08:53 AM PST #

Hi David,

Thanks for your help, but it's still not work.

I add mongo API driver to this folder (I'm using windows XP so the path is a little difference)
http://i79.photobucket.com/albums/j151/khoa_anh1/2_zpsbbnhw66e.jpg

(I also found some folder lib or driver like: Oracle_ODI1\oracledi.sdk\lib or Oracle_ODI1\oracledi\agent\drivers)

Then I run the interface on local Agent
http://i79.photobucket.com/albums/j151/khoa_anh1/3_zpsw4l3ngh6.jpg

It's still got error
http://i79.photobucket.com/albums/j151/khoa_anh1/1_zpso4rpxhno.jpg

I download KM from the link here & do not change anything in code: https://java.net/projects/oracledi/downloads/download/ODI/Knowledge%20Modules/KM_IKM_SQL_to_Mongo_DB.xml

It's seem that the ODI doesn't load mongo-java-driver-2.12.0 or do I need to specify this file in the file KM_IKM_SQL_to_Mongo_DB.xml.

do I forgot something ? plz help me. Thank you very much

Regards,

Posted by guest on January 26, 2015 at 08:38 PM PST #

Hi

Are you logged in to the OS as the Administrator account? Did you restart the ODI studio after copying the JAR file?

Cheers
David

Posted by David on January 27, 2015 at 07:38 AM PST #

Hi David,

I'm sure that I logged in as Administrator (I test on local machine, It's has only administrator account) --> so when i run the ODI, i cann't remove the file mongo-java-driver-2.9.3 (because ODI is reading it).

I also restart ODI (restart machine) after copy mongo-java-driver-2.9.3

can you make a tutorial video of how to add new API that is used in KM, or can you describe detail anything you use in this tutorial (version of ODI, version of OS, interface, version of mongo driver)

Plz support me

Thanks you mery much,
Kenstep

Posted by Kenstep on January 27, 2015 at 07:54 AM PST #

Hi Kenstep,
if you run the interface on local Agent, maybe you must use <YourODIAgentHome>\odi\agent\lib for mongo jar driver.

Vincenzo

Posted by guest on January 27, 2015 at 08:28 AM PST #

Sorry, I meant if you run the interface on standalone Agent...but now I've seen your screenshot.

Vincenzo

Posted by Vincenzo on January 27, 2015 at 08:35 AM PST #

Hi David,

I try to read a collection from mongoDB by ODI, i have 2 systems, windows, and linux.
On windows system it work, on linux system ODi return this error :

java.lang.RuntimeException: java.lang.IllegalArgumentException: Could not load JDBC driver class [org.apache.derby.jdbc.EmbeddedDriver]
at oracle.odi.core.datasource.provider.AbstractDataSourceProvider.configure(AbstractDataSourceProvider.java:106)
at

suggestions?
Thanks you mery much,
Davide.

Posted by guest on January 27, 2015 at 09:05 AM PST #

Hi
For the 'Could not load JDBC driver class [org.apache.derby.jdbc.EmbeddedDriver]' you will need to install/copy the Derby/JavaDB JDBC driver jar file to the oracledi/userlib directory and restart studio/agent.

Cheers
David

Posted by David on January 27, 2015 at 11:01 AM PST #

Hi Kenstep
You are using an old version (you have 2.9.3) of the MongoDB java jar file.

I think the latest production release right now is below;
http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.5/mongo-java-driver-2.12.5.jar

There is a 3.0 version in beta I have also tried. Remove the old version from your userlib directory and install/copy the later one I mention and restart ODI studio and you should be up and running.

Cheers
David

Posted by David on January 27, 2015 at 11:16 AM PST #

Hi David,

I used the newest version then It's work, Thank you

After running, It got this error (after loaded a lot of row from MySQL to MongoDB)

java.lang.OutOfMemoryError: Not enough storage is available to process this command in tsStartJavaThread (src/jvm/threads/vmthread/lifecycle.c:1097).

Attempting to allocate 640M bytes

There is insufficient native memory for the Java
Runtime Environment to continue.

Possible reasons:
The system is out of physical RAM or swap space
In 32 bit mode, the process size limit was hit

Possible solutions:
Reduce memory load on the system
Increase physical memory or swap space
Check if swap backing store is full
Use 64 bit Java on a 64 bit OS
Decrease Java heap size (-Xmx/-Xms)
Decrease number of Java threads
Decrease Java thread stack sizes (-Xss)
Disable compressed references (-XXcompressedRefs=false)

at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
at java.util.concurrent.ThreadPoolExecutor.prestartCoreThread(ThreadPoolExecutor.java:1381)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:222)
at java.util.concurrent.ScheduledThreadPoolExecutor.scheduleAtFixedRate(ScheduledThreadPoolExecutor.java:426)
at java.util.concurrent.Executors$DelegatedScheduledExecutorService.scheduleAtFixedRate(Executors.java:656)
at com.mongodb.Mongo.createCursorCleaningService(Mongo.java:760)
at com.mongodb.Mongo.<init>(Mongo.java:307)
at com.mongodb.Mongo.<init>(Mongo.java:287)
at com.mongodb.Mongo.<init>(Mongo.java:283)
at com.mongodb.MongoClient.<init>(MongoClient.java:178)
at com.mongodb.MongoClient.<init>(MongoClient.java:153)
at com.mongodb.MongoClient.<init>(MongoClient.java:142)

I tested on my virtual local machine, My Real server has 16GB of RAM, will it work well ?, I need to run a large ETL from MySQL to MongoDB.

Is that not a performence way to load data from MySQL to MongoDB Because it use a lot of Memory ?

Any Suggestions or can we use a alternative way to do it ?

Thank you and Best Regards,
-----------------
Kenstep

Posted by Kenstep on January 27, 2015 at 07:33 PM PST #

Hi Kenstep

There is an updated KM on java.net you can try, as Vincenzo pointed out, the one previously posted was connecting on each insert so would consume many resources plus there was a debug loop that I removed to print out content. I updated the KM on java.net and posted there. The KM is on the java.net share site below;

https://java.net/projects/oracledi/downloads/download/ODI/Knowledge%20Modules/KM_IKM_SQL_to_Mongo_DB.xml

The other alternative is building a KM 'MySQL to MongoDB batch' that does an unload from MySQL and uses the CSV bulk loader into MongoDB. Let me know how the updated KM goes.

Cheers
David

Posted by David on January 28, 2015 at 08:05 AM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« February 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
11
12
13
14
15
17
18
20
21
22
23
24
25
26
27
28
       
       
Today