Tuesday Jan 15, 2013

ODI - Hive and MongoDB

I've been experimenting with another Hive storage handler, this time for MongoDB, there are a few out there including this one from MongoDB. The one I have been using supports basic primitive types and also supports read and write - using the standard approach of storage handler class and custom properties to describe the data mask. This then lets you access MongoDB via hive external table very easily and abstract away a lot of integration complexity - also makes it ideal for using in ODI. I have been using on my Linux VM where I have Hive running to access my MongoDB running on an another machine. The storage handler is found here, I used it to access the same example I blogged about here, below is the external table definition;

  1. ADD JAR /home/oracle/mongo/hive-mongo.jar;

  2. create external table mongo_emps(EMPNO string, ENAME string, SAL int)  
  3. stored by "org.yong3.hive.mongo.MongoStorageHandler"  
  4. with serdeproperties ( "mongo.column.mapping" = "EMPNO,ENAME,SAL" )  
  5. tblproperties ( "mongo.host" = "<my_mongo_ipaddress>", "mongo.port" = "27017",  
  6. "mongo.db" = "test", "mongo.collection" = "myColl" );

Very simple. The nice aspect of the Hive external table are the SerDeProperties that can be specified, very simple but provides a nice flexible approach. I can then reverse engineer this into ODI (see reverse engineering posting here) and use it in my Hive integration mappings to read and potentially write to MongoDB.

The primitive types supported can also project nested document types, so for example in the document below (taken from here), name, contribs and awards are strings but have JSON structures;

  1. {
  2. "_id" : 1,
  3. "name" : {
  4. "first" : "John",
  5. "last" :"Backus"
  6. },
  7. "birth" : ISODate("1924-12-03T05:00:00Z"),
  8. "death" : ISODate("2007-03-17T04:00:00Z"),
  9. "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  10. "awards" : [
  11. {
  12. "award" : "W.W. McDowellAward",
  13. "year" : 1967,
  14. "by" : "IEEE Computer Society"
  15. },
  16. {
  17. "award" : "National Medal of Science",
  18. "year" : 1975,
  19. "by" : "National Science Foundation"
  20. },
  21. {
  22. "award" : "Turing Award",
  23. "year" : 1977,
  24. "by" : "ACM"
  25. },
  26. {
  27. "award" : "Draper Prize",
  28. "year" : 1993,
  29. "by" : "National Academy of Engineering"
  30. }
  31. }

can be processed with the following external table definition, which then can be used in ODI;

  1. create external table mongo_bios(name string, birth string, death string, contribs string, awards string)  
  2. stored by "org.yong3.hive.mongo.MongoStorageHandler"  
  3. with serdeproperties ( "mongo.column.mapping" = "name,birth,death,contribs,awards" )  
  4. tblproperties ( "mongo.host" = "<my_ip_address>", "mongo.port" = "27017",  
  5. "mongo.db" = "test", "mongo.collection" = "bios" );

All very simple and that's what makes it so appealing. Anyway, that's a quick following on using external tables with MongoDB and Hive to the SQL oriented approach I described here that used java table functions.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« July 2014
SunMonTueWedThuFriSat
  
4
5
6
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today