ODI - Basic Hive Queries
By David Allan on Dec 28, 2012
Here we will see a basic example joining the movie lens data and then loading a Hive table based on the tables from the Reverse Engineering Hive Tables post. The Hive table was defined and created via ODI, I duplicated the movies table and added a column for the rating, just for demo purposes...
When I build my interface, I add movies for my source and movies_info as my target, the auto mapping completes much of the mapping, the rating (which is not mapped below) comes from another table - this is where ODI's incremental design is nice, I can add in a new datastore as a source and map columns from it, then describe the join.
After I have added the movie ratings table, I will define the join just by dragging movie_id from movies to the ratings table movie_id column. That's the join...mostly defined.
The other thing you need to check is that the ordered join property is set. This will generate the ordered join (ANSI style, but using the Hive technology's template) syntax.
We can also perform transformations using built in or user defined functions, below I am performing the Hive built-in UPPER function on the movie name for example.
In the physical, or flow view I am using the Hive Control Append IKM, I am using ODI to create the target table in Hive and also performing a truncate if it exists. Also have the control flow switched off.
Executing this is just like any other interface apart from we leverage Hive to perform the heavy lifting. The resultant execution can be inspected in the ODI operator or console and the resultant table inspected when complete.