By Alex Kotopoulis-Oracle on Feb 19, 2015
The strength of Oracle Data Integrator (ODI) has always been the separation of logical design and physical implementation. Users can define a logical transformation flow that maps any sources to targets without being concerned with the exact mechanisms that will be used to realize the job. In fact, ODI doesn’t have its own transformation engine; instead, it outsources all work to the native mechanisms of the underlying platforms, be they relational databases, data warehouse appliances, or Hadoop clusters.
In the case of Big Data, this philosophy of ODI gains even more importance. New Hadoop projects are incubated and released constantly and introduce exciting new capabilities; the combined brain trust of the big data community conceives new technology that can outdo any proprietary ETL engine. ODI’s ability to separate your design from the implementation lets you pick the ideal environment for your use case, and if the Hadoop landscape evolves, it is easy to retool an existing mapping with a new physical implementation. This way you don’t have to tie yourself to a language that is hyped this year but might be legacy the next.
ODI turns a logical design into executable code through physical designs and Knowledge Modules. You can even define multiple physical designs, each targeting a different language, for the same logical design. For example, if you choose Hive as your transformation platform, ODI generates Hive SQL as the execution language; if you pick Pig, the generated code is Pig Latin; and if you choose Spark, ODI generates PySpark code, which is Python using the Spark APIs. Knowledge Modules orchestrate code generation for each language and can be further configured to optimize the execution of each implementation, for example parallelism in Pig or in-memory caching in Spark.
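To illustrate the idea of one logical design rendered by different generators, here is a minimal Python sketch. Everything in it is hypothetical: `LogicalMapping`, `hive_km`, `pig_km`, `generate`, and the table and column names are illustrative stand-ins, not ODI's API and not the code its Knowledge Modules actually emit. The Pig variant assumes tables are read and written through HCatalog's `HCatLoader` and `HCatStorer`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class LogicalMapping:
    """Hypothetical engine-neutral design: read a source, filter rows, write a target."""
    source: str
    target: str
    columns: Tuple[str, ...]
    filter_column: str
    filter_value: str


def hive_km(m: LogicalMapping) -> str:
    """Hypothetical 'Knowledge Module' that renders the design as Hive SQL."""
    cols = ", ".join(m.columns)
    return (
        f"INSERT INTO TABLE {m.target} "
        f"SELECT {cols} FROM {m.source} "
        f"WHERE {m.filter_column} = '{m.filter_value}'"
    )


def pig_km(m: LogicalMapping) -> str:
    """Hypothetical 'Knowledge Module' that renders the same design as Pig Latin."""
    cols = ", ".join(m.columns)
    return "\n".join([
        f"src = LOAD '{m.source}' USING org.apache.hive.hcatalog.pig.HCatLoader();",
        f"flt = FILTER src BY {m.filter_column} == '{m.filter_value}';",
        f"prj = FOREACH flt GENERATE {cols};",
        f"STORE prj INTO '{m.target}' USING org.apache.hive.hcatalog.pig.HCatStorer();",
    ])


# One entry per hypothetical physical design; the logical design itself never changes.
PHYSICAL_DESIGNS: Dict[str, Callable[[LogicalMapping], str]] = {
    "hive": hive_km,
    "pig": pig_km,
}


def generate(m: LogicalMapping, engine: str) -> str:
    """Pick the generator for the chosen engine and render the logical design."""
    return PHYSICAL_DESIGNS[engine](m)


mapping = LogicalMapping(
    source="weblogs",
    target="weblog_errors",
    columns=("ip", "url"),
    filter_column="status",
    filter_value="404",
)

for engine in PHYSICAL_DESIGNS:
    print(f"-- {engine} --")
    print(generate(mapping, engine))
```

Switching `engine` changes only which generator runs; supporting another platform would mean adding a dictionary entry rather than redesigning `mapping`, which mirrors the separation of logical and physical design described in the post.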
The example below shows an ODI mapping that reads from a log file in HDFS registered in HCatalog. The data is filtered, aggregated, and then joined with another table before being written into another HCatalog-based table. ODI can generate code for Hive, Pig, or Spark, depending on the Knowledge Modules chosen.
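To pin down what such a filter, aggregate, and join flow computes, here is a plain-Python model of its semantics. The table contents, the column layout `(ip, status, url)`, the "successful requests only" filter, and the `page_category` lookup table are all hypothetical stand-ins for the log and the second table; this is not code ODI generates, only a reference for the result that a Hive, Pig, or Spark implementation would have to reproduce.

```python
from collections import Counter

# Hypothetical contents of the HCatalog-registered web log table: (ip, status, url)
weblog = [
    ("10.0.0.1", 200, "/home"),
    ("10.0.0.2", 404, "/missing"),
    ("10.0.0.3", 200, "/home"),
    ("10.0.0.1", 200, "/products"),
]

# Hypothetical second table to join with: url -> page category
page_category = {"/home": "landing", "/products": "catalog"}


def run_mapping(log_rows, categories):
    # Filter: keep only successful requests (status 200)
    ok = [row for row in log_rows if row[1] == 200]
    # Aggregate: count hits per URL
    hits = Counter(url for _, _, url in ok)
    # Join: inner join on url with the category table
    joined = [(url, categories[url], n) for url, n in hits.items() if url in categories]
    # Rows for the target table, sorted for deterministic output
    return sorted(joined)


print(run_mapping(weblog, page_category))
# -> [('/home', 'landing', 2), ('/products', 'catalog', 1)]
```

In Hive, the same flow would correspond roughly to a `GROUP BY` subquery joined to the second table and inserted into the target; the exact code ODI emits depends on the Knowledge Module chosen.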
ODI improves developer productivity and can future-proof your investment by removing the need to hand-code Hadoop transformations in a particular language. You can design your mapping logically and then choose the implementation that best suits your use case.