The Benefits of ODI Knowledge Modules: a Template Approach for Better Data Integration
By Christophe Dupupet on Nov 09, 2009
This post assumes that you have some level of familiarity with ODI. The concepts of Knowledge Module are used here assuming that you understand them in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information..
At the core, ODI knowledge modules are templates of code that will be leveraged for data integration tasks: they pre-define data integration steps that are required to extract, load, stage - if needed - and integrate data.
Several types of Knowledge Modules are available, and are grouped in families for Loading operations (LKMs), Integration operations (IKMs), Data Control operations (CKMs), and more.
For a more detailed description of what a knowledge module is, simply picture the multiple steps required to load data from a flat file into a database. You can connect to the file using JDBC, or leverage the database native loading utility (sqlldr for Oracle, bcp for SQL Server or Sybase, load for db2, etc.). External tables are another alternative for databases that support this feature.
As you use one or the other technique, you may first want to stage the data before loading your actual target table; in other cases, staging will only slow down your data load.
As far as the integration in the target system is concerned, again multiple strategies are available: simple inserts, inserts and updates, upserts, slowly changing dimension... these techniques may be as simple as one step, or be a complex series of commands that must be issued to your database for proper execution.
The Knowledge Modules will basically list these steps so that a developer who needs to repeat the same integration pattern only has to select the appropriate templates, versus re-developing the same logic over and over again.
The immediate benefits of this approach are well known and well documented:
- All developers use the same approach, and code development is consistent across the company, hence guarantying the quality of the code
- Productivity is greatly improved, as proven path are re-used versus being re-developed
- Code improvement and modification can be centralized and has a much broader impact: optimization and regulatory changes are done once and inherited by all processes
- Maintenance is greatly simplified
To fully appreciate all the benefits of using knowledge Modules, there is a lot more that needs to be exposed and understood about the technology. This post is a modest attempt at addressing this need.
GENERATION OF CODE AND TRANSFORMATIONS
Most tools today will offer the ability to generate SQL code (or some other type of code, such as scripts) on your source or target system. As most products come with a transformation engine, they will also generate proprietary code for this engine where data is staged (I'll skip the debate here as to whether a transformation engine is a staging area or not - the point being that code can be generated on either source, "middle-tier" or target).
However, real life requirements are rarely either/or. Often times, it makes sense to leverage all systems to optimize the processing: spread out the load for the transformations, reduce the amount of data to be transferred over the network, process the data where it is versus moving the data around solely for the purpose of transformations.
To achieve this, Data Integration tools must be able to distribute the transformation logic across the different systems.
Only ODI will effectively generate code and transformations on all systems. This feature is only possible thanks to the KM technology.
Beyond the ability to generate code, you have to make sure that the generated code is the best possible code for the selected technology. Too often, tools first generate code that is then translated for the appropriate database. With the KMs technology, no translation is required: the generated code was initially conceived explicitly for a given technology, hence taking advantage of all the specifics of this technology.
And since the KMs are technology specific, there is no limit to what can be leveraged on the databases, including user defined functions or stored procedures.
Whenever a tool generates code, the most common complaint is that there is very little (if any) control over the generated result. What if a simple modification of the code could provide dramatic performance improvements? Basic examples would include index management, statistics generation, joins management, and a lot more.
The KM technology is open and expansible so that developers have complete control over the code generation process. Beyond the ability to optimize the code, they can extend their solution to define and enforce in house best practices, and comply with corporate, industry or regulatory requirements. KMs Modifications are done directly from the developers graphical interface.
One point that can easily be adapted is whether data have to be materialized throughout the integration process. Some out-of-the-box KMs will explicitly land data in a physical file or tables. Others will avoid I/Os by leveraging pipes instead of files, views and synonyms instead of tables. Again, developers can adapt the behavior to their actual requirements.
EXPANDING THE TOOL TO NEW TECHNOLOGIES
How much time does it take to adapt your code to a new release of your database? How much time does it take to add a new technology altogether? In both cases, KMs will provide a quick and easy answer.
Let us start with the case of a new version of the database. While our engineering teams will release new KMs as quickly as possible to take advantage of the latest releases of any new database, you do not have to wait for them. A new release typically means new parameters for your DDL and DML, as well as new functions for your existing transformations. Adapt the existing KMs with the features you need, and in minutes your code is ready to leverage the latest and greatest of your database.
Likewise, if you ever need to define a new technology that would not be listed by ODI (in spite of the already extensive list we provide), simply define the behavior of this technology in the Topology interface, and design technology specific KMs to take advantage of the specific features of this database. I can guaranty you that 80% of the code you need (at least!) is already available in an existing KM... Thus dramatically reducing the amount of effort required to generate code for your own technology.
ARE KM MODIFICATIONS REQUIRED?
I am a strong advocate of the customization of KMs: I like to get the best I can out of what I am given. But often times, good enough is more than enough. I will always remember trying to optimize performance for a customer: we did not know initially what our processing window would be - other than "give us your best possible performance". The first out-of-the-box KM we tried processed the required 30,000,000 records in 20 minutes. Due to IT limitations, we could only leverage lesser systems for faster KMs... but still reduced performance to 6 minutes for the same volume of data. We started modifying KMs to get even better results, when the customer admitted that we actually had 3 hour for the process to complete... At this point, spending time in KM modifications was clearly not needed anymore.
KMs are meant to give the best possible performance out of the box. But every environment is unique, and assuming that we can have the best possible code for you before knowing your own specific challenges would be an illusion - hence the ability to push the product and the code to the limit
Another common question is: do you have to leverage both source and target systems as part of your transformations? Clearly, the answer is no. But in most cases, it is crucial to have the flexibility to leverage all systems, versus being cornered in using only one of them. Over time, you will want to reduce the volume of data transferred over the network; you will want to distribute some of your processing... all more reasons to leverage all available engines in your environment.
Do not hesitate and share with us how you extend your KMs!
Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.