Next Generation Design Environments for Enterprise Data Architectures
By Christophe Dupupet on Sep 05, 2008
Isn’t it frustrating when you are nearly done with your integration project, only to realize that your data integration strategy is… well, not the best one you could have chosen? You now need to revise potentially thousands of mappings, just as you are about to deliver your integration processes. The changes could be in how you process inserts and updates (which can have dramatic performance implications), or a last-minute modification to the integration requirements regarding error processing or the auditing of changes applied to the data. The impact of these changes on the existing developments is dramatic, and their implementation is crucial from a business perspective. Because most data integration products do not offer solutions that help with these types of massive changes, the implementation costs skyrocket just when you thought you knew what the real cost was.
The Declarative Design approach offers a new paradigm that makes the definition of the data integration strategy flexible enough that eleventh-hour changes take minutes rather than days. Data integration processes are made of three major elements: data movement, data transformations and data quality. The data movement itself has two parts: the transport of the data from source to target, and the integration of the data into the target system: inserts, updates, creation of history for older records in the target system (often referred to as Slowly Changing Dimensions), and so on. Declarative Design separates the definition of the transformations from that of the transport and integration. Data quality rules are also defined separately. At execution time all components are combined, but from a maintenance perspective they can be handled separately.
The transformations themselves are fairly specific, and are usually defined at a column level. Of course, some transformation logic may be found in multiple places, but all mature tools will offer some abstraction layer to centralize the definition of these transformations and make them re-usable throughout the code. But generally speaking, we can consider that the data transformation required to build the information in the target system is defined one column at a time.
At a higher level, data quality rules can usually be defined at the table level, whether they are referential integrity rules or constraints on values or ranges of values. Even complex formulas can be applied here. What matters is that all data loaded into any given table goes through the same validation logic.
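To make the idea concrete, here is a minimal sketch (in Python, with entirely hypothetical names and rules) of data-quality rules attached to a table rather than to individual mappings, so that every load path runs the same validation logic:

```python
# Hypothetical sketch: table-level data-quality rules, applied uniformly
# to every row loaded into a given target table.

def not_null(column):
    """Rule: the column must have a value."""
    return lambda row: row.get(column) is not None

def in_range(column, lo, hi):
    """Rule: the column value must fall within [lo, hi]."""
    return lambda row: row.get(column) is not None and lo <= row[column] <= hi

# Rules belong to the table, not to any particular mapping, so every
# process that loads CUSTOMER is checked the same way.
TABLE_RULES = {
    "CUSTOMER": [not_null("CUSTOMER_ID"), in_range("AGE", 0, 130)],
}

def validate(table, rows):
    """Split incoming rows into (valid, rejected) using the table's rules."""
    rules = TABLE_RULES.get(table, [])
    valid, rejected = [], []
    for row in rows:
        (valid if all(rule(row) for rule in rules) else rejected).append(row)
    return valid, rejected
```

Rejected rows could then be routed to an error table for auditing, without any of the individual mappings having to know about the rules.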
The transport and integration strategies, on the other hand, are quite common to all processes. There will be similarities in how you extract from the source systems and how you load data into the target system. If we consider 200 source tables loaded into 150 target tables (with all necessary lookup tables and transformations), the way we perform the physical extraction from the source system will probably be the same for all 200 source tables, and the way we integrate the data into the target tables will probably be the same for all 150 tables, perhaps with two or three variations. Now let us assume that all we have to worry about is two extraction methods and three integration strategies: no matter how many more tables we add to our source and target systems, we would still have this same number of components. Furthermore, if we decide later in the project to modify either the way we extract data or the way we perform our integration, we only have a handful of components to revisit, not all of the existing mappings. Tools that do not offer this separation between transformation and data transfer will force you to revisit every single step of every single transformation, just to make sure that you have not overlooked an option somewhere…
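The arithmetic above can be sketched in a few lines (all names here are illustrative, not from any real tool): each mapping merely references a shared extraction method and integration strategy by name, so the project contains only 2 + 3 strategy components regardless of how many tables it covers.

```python
# Illustrative sketch: mappings reference shared strategies by name.
EXTRACTION_METHODS = {"jdbc_full", "jdbc_incremental"}             # 2 methods
INTEGRATION_STRATEGIES = {"append", "incremental_update", "scd2"}  # 3 strategies

# There may be hundreds of mappings, but still only five reusable
# strategy components among them.
mappings = [
    {"source": "SRC_ORDERS", "target": "TGT_ORDERS",
     "extract": "jdbc_incremental", "integrate": "incremental_update"},
    {"source": "SRC_PRODUCTS", "target": "TGT_PRODUCTS",
     "extract": "jdbc_full", "integrate": "scd2"},
    # ... one entry per source/target pair
]

# A late change to how "scd2" behaves is made once, in the strategy
# itself; every mapping that references it picks up the change.
used_components = ({m["extract"] for m in mappings}
                   | {m["integrate"] for m in mappings})
```

The point is that `used_components` can never grow beyond the five defined strategies, no matter how many mappings are added.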
Using Declarative Design, we can now build integration templates that pre-define how the integration will be performed. These templates (Knowledge Modules in ODI) give you this integration flexibility, but also bring a lot more to your developments. Now that the integration itself is pre-defined, you have the guarantee that all developers will use the integration best practices you have defined. You will not have to deal with multiple implementations of the same integration logic. You can also task your most advanced developers with defining these integration strategies, and let all other developers stop worrying about this aspect entirely and simply focus on the transformations themselves.
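A minimal sketch of the template idea follows, in the spirit of ODI's Knowledge Modules, though the template syntax and function names here are invented for illustration. The column-level transformations and the integration strategy are defined separately and only combined when the final statement is generated:

```python
# Hypothetical integration template: defines HOW rows reach the target.
INSERT_TEMPLATE = (
    "INSERT INTO {target} ({columns})\n"
    "SELECT {expressions}\n"
    "FROM {source}"
)

def generate(target, source, column_map, template=INSERT_TEMPLATE):
    """Combine per-column transformations with an integration template
    to produce the executable statement."""
    cols = ", ".join(column_map)           # target columns
    exprs = ", ".join(column_map.values()) # column-level transformations
    return template.format(target=target, source=source,
                           columns=cols, expressions=exprs)

# The developer only defines WHAT each column is, one column at a time.
sql = generate(
    "TGT_CUSTOMER", "SRC_CUSTOMER",
    {"CUST_NAME": "UPPER(NAME)", "CUST_AGE": "AGE"},
)
```

Swapping `INSERT_TEMPLATE` for, say, a MERGE-based or history-preserving template would change the integration strategy for every mapping at once, without touching any of the column-level logic.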
The net result is better code generation overall and more agility in both your developments and your maintenance.