Data Lineage in Oracle BI EE
By @lex on Oct 30, 2008
Earlier this year, I had the opportunity to work in a small Oracle BI EE project with a not so small source database. One of my tasks was to enable the project team members to easily identify objects with specific indications such as long-running queries or a high number of requests or a redundancy problem or the like.
To put it short, what they wanted was: Data Lineage for Oracle BI metadata
The design - and later implementation - included the following
- Oracle BI Usage Tracking (standard module)
- combined with presentation catalog metadata
- combined with rpd metadata
First, the Oracle BI rpd file is, well, a file and the data is not accessible like in a relational database. A fact which has been subject to criticism in recent times. The same holds true for the presentation catalog which has its information spread across thousands of small xml files.
To overcome this, we used the Repository Documentation utility of the Oracle BI Administration Tool. This allows you to create a flat csv file with the dependencies of all objects from presentation layer, business model and mapping layer and physical layer. The screenshot shows the file after the insertion of a primary key column, which is a necessary task if you wish to connect to it as a data source in Oracle BI. In the project we used an import package to load the file into a MS SQL Server database.
Both tasks are purely manual and up to this moment I have not found an automation solution for it (if you did, please drop a comment), which is the main reason why data lineage for Oracle BI is a little bit cumbersome. Whenever the rpd or catalog content changes, an administrator must manually extract the csv files and (ideally) load them into tables in a relational database.
As one commenter to this post has found out, there is an execution plan named Data Lineage in the current DAC repository (as created by the installer for Oracle BI Applications 7.9.5). A short investigation reveals that there are indeed table definitions which supposedly hold the data from the rpd and presentation catalog. At the moment this is only in the DAC repository and there is no accompanying ETL mapping in the Informatica repository. I would be so glad to be wrong here - if so, please add your comments below.
We will see what future versions of Oracle BI Applications bring. Maybe a mapping which loads the csv files into the tables along with a ready-to-use rpd file (just dreamin'...)
Now comes the knitting, which means that you have to create complex (physical) joins to glue the rpd metadata together with the presentation catalog data. For a first test you can use the presentation layer object names (table and column) to get first results in Answers. (This post does not deal with the creation of the business model and presentation layer in the rpd file to access the data).
The S_NQ_ACCT table holds the valuable data from Oracle BI Usage Tracking, it has a column which contains the request path which allows us to join it to the presentation catalog data.
Now we have a somewhat complete data lineage tool. Below is a screenshot from one of the first dashboard prototypes of the a/m project which allowed us to visualize the benefits of caching.
What about bringing in the DAC repository, the Informatica repository, the Siebel CRM repository (right-click on a Siebel applet and see a report which describes all BC fields, columns and tables and their mapping to Oracle BI)...whoa...just woke up
have a nice day