The new Identity Manager 8.0 data exporter is a major new feature. As mentioned before, the architecture of Identity Manager is "data sparse," meaning we mostly store metadata about users and accounts, not the actual values. This greatly reduces the chance of passing "stale data" to systems under management. When you want to know what a user's email address is, Identity Manager retrieves it from the email system, not from a repository. You get the freshest data, not whatever it was the last time you synced.
This model has proven to be the correct approach in provisioning design. But it comes up short when customers want to do more auditing and performance analysis. When all you have is current, fresh data, it's tough to answer questions like "who had access to the financial system during the first two weeks of the month, when a security breach occurred?" or "how long, on average, does it take for a user to be provisioned?" We can do it, but it gets messy in the audit logs.
So 8.0 now has a data exporter subsystem built in. It lets the deployment team determine which objects in IdM they want to report on, capture that data from within IdM, and post it to a series of staging tables. These staging tables can then be read by the warehouse ETL process to bring the data into the warehouse.
The process is straightforward. A data export task is started that forms the data from the underlying IdM objects. You can run the task so that it brings out only the information that has changed since the last time the export task was run. The data is loaded into the staging tables and then transferred into the customer's data warehouse of choice.
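Conceptually, the incremental ("changed since last run") pass boils down to filtering objects by modification time. The sketch below illustrates that idea only; the class and method names are invented for illustration and are not IdM's actual API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of an incremental export pass: keep only the
// objects modified since the previous export task ran. Names here
// are hypothetical, not taken from the IdM code base.
public class IncrementalExport {

    public static class IdmObject {
        final String id;
        final Instant lastModified;

        public IdmObject(String id, Instant lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // Returns the subset of objects changed after the last run,
    // i.e. the rows the task would push to the staging tables.
    public static List<IdmObject> changedSince(List<IdmObject> all, Instant lastRun) {
        List<IdmObject> changed = new ArrayList<>();
        for (IdmObject o : all) {
            if (o.lastModified.isAfter(lastRun)) {
                changed.add(o);
            }
        }
        return changed;
    }
}
```

Running a full export would simply skip the filter; the delta option exists to keep the task's footprint on the server small.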
Two schemas are involved. The first is an ObjectClass schema that shapes the object data within IdM into something that matches the staging-table schema. IdM 8.0 comes with scripts to build these staging tables in all of the supported repositories. They represent most of the important objects within IdM. If you want to export extended attributes, you will have to modify these ObjectClass schemas and tables. IdM also provides export schemas that read the staging tables and help shape the data into something the warehouse beans can consume. As shown above, we also provide a way to connect to the external tables and perform basic forensic reports from within IdM.
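To make the object-to-table shaping concrete, here is a rough Hibernate-style mapping fragment. Every class, table, and column name in it is invented for illustration; the real mappings ship with IdM 8.0 and look different.

```xml
<!-- Illustrative sketch only: names are hypothetical, not the
     schema IdM 8.0 ships. Shows how an export bean could be tied
     to a staging table, and where an extended attribute would go. -->
<hibernate-mapping>
  <class name="com.example.export.UserExportBean" table="STG_USER_ACCOUNT">
    <id name="accountId" column="ACCOUNT_ID"/>
    <property name="status" column="STATUS"/>
    <!-- An extended attribute needs both a mapping entry here
         and a matching column in the staging-table DDL. -->
    <property name="costCenter" column="COST_CENTER"/>
  </class>
</hibernate-mapping>
```

The point is that the staging tables, not IdM's internal repository layout, define the contract the warehouse ETL codes against.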
The beauty of this dual-schema staging-table approach is that it abstracts the underlying structure of IdM away from the data export interface, so changes can be made to IdM in future versions without "breaking" existing export configurations. Both IdM and the warehouse can evolve without requiring a complete reconfiguration.
The new data export feature uses Hibernate, the Java object-relational mapping framework, as the underlying transformation engine, which aids in mapping IdM Java objects into relational database table structures. IdM 8.0 provides default Warehouse Interface Code (WIC): Java classes that define the underlying schema of common IdM objects. Not all IdM objects are defined, but the important ones are. They should work in the majority of IdM deployments, but if extended or custom attributes need to be exported, these WIC classes will need to be redefined and regenerated. We include both binary and source versions of this code to help with deployment configuration.
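To give a feel for what "redefining" a WIC class involves, here is a minimal sketch of a Hibernate-friendly export bean with a custom attribute added. The class name, fields, and the `costCenter` attribute are all hypothetical; the shipped WIC source is the real starting point.

```java
// Illustrative sketch only: the actual WIC classes ship with IdM 8.0
// (binary and source) and have their own names and fields. This shows
// the general shape of adding a custom extended attribute so the
// transformation layer can map it to a staging-table column.
public class UserExportBean {
    private String accountId;
    private String status;
    private String costCenter; // hypothetical custom extended attribute

    public UserExportBean() { } // Hibernate-style beans need a no-arg constructor

    public String getAccountId() { return accountId; }
    public void setAccountId(String accountId) { this.accountId = accountId; }

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }

    public String getCostCenter() { return costCenter; }
    public void setCostCenter(String costCenter) { this.costCenter = costCenter; }
}
```

After modifying a class like this, the matching staging-table column and mapping entry have to be added as well, and the WIC classes regenerated, before the exporter will carry the new attribute through.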
If you are considering the data exporter feature, be sure to weigh the impact it will have on the server it is deployed on. You may want a dedicated server for the export engine, as it is resource intensive. You definitely do not want to run it on a server that is also serving the user GUI, as this will slow response times. You can schedule the exporter tasks to run in off hours to lower this impact. The good news: the new data exporter subsystem has JMX interfaces on both the pre- and post-staging-table beans, so you can monitor how fast the exporter is working.
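A monitoring script would read those exporter beans through a standard MBean server. The sketch below registers a stand-in queue MBean locally and reads its counters; the object name and attributes are invented for illustration, and in a real deployment you would connect to the IdM server's MBean server and use the object names from the product documentation.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of polling exporter throughput over JMX. The MBean name and
// attributes are illustrative stand-ins, not IdM's actual beans.
public class ExporterMonitor {

    // Hypothetical stand-in for an exporter staging-queue MBean.
    public interface ExportQueueMBean {
        long getRecordsStaged();    // rows written by the export task
        long getRecordsConsumed();  // rows drained by the warehouse ETL
    }

    public static class ExportQueue implements ExportQueueMBean {
        public long getRecordsStaged()   { return 1200; } // stubbed counters
        public long getRecordsConsumed() { return 950;  }
    }

    // Registers the stub locally, then reads its counters the same way
    // a monitoring tool would read the real pre/post staging beans.
    public static long backlog() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.idm:type=ExportQueue");
        if (!mbs.isRegistered(name)) {
            mbs.registerMBean(new ExportQueue(), name);
        }
        long staged   = (Long) mbs.getAttribute(name, "RecordsStaged");
        long consumed = (Long) mbs.getAttribute(name, "RecordsConsumed");
        return staged - consumed; // records staged but not yet consumed
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Staging backlog: " + backlog() + " records");
    }
}
```

Watching the gap between staged and consumed counts over time is one simple way to tell whether the warehouse ETL is keeping up with the export tasks.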
More in future posts. For now, good documentation is available from our public website.