Or: how can I use 11gR2 to update my operational data warehouse with real-time data loads, and still patch the schema to deal with errors or with upgrades driven by changing business demands?
In a time when the data warehouse was refreshed weekly (wow, that often?), fixing the schema or the data was a piece of cake. The data was static, the users were querying (you hoped) away, and you could tinker with the schema without really making a mess of it all. Take the DW down over the weekend? Hey, no problem. Those were the days: nice batch windows to load data, weekends to upgrade the schema and the query layers, and in general not too much pressure.
Both Operational Data Warehousing and Real Time Data Warehousing make all of the above a blast from the past, a fantasy world that data warehouse architects can only dream of.
Today chances are that your data warehouse is as critical to your business as the OLTP application that captures sales, orders or something else that makes you money. Taking the system down is not an option in many cases.
In OLTP land this reality has been around for a while... patching the app while it is running is something that really saves time and money over there.
Oh, but we just said that the DW is like the OLTP app. Therefore, should I use the same patching process and functionality? Yes, and that is where we bring in Edition Based Redefinition.
We have a very small star schema, with a single fact table and two dimensions. The idea is simply to use this example to highlight some of the things that should be done to enable online patching. The goal is not to create a very complex scenario that covers every possible case...
We are also looking at a real-time or short load interval. Again, the scenario will just insert a small number of rows, but it should show the principle quite well.
To simulate the query angle we have two users, one is the owner of the schema and the objects to be worked on, the other queries these objects. The owner is REDEF_OWN and the user is REDEF_QRY (and yes, very creative naming...).
Let's get going. We have created the tables (all three) and loaded the initial data set. Users are all happy and business is booming.
Our change means that the product dimension gets a second and third level, but most importantly the name needs to be changed to a code that uniformly identifies the product across the globe. This means there will be some re-linking of data, as some facts will move to a different product code (the new "name").
First, develop and test all of the steps on a copy in the development environment, then go to work on the editions in production. For a complete list of the steps to do take a look in the documentation (here) or download the script with the code for this example.
Now that you know your changes, you create Editioning Views that regulate access to the editioned data structures, covering both ETL access and query access. This setup is shown in the following illustration.
What we did here is rename the actual tables, which now have a _GEN suffix, and create Editioning Views with the original names. The code looks like this (note the new keyword in the CREATE VIEW syntax!):
ALTER TABLE PRODUCTS RENAME TO PRODUCTS_GEN;

CREATE OR REPLACE EDITIONING VIEW PRODUCTS AS
SELECT PRD_ID, NAME, DESCRIPTION  -- column list assumed for this example
FROM   PRODUCTS_GEN;
With this redirection in place (we do this for all three tables so we can modify any of them), we are ready to create the new edition and start modifying the actual objects. As a quick side note on grants and security: make sure the query users (in this case REDEF_QRY) have the correct grants on both the tables and the views to allow selects.
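As a sketch, creating the edition and setting up those grants could look like the following. The edition name PHASETWO and users REDEF_OWN/REDEF_QRY come from this example; the exact object list is an assumption (extend it to the second dimension table as well):

```sql
-- Create the new edition (run as a suitably privileged user)
CREATE EDITION phasetwo;
GRANT USE ON EDITION phasetwo TO redef_own;

-- Let the query user select through the editioning views
GRANT SELECT ON redef_own.products TO redef_qry;
GRANT SELECT ON redef_own.sales    TO redef_qry;
```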
Once we are in the new edition all changes can be made, tested and then published. The screen above shows how to change the Edition and see what edition this session is working with. In this case, we are working in the PHASETWO edition.
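In SQL terms, switching a session into the new edition and checking which edition it is working in looks like this:

```sql
-- Switch this session into the new edition...
ALTER SESSION SET EDITION = phasetwo;

-- ...and verify which edition the session is working in
SELECT SYS_CONTEXT('USERENV', 'CURRENT_EDITION_NAME') AS current_edition
FROM   dual;
```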
As we add the columns to the table, make sure not to change the primary key (yet). We will do that later and, by using the view, re-point the foreign key to the new primary key. Also note that we are not changing the definition of the description, so we leave it as is...
The change looks like this:
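A sketch of the column additions follows. The exact names and types are assumptions, except PRD_ID_1, which appears later in this example; CODE_1 stands in for the name-becomes-code change, and the two extra columns are the new hierarchy levels:

```sql
-- Add the next-generation columns alongside the existing ones
ALTER TABLE products_gen ADD (
  prd_id_1  NUMBER,        -- next-generation product key
  code_1    VARCHAR2(30),  -- global product code replacing NAME
  level2_1  VARCHAR2(50),  -- new second hierarchy level
  level3_1  VARCHAR2(50)   -- new third hierarchy level
);
```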
The next step is to propagate the changes from the live system into the new edition. Remember, data loads are going on while we set all of this up. For this we need a basic procedure that captures the logic, and a forward crossedition trigger that fires for DML done in the first edition to invoke the logic for each inserted row (or a batch).
We use a procedure to allow extensions to the example without directly modifying the trigger. For this simple example, the trigger probably would have been flexible enough to deal with the changes.
After creating the procedure (see the code for some simple PL/SQL procedure doing some work), create the forward edition trigger (this is in Edition PhaseTwo!) and enable it. For simplicity we will not go into waiting for pending DML. In a real scenario you may want to do this of course. See the links to the documentation for more information on that step.
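A sketch of that forward crossedition trigger, created while the session is in PHASETWO. TRANSFORM_PRODUCT is a hypothetical name for the procedure holding the mapping logic, and the column names match the assumptions above:

```sql
-- Forward crossedition trigger: fires for DML done in ancestor editions
-- and populates the next-generation columns in the new edition
CREATE OR REPLACE TRIGGER products_fwd_xed
  BEFORE INSERT OR UPDATE ON products_gen
  FOR EACH ROW
  FORWARD CROSSEDITION
  DISABLE
BEGIN
  transform_product(
    p_prd_id   => :new.prd_id,
    p_name     => :new.name,
    p_prd_id_1 => :new.prd_id_1,  -- OUT: new product key
    p_code_1   => :new.code_1);   -- OUT: new global code
END;
/

ALTER TRIGGER products_fwd_xed ENABLE;
```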
At this point, both editions are still in sync, i.e. no data has been changed yet. First we transform the data to be upgraded by executing the transforms. Obviously, from this point on, every insert going into the base schema will fire the trigger and show up transformed in the new edition.
Just to show what the actual status is, here is the view a query user will see when looking at products (he is looking at Ora$Base):
At the same time, if the owner looks at PRODUCTS (he is in PhaseTwo) he will see the following:
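The screenshots are not reproduced here, but the two queries behind them are simply selects against the same view name from two sessions in different editions (the PHASETWO variant of the editioning view selects the new columns):

```sql
-- As REDEF_QRY, session in Ora$Base: the original column layout
SELECT * FROM redef_own.products;

-- As REDEF_OWN, session in PhaseTwo: the view exposes the new columns
ALTER SESSION SET EDITION = phasetwo;
SELECT * FROM products;
```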
As you can see, only the non-modified columns are showing (why? See the actual code of the view; in short, because we are selecting the new columns...).
Using the code in the documentation we will apply the transformations in the new edition and see what the result will be:
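The documented way to apply such a transform in 11gR2 is a dummy UPDATE driven through DBMS_SQL, so the forward crossedition trigger fires for every existing row. A sketch, assuming the trigger name from above:

```sql
-- Touch every row so the forward crossedition trigger transforms it
DECLARE
  c NUMBER := DBMS_SQL.OPEN_CURSOR;
  n NUMBER;
BEGIN
  DBMS_SQL.PARSE(
    c                          => c,
    statement                  => 'UPDATE products_gen SET prd_id = prd_id',
    language_flag              => DBMS_SQL.NATIVE,
    apply_crossedition_trigger => 'PRODUCTS_FWD_XED');
  n := DBMS_SQL.EXECUTE(c);
  DBMS_SQL.CLOSE_CURSOR(c);
  COMMIT;
END;
/
```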
We do the same steps for the fact table to cater for the redefinition of the dimension key (it went from 20 to 40). The details of this are in the code package.
At this point in time we have all data available in the PhaseTwo edition in its current form. Now a new ETL record with the following values comes in for the FACT table (SALES): 20, 10, 200, 3, 20. In other words, we are still referring to dimension key 20.
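Loaded positionally into the base edition, that record could look like this (the SALES column list is not spelled out in the text, so this is a sketch):

```sql
-- New ETL record arriving via the old load process, in Ora$Base
INSERT INTO sales VALUES (20, 10, 200, 3, 20);
COMMIT;
```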
This gives us the following results in the Ora$Base edition, note that the last record refers to dimension key 20 as a product_key:
In the PhaseTwo edition we see the record transformed and we see that the product key field is always 40, even for the new record:
In effect we have updated our data, redefined some codes and added some columns, all with no noticeable impact on the end users.
The situation is now as follows:
Now we should change the primary and foreign keys as well, so everything refers to the new situation; we do this right before switching over the users and the load processes. Because the crossedition trigger is in place, you will get a key violation on the next-generation primary key.
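A sketch of the key swap; the names PRODUCT_PK_1 and PRD_ID_1 appear in this example, while the fact-table constraint and column names are assumptions:

```sql
-- New primary key on the next-generation product key
ALTER TABLE products_gen
  ADD CONSTRAINT product_pk_1 PRIMARY KEY (prd_id_1);

-- Re-point the fact table's foreign key at the new product key
ALTER TABLE sales_gen
  ADD CONSTRAINT sales_product_fk_1
  FOREIGN KEY (product_key_1) REFERENCES products_gen (prd_id_1);
```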
After we change the keys, the PK on PRODUCTS_GEN is named PRODUCT_PK_1. Inserting into the dimension in Ora$Base (there are no keys on PRD_ID_1 visible in that edition) will give the error coming from the trigger:
Now we are all set and we can switch the edition for all users:
ALTER DATABASE DEFAULT EDITION = PhaseTwo;
After a disconnect and reconnect, any users that were connected will see the new status and will by default run on the upgraded edition. The next steps include removing the triggers and the procedures.
Obviously your ETL is now changing as well, as there are more fields to fill and the fields have different names. Some of the logic built into the triggers will move into the ETL. And this is where all the testing in development and test comes into play. Just make sure the ETL cut-over is as simple as the above was for the query users.
The interesting discussion is now whether or not you want to do this exact construct with large batch ETL routines. I would say that when doing this in batch, it may make sense to do a dual data load and use the new logic to populate the new system with the old data. But I'm open to any comments on this...
As you can see, we are adding columns to the original table. If you start with a clean sheet of paper, you may consider using Editioning Views as the query object and keeping the underlying tables more generic. In other words, rather than using _1 suffixes, maybe you can use more common or global names that allow you to keep the tables within some set naming standard (and not have slightly ad hoc _1 version indicators).
The idea therefore is to work with a level of indirection to the query users... if that makes sense.
Why is this so interesting? Didn't you test all of this anyway in your test environment? Yes, all of that is true, just as we all test our OLTP app upgrades in the lab. The point is that with this edition-based approach I can run the ETL process continuously, everyone keeps on querying without any downtime, and then everything is all of a sudden changed with no downtime for the users. That means no interruption to the business at all... and that is what you need when you are running an active data warehouse (or should we call this OLDW, for On-Line Data Warehousing?).