« February 2009 | Main | April 2009 »

March 2009 Archives

March 9, 2009

5 Ways to Cut Costs for Data Warehousing

Are you trying to better understand what doing "more with less" means for data warehousing? Or are you making the mistake of trying to catch falling knives?

Well if it is the former, hopefully you didn't miss the I-Seminar today, and if it is the latter, well hopefully you have a first-aid kit handy. But in either case the I-Seminar would have done you some good.

But if you missed it, definitely pre-register for April's companion event: 5 Ways to Cutting Costs for Data Warehousing. I'll also be posting the recording of today's webinar once it is available.

Not giving away all 5 ways, but I'll touch on one: Improving Operational Efficiencies through Real-time Data Warehousing.

One important use case that we discussed was how to ensure that data is up to date. In a typical Data Warehouse example, real-time feeds are critical to ensure that the business reports are not looking at stale-week-old data. The more real-time data, the better and more informed your decision making will be.In many such scenarios, change data capture (CDC) plays a key role in keeping data consistently updated without impacting the target or source performance. In addition, these systems draw from a wide range of internal sales, customer, and financial data applications as well as third-party systems. This requires a broad range of data integration connectivity options to support moving data across such a wide variety of enterprise applications.

There are many ways to extract data from a DBMS, including queries, replication, table dumps, storage snapshots, and calls to the API of an application that sits over the database. Change data capture (CDC) is an alternate data extraction method that has recently become of interest, primarily because it enables data integration to operate closer to real time.

CDC can be applied to most database brands, including relational, legacy, mainframe, and file-based DBMSs. A few vendors have built CDC into their products, but many organizations use the data modeling and log capabilities of a DBMS to build their own solutions. CDC has been around for many years, but its ability to solve some of the most difficult data integration challenges is driving interest among IT professionals today.

A simple example of CDC in action follows. Two separate datasources for a web storefront (one for customer data, one for order data) are consolidated into a single data warehouse. To simply update the order details in real-time, only the delta (or set of orders and new customer info) needs to be propagated across to the data warehouse. This does not require moving all the data for both systems. Without CDC, business managers would not be able to see daily trends. In addition, business managers would be forced to wait for the next batch of data to load into the data warehouse before they could look at the results. By then it might be too late to make important informed decisions.

So what does this all have to do with cutting costs? In the I-seminar we reviewed a case study where a large European bank was able to improve their operational efficiencies. They started to see 50% improvements in loading and most importantly maintaining the timeliness of their data. This in turn improved the performance of their business intelligence applications and consequently improved their customer responsiveness.

I've provided you with just one way to reduce costs and improve efficiencies. There are more out there waiting to be discovered. Let us know ways that you're doing more with less and improving on your Data-centric archtiectures. And stay away from falling knives.


March 12, 2009

No Country for Old Code

It's doom and gloom out there. I don't need to tell you. Ghost towns of foreclosures, GDP contracting, jobs shrinking, stimuli amuck. The only profitable forms of business left today seem to be Whiskey, Cable television, and... Data Integration.

I get the Whiskey... but why Data Integration? More and more companies today have decided to take matters into their own hands and wrestle the "more with less" problem. These are the culprits they're currently hunting with a vengeance: old code.And Data Integration is the weapon of choice.

Data Integration offers ways to eliminate custom coding and custom SQL associated with moving, accessing, and managing data-centric architectures. Check out this article entitled, Data Integration: Full Steam Ahead Despite Weak Economy, by Stephen Swoyer.

A major trend is really data integration. [Customers are] not backing off. They're recognizing that if they want to compete on analytics … if they want to do fact-based decision-making, it's not just a matter of putting BI tools in more hands, but putting more capacity in the data warehouse. They' have to build a data integration foundation that will help them integrate data without having that horrible manual oversight

Don't let manual processes and custom code bog you down. We'll be discussing in our upcoming I-Seminar Series entitled "5 Ways to Reduce Costs for Data Warehousing." Be sure to check it out!

March 13, 2009

ODI to Essbase: There and Back Again

Once again, John Goodwin has outdone himself by an exhaustive in-depth study of how ODI and Essbase work together to improve data integration to Enterprise Performance Management applications. On his very first blog on the subject, John writes on his views of ODI:

If you load hierarchies from files or manage them straight from a repository then ODI (Oracle Data Integrator) could be the way you want to go, it offers all this functionality and so much more.

John goes on to describe in over 20 detailed blogs just what he means by "more" for how it works, tips and tricks, and the benefits of the ODI + Essbase solution. He describes how to use the connectors and Knowledge Modules – which provide pre-built functionality for specific tasks for Essbase integration that save effort in having to re-invent the wheel.

Below are the blog series for your reference where you can step through one or all of them. Keep his blog in your RSS feed for upcoming results. Thanks again John for your outstanding effort.

Introduction to Oracle Data Integrator
The Agent
Essbase Quick Fix
Configuration
Enter the Designer
SQL to Planning
Data Load to Planning
A diversion, Text to Planning
ODI & Planning - Brief look into file manipulation
Essbase Outline Extractor
ODI / Essbase challenge - Extracting alternate alias member information
Loading Essbase Metadata
Loading SQL Metadata into Essbase and its problems
Planning and multiple UDAs
Loading data into Essbase
Extracting data from Essbase
Essbase to Planning
ODI Release Update
ODI Series - Putting it all together
ODI Series – A few extra tips
ODI Series – The final installment
Essbase/Planning – Automating changes in data - Part 1
Essbase/Planning – Automating changes in data - Part 2
ODI Hyperion Knowledge Modules - Reporting statistics correctly

March 19, 2009

Cost Savings are the New Black for Data Warehousing

According to Margy Ross - from the Kimball Group - the biggest obstacle today in implementing data warehouses is improper IT and business alignment. The following except is from an interview by Mark Rittman:

[Margy Ross] : “The biggest challenge is effectively partnering with the business organizations to deliver a DW/BI solution that the business will embrace to support their decision making. Often what undermines the partnership is the IT manager’s lack of understanding of how the business works and the major opportunities for leveraging business intelligence to add value. In terms of advice for making a case for data warehousing, it’s critical that the DW manager walks in with their business counterparts to collectively establish the business case.”

I believe this "business case" is essential for data warehousing, and although Margy doesn't expand on it in this particular interview - I would like to.

The business case for Data Warehousing is not - as much as you would like it to be - making your BI applications work...or another popular case for Data Warehousing is "because we need it to integrate and consolidate data and applications in real-time". That's a feature, not a business case. That's part of why there is this lack of alignment. We often forget that our business management thinks differently than we do as IT managers, or as Margy puts it there is a "lack of understanding of how the business works" and what opportunities in Business Intelligence and Data Warehousing can lead to improved value.

So here are three ideas for you that you can start to think about for your Data Warehousing business cases.

#1 - Cost Savings

Cost savings are the new black - that's right. Reduce Dev costs. Reduce IT costs. Reduce costs from risk. Data Warehouses provide ways to consolidate and to eliminate unnecessary SQL or custom code; they provide a way to consolidate and modernize infrastructure and rationalize redudancies in data-marts, data-silos, and multiple data-centric applications. They provide an easy way to reduce risk of bad quality data.

#2 - Efficiencies

If cost savings are the new black, then efficiencies are well the new black too. These two are inseparable. Cost savings are great, but you want to do more with your less. In Data Warehousing this translates to performance improvements, real-time data, better and faster ways to implement and build out solutions

#3 - Business Insight

Why is it that we're stuck talking about our Data Warehouse in a 'static' feature/functional way? Your business wants data for a purpose. It needs information that is clean, realiable, consistent. So that it can do what with it? Make better informed business decisions and act on such a way that improves the bottom line of the business. Hey we're back full circle again to #1. It's the cost thing.

Economy issues or not. #1 business case for your Data Warehouse should be saving your company money. Even in an up-economy we should be doing that. That is where Business and IT need to 'align' to determine how to cut costs and improve efficiencies for their business.

You can read more about how to establish a better case for your Data Warehouse, BI, or Information Management initiative here.

Or send us your ideas on how we can better solve the alignment challenge.

March 26, 2009

SOA and MDM: Chocolate and Peanut Butter or Burnt Cookie?

Recently Joe McKendrick asked this question, "Is having SOA and Master Data Management at the same time a form of overkill?" . He argues that in fact they compliment each quite nicely:

SOA in and of itself holds little value to an organization unless it provides the capability to open up information to the enterprise. As is the case with SOA, successful MDM is a silo-breaker, invoking collaboration across the enterprise. MDM helps assure that the information populating SOA-based services is accurate, timely, and consistent.

While I agree that SOA needs to "open up information" to the enterprise to be successful, I'm not sure that I would say that MDM is the only mechanism to enable consistent, accurate, timely and accessible information-based services. There are multiple approaches to moving, managing and integrating data including ELT, Data Quality, Data Governance, Data Services and Data Federation (see a great post from Alex Kotopoulis), just to name a few. In fact, many of these mechanisms provide immediate "silo-breaking" value. And there is still disagreement on how these elements are used or not-used within MDM. I'll save that post for another day.

But if we go back to MDM as a broader definition for a moment - is it true that SOA and MDM are feasible to be put together? Does it mean we're exposing master data as "master data services" for re-use by a business process?

Regardless of your architectural recipe, SOA and MDM do have a place together. Provided that you undergo a bit of political wrestling to ensure its effectiveness (according to a post by David Linthicum) then will you start seeing the value out of SOA and MDM combined.


March 27, 2009

It's not you. It's your data.

I thought today I would send a light post, courtesy of Timo Elliott. A picture is worth a terabyte of words as they say.

its_not_you.png

March 30, 2009

Using Variables in ODI: The Timestamp Example

The posts in this series assume that you have some level of familiarity with ODI. The concepts of Model, Project, Interface and Package are used here assuming that you understand these concepts in the context of ODI. If you need more details on these elements, please refer to the ODI Tutorial for a quick introduction, or to the complete ODI documentation for detailed information on these concepts.

This is the first post of a series that will explore how variables can be used in ODI for more flexibility in code generation.
In this discussion, we will use a Timestamp to extract changes from a source table. This will help us illustrate the usage of variables in Interfaces and Packages.
Subsequent posts will explain the usage of variables in more advanced cases:
- using variables to pass parameters to a process
- bind variables
- using variables in Topology.

But first, let's get started with the general usage of variables with our Timestamp extract example.

INTRODUCTION

Detailed instructions will be provided throughout this post to describe how to reproduce this case. We will create 2 variables in ODI for this example. Then we will implement the following logic:
- Store current date and time in variable1 as we start our process
- Filter out the data from the table on using variable2 that contains the data and time of the last operation
- Update variable2 with the value of variable1
- Run again!

The examples given below assume that we are using an Oracle database, but can easily be adapted to work on any other database.

1. VARIABLES SETUP IN ODI

1.1 Variables Definition

1.1.1 General Information
The first step is to create two variables in ODI. The first variable will be used to store the date and time at which we started the process – only for the duration of the process. The second variable will be used to save that date for the next iteration of the package.

To create a variable in ODI, expand your project and right-click on the ‘Variables’ folder. Then select ‘Insert Variable’

Create New Variable


You will then see the Variable definition window:

New Variable Definition

1.1.2. First Variable: StartTime
We will name our first variable StartTime.
Set the data type to ‘Alphanumeric’, simply because it is generally easier to handle than dates.
Define a default value, for instance
’29-MAR-2009 10:08:12’.
Note the single quotes around the text for the default value. It is usually better to define a default value, if only to make sure that if you forget to set a value for your variable, you will not be left with a NULL...
Fill in the ‘Description’ field to explain what the variable will be used for. This is important as this field will be retrieved by ODI when the documentation is generated from the “Print” menu in your project:
This variable will be used to store the time at which the extraction process starts so that we can remember this value for the next run of this same process

Now click on the ‘Refreshing’ tab to see the following window:

Variable Refresh SQL Query

Select any logical schema on Oracle and type the following SQL query:
Select SYSDATE from DUAL

Note:: You can run the query to make sure it is valid. Click the ‘refresh’ button in the above window. Then close the variable definition window, and check in the ODI Logs that the query runs successfully. You can then re-open the variable definition window, and check the resulting value in the ‘History’ tab of the variable.

Click Ok to save your first variable.

1.1.3 Second Variable: LastUpdate
We will name our second variable LastUpdate.
Set the data type to ‘Alphanumeric’. The default value will be the original start date (basically the date for the oldest recors in our source file. (be careful here with the date format).
’01-JAN-1997 00:00:01’
No need to put a refresh value here. (Though an alternative may have been to put here a query to retrieve the oldest date in the source system!)

Click Ok to save your second variable

Both variables can be seen in the Project Tree if you expand the Variables entry.


2. USING THE VARIABLES IN INTERFACES AND PACKAGES

2.1. Use the ‘LastUpdate’ variable to filter incoming records

When you are designing your interface, you now want to filter data based on the timestamp. This will be done with the LastUpdate variable
1. In the interface, create a filter: drag and drop a date column in on the dark gray area to create the filter. ODI will display a little funnel and open a property box for you to specify the filter expression. That box is pre-filled with the column name
2. Edit the filter expression and add:
>=#LastUpdate
(We only want the records where the timestamp is greater than the value of the variable)

Variable In Interface

2.2 Create a package to handle the timestamp values

Now you can create a package (in your project, right click on Packages and select Insert Package), in which you will do the following:
1. Drag and drop the LastUpdate variable from the Project Tree into the package. Then click on the icon representing the variable in the package and set the type to declare variable. Set the step name to Declare LastUpdate
2. Drag and drop the StartTime variable from the Project Tree into the package. Then click on the icon and set the type to declare variable. Set the step name to Declare StartTime.
3. Drag and drop the StartTime variable from the Project Tree into the package again. Then click on this icon and check that the type is set to refresh variable. This will run the associated query and store the returned value in our variable.
4. Drag and drop your interface in the package
5.Drag and drop the variable LastUpdate again, but this time set the type to Set Variable, click the assign bullet and put the entire name of the StartTime variable (including the project code)

Note: in the screen shot below, replace SALES with the CODE of the project where the variables have been defined. To retrieve the project code, double-click on the project name in the objects tree

6. Then click on the PackageOK.PNG button (available in the Package upper toolbar), and connect the icons sequentially with green arrows to define the execution order for these steps. Once this is done, save your package. This package will now process incremental loads of your sales.

The screen shot below shows the sequential ordering of the steps, as well as the value set for our LastUpdate variable after processing the interface.

Variables In Package

3. UNDERSTANDING POTENTIAL ERRORS

3.1 Project CODE vs. project NAME

Be careful when you use a variable to prefix it with a # sign, and with the project CODE, not the project NAME.

3.2 Mistyped variable names

Mistyped variable names (in the variable affectation step, or in the interface) might have unexpected results. Pay attention in particular to the case used for the variable names and when you reference them: variables are case sensitive!

3.3 Understanding Variable Values

Variables values are not visible in the generated code that you can review in the Operator interface. For instance, the code we have described above for the interface would appear in the Operator as
Select (…) from (…) where HIRE_DATE>=#SALES.LastUpdate

Even though the variable value is not visible here, we know that ODI has recognized the variable as it has inserted the project code. This is good news and guaranties that the value will properly be substituted at runtime.

If you want to see the value taken by the variable at runtime, look at the History tab in the variable definition, or look at the “Variables” folder in the operator (under the session ID) and then check out the history tab of all variables used in this execution.

MORE ON VARIABLES IN ODI

For more information on ODI variables, please refer to the ODI Users Guide (part of the Documentation Library that comes with the ODI installation), in particular the entry "Creating and Using Variables"

All Screenshots were taken using version 10.1.3.5 of ODI. Actual icons and graphical representations may vary with other versions of ODI.

About March 2009

This page contains all entries posted to Data Integration and Management in March 2009. They are listed from oldest to newest.

February 2009 is the previous archive.

April 2009 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type and Oracle