Tuesday Nov 05, 2013

ODI 12c - Aggregating Data

This posting will look at the aggregation component that was introduced in ODI 12c. For many ETL tool users this shouldn't be a big surprise, its a little different than ODI 11g but for good reason. You can use this component for composing data with relational like operations such as sum, average and so forth. Also, Oracle SQL supports special functions called Analytic SQL functions, you can use a specially configured aggregation component or the expression component for these now in ODI 12c. In database systems an aggregate transformation is a transformation where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning - that's exactly the purpose of the aggregate component.

In the image below you can see the aggregate component in action within a mapping, for how this and a few other examples are built look at the ODI 12c Aggregation Viewlet here - the viewlet illustrates a simple aggregation being built and then some Oracle analytic SQL such as AVG(EMP.SAL) OVER (PARTITION BY EMP.DEPTNO) built using both the aggregate component and the expression component.

In 11g you used to just write the aggregate expression directly on the target, this made life easy for some cases, but it wan't a very obvious gesture plus had other drawbacks with ordering of transformations (agg before join/lookup. after set and so forth) and supporting analytic SQL for example - there are a lot of postings from creative folks working around this in 11g - anything from customizing KMs, to bypassing aggregation analysis in the ODI code generator.

The aggregate component has a few interesting aspects.

1. Firstly and foremost it defines the attributes projected from it - ODI automatically will perform the grouping all you do is define the aggregation expressions for those columns aggregated. In 12c you can control this automatic grouping behavior so that you get the code you desire, so you can indicate that an attribute should not be included in the group by, that's what I did in the analytic SQL example using the aggregate component.

2. The component has a few other properties of interest; it has a HAVING clause and a manual group by clause. The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate, in 11g the filter was overloaded and used for both having clause and filter clause, this is no longer the case. If a filter is after an aggregate, it is after the aggregate (not sometimes after, sometimes having). 

3. The manual group by clause let's you use special database grouping grammar if you need to. For example Oracle has a wealth of highly specialized grouping capabilities for data warehousing such as the CUBE function. If you want to use specialized functions like that you can manually define the code here. The example below shows the use of a manual group from an example in the Oracle database data warehousing guide where the SUM aggregate function is used along with the CUBE function in the group by clause.

The SQL I am trying to generate looks like the following from the data warehousing guide;

  1. SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
  2.       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
  3. FROM sales, customers, times, channels, countries
  4. WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND
  5.   sales.channel_id= channels.channel_id
  6.  AND customers.country_id = countries.country_id
  7.  AND channels.channel_desc IN
  8.   ('Direct Sales', 'Internet') AND times.calendar_month_desc IN
  9.   ('2000-09', '2000-10') AND countries.country_iso_code IN ('GB', 'US')
  10. GROUP BY CUBE(channel_desc, calendar_month_desc, countries.country_iso_code);

I can capture the source datastores, the filters and joins using ODI's dataset (or as a traditional flow) which enables us to incrementally design the mapping and the aggregate component for the sum and group by as follows;

In the above mapping you can see the joins and filters declared in ODI's dataset, allowing you to capture the relationships of the datastores required in an entity-relationship style just like ODI 11g. The mix of ODI's declarative design and the common flow design provides for a familiar design experience. The example below illustrates flow design (basic arbitrary ordering) - a table load where only the employees who have maximum commission are loaded into a target. The maximum commission is retrieved from the bonus datastore and there is a look using employees as the driving table and only those with maximum commission projected.

Hopefully this has given you a taster for some of the new capabilities provided by the aggregate component in ODI 12c. In summary, the actions should be much more consistent in behavior and more easily discoverable for users, the use of the components in a flow graph also supports arbitrary designs and the tool (rather than the interface designer) takes care of the realization using ODI's knowledge modules.

Interested to know if a deep dive into each component is interesting for folks. Any thoughts? 

Thursday Dec 06, 2012

ODI 11g - Cleaning control characters and User Functions

In ODI user functions have a poor name really, they should be user expressions - a way of wrapping common expressions that you may wish to reuse many times - across many different technologies is an added bonus. To illustrate look at the problem of how to remove control characters from text. Users ask these types of questions over all technologies - Microsoft SQL Server, Oracle, DB2 and for many years - how do I clean a string, how do I tokenize a string and so on. After some searching around you will find a few ways of doing this, in Oracle there is a convenient way of using the TRANSLATE and REPLACE functions. So you can convert some text using the following SQL;

  • replace( translate('This is my string'||chr(9)||' which has a control character', chr(3)||chr(4)||chr(5)||chr(9), chr(3) ), chr(3), '' )

If you had many columns to perform this kind of transformation on, in the Oracle database the natural solution you'd go to would be to code this as a PLSQL function since you don't want the code splattered everywhere. Someone tells you that there is another control character that needs added equals a maintenance headache. Coding it as a PLSQL function will incur a context switch between SQL and PLSQL which could prove costly.

In ODI user functions let you capture this expression text and reference it many times across your mappings. This will protect the expression from being copy-pasted by developers and make maintenance much simpler - change the expression definition in one place.

Firstly define a name and a syntax for the user function, I am calling it UF_STRIP_BAD_CHARACTERS and it has one parameter an input string; 

We then can define an implementation for each technology we will use it, I will define Oracle's using the inputString parameter and the TRANSLATE and REPLACE functions with whatever control characters I want to replace;

I can then use this inside mapping expressions in ODI, below I am cleaning the ENAME column - a fabricated example but you get the gist.

 Note when I use the user function the function name remains in the text of the mapping, the actual expression is not substituted until I generate the scenario. If you generate the scenario and export the scenario you can have a peak at the code that is processed in the runtime - below you can see a snippet of my export scenario;

 That's all for now, hopefully a useful snippet of info.

Wednesday Oct 31, 2012

Ameristar Wins with Oracle GoldenGate’s Heterogeneous Real-Time Data Integration

Today we announced a press release about another successful project with Oracle GoldenGate. This time at Ameristar. Ameristar is a casino gaming company and needed a single data integration solution to connect multiple heterogeneous systems to its Teradata data warehouse.

The project involves integration of Ameristar’s promotional and gaming data from 14 data sources across its 7 casino hotel properties in real time into a central Teradata data warehouse. The source systems include the Aristocrat gaming and MGT promotional management platforms running on Microsoft SQL Server 2000 databases.

As you can notice, there was no Oracle Database involved in this project, but Ameristar’s IT leadership knew that  GoldenGate’s strong heterogeneous and real-time data integration capabilities is the right technology for their data warehousing project. With GoldenGate Ameristar was able to reduce data latency to the enterprise data warehouse, and use this real-time customer information for marketing teams in improving overall customer experience. Ameristar customers receive more targeted and timely campaign offers, and the company has more up-to-date visibility into financial metrics of the company.

One other key benefit the company experienced with GoldenGate is in operational costs. The previous data capture solution Ameristar used was trigger based and required a lot of effort to manage. They needed dedicated IT staff to maintain it. With GoldenGate, the solution runs seamlessly without needing a fully-dedicated staff, giving the IT team at Ameristar more resources for their other IT projects.

If you want to learn more about GoldenGate and the latest features for Oracle Database and non-Oracle databases, please watch our on demand webcast about Oracle GoldenGate 11g Release 2.

Friday Oct 05, 2012

Hidden Gems: Accelerating Oracle Data Integrator with SOA, Groovy, SDK, and XML

On the last day of Oracle OpenWorld, we had a final advanced session on getting the most out of Oracle Data Integrator through the use of various advanced techniques.

The primary way to improve your ODI processes is to choose the optimal knowledge modules for your load and take advantage of the optimized tools of your database, such as OracleDataPump and similar mechanisms in other databases. Knowledge modules also allow you to customize tasks, allowing you to codify best practices that are consistently applied by all integration developers.

ODI SDK is another very powerful means to automate and speed up your integration development process. This allows you to automate Life Cycle Management, code comparison, repetitive code generation and change of your integration projects. The SDK is easily accessible through Java or scripting languages such as Groovy and Jython.

Finally, all Oracle Data Integration products provide services that can be integrated into a larger Service Oriented Architecture. This moved data integration from an isolated environment into an agile part of a larger business process environment. All Oracle data integration products can play a part in thisracle GoldenGate can integrate into business event streams by processing JMS queues or publishing new events based on database transactions.

  • Oracle GoldenGate can integrate into business event streams by processing JMS queues or publishing new events based on database transactions.
  • Oracle Data Integrator allows full control of its runtime sessions through web services, so that integration jobs can become part of business processes.
  • Oracle Data Service Integrator provides a data virtualization layer over your distributed sources, allowing unified reading and updating for heterogeneous data without replicating and moving data.
  • Oracle Enterprise Data Quality provides data quality services to cleanse and deduplicate your records through web services.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today