Friday Apr 10, 2015

Customers Tell All: What Sets Oracle Apart in Big Data Integration

Data integration has become a critical component of many technology solutions that businesses pursue to differentiate in their markets. Instead of relying on manual coding in house, more and more businesses choose data integration solutions to support their strategic IT initiatives, from big data analytics to cloud integration.

To explore the differences among the leading data integration solutions and the impact their technologies are having on real-world businesses, Dao Research recently conducted a research study, where they interviewed IBM, Informatica, and Oracle customers. In addition they reviewed publicly available solution information from these three vendors.

The research revealed some key findings that explain Oracle's leadership in the data integration space. For example:

  • Customers who participated in this study cite 30% to 60% greater development productivity using Oracle Data Integrator versus traditional ETL tools from Informatica and IBM. Dao's research ties Oracle's advantage to product architecture differences such as native push-down processing, the separation of logical and physical layers, and the ability to extend Oracle Data Integrator using its knowledge modules.
  • The research also showed that Oracle’s data integration cost of ownership is lower because of its unified platform strategy (versus offering multiple platforms and options), its use of source and target databases for processing, higher developer productivity, faster implementation, and the absence of a middle-tier integration infrastructure to manage.
  • In the area of big data integration, the study highlights Oracle’s advantage with its flexible and native solutions. Unlike competitors’ offerings, developed as separate solutions, Oracle’s solution is aware of the cluster environment of big data systems. Oracle enables big data integration and cloud data integration through the use of a single platform with common tooling and inherent support for big data processing environments.
  • I should add that the latest release of the Oracle Data Integrator EE Big Data Option widens the competitive gap. Oracle is the only vendor that can automatically generate Spark, Hive, and Pig transformations from a single mapping. Oracle Data Integration customers can focus on building the right architecture for driving business value, and do not have to become experts in multiple programming languages. For example, an integration architect at a large financial services provider told the research company, "As an ODI developer, I am a Big Data developer without having to understand the underpinnings of Big Data. That's pretty powerful capability."


You can find the report of Dao's research here:

I invite you to read this research paper to understand why more and more customers trust Oracle for their strategic data integration initiatives after working with or evaluating competitive offerings.


This Week's A-Team Blog Speaks to Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The A-Team not only provides great content, they are humorous too!

Check out this week’s post, the title says it all: Getting Groovy with Oracle Data Integrator: Automating Changes after Upgrading ODI or Migrating from Oracle Warehouse Builder

The article covers various scripts, written in Groovy and leveraging the ODI SDK, that assist in automating massive changes to one’s repository. These scripts initially came about from a customer's desire to enhance their environment as they moved from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI), but in the end came the realization that they could be used by any ODI user.

Happy reading!

Thursday Apr 09, 2015

ODI, Big Data SQL and Oracle NoSQL

Back in January Anuj posted an article here on using Oracle NoSQL via the Oracle Database Big Data SQL feature. In this post - think of it as part 2 of Anuj's - I will follow up with how the Oracle external table is configured and how it all hangs together, both with manual code and via ODI. For this I used the Big Data Lite VM and the newly released Oracle Data Integrator Big Data option. The Big Data Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL; from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.

My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL. This integration design supports predicate pushdown, so I actually minimize the data moved between my NoSQL store on Hadoop and the Oracle database - think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle, which includes the customer, movie, and genre information, and store the result in a summary table.

Here is the code to create and load the recommendation data in NoSQL - in a real-world scenario this would normally be computed by another piece of application logic:

  • export KVHOME=/u01/nosql/kv-3.2.5
  • cd /u01/nosql/scripts
  • ./admin.sh

  • connect store -name kvstore
  • EXEC "CREATE TABLE recommendation( \
  •          custid INTEGER, \
  •          sno INTEGER, \
  •          genreid INTEGER,\
  •          movieid INTEGER,\
  •          PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
  • PUT TABLE -name RECOMMENDATION  -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json
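
To give a sense of the expected input format, a record in user_movie.json matching the table definition above would look something like the following. This is a hypothetical sample; the field values shown are made up for illustration:

  • {"custid":1255601,"sno":1,"genreid":12,"movieid":345}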

The Manual Approach

This example uses the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table as follows:

  • CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
  •                  custid INT,
  •                  sno INT,
  •                  genreId INT,
  •                  movieId INT)
  •           STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  •           TBLPROPERTIES  ( "oracle.kv.kvstore"="kvstore",
  •                            "oracle.kv.hosts"="localhost:5000",
  •                            "oracle.kv.hadoop.hosts"="localhost",
  •                            "oracle.kv.tableName"="recommendation");
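
With the external table in place, a quick sanity check from the Hive shell is just a simple query. This is a minimal example using the columns defined above; the rows returned depend on what was loaded into the NoSQL store:

  • SELECT custid, sno, genreid, movieid
  • FROM recommendation
  • LIMIT 10;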

At this point we have made NoSQL accessible to many components in the Hadoop stack - pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries defined, be they Hive, Pig, Spark and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an external table that uses either the SerDe or a Hive table; below you can see how the table has been defined in Oracle:

  • CREATE TABLE recommendation(
  •                  custid NUMBER,
  •                  sno NUMBER,
  •                  genreid NUMBER,
  •                  movieid NUMBER
  •          )
  •                  ORGANIZATION EXTERNAL
  •          (
  •                  TYPE ORACLE_HIVE
  •                  DEFAULT DIRECTORY DEFAULT_DIR
  •                  ACCESS PARAMETERS  (
  •                      com.oracle.bigdata.tablename=default.recommendation
  •                  )
  •          ) ;

Now we are ready to write SQL! Really!? Well, let's see. Below is the type of query we can run to join the NoSQL data with our Oracle reference data:

  • SELECT m.title, g.name, c.first_name
  • FROM recommendation r, movie m, genre g, customer c
  • WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1 
  • ORDER by r.sno, r.genreid;

Great, we can now access the data from Oracle - we benefit from the scalability of the solution and minimal data movement! Let's make it better: more maintainable, more flexible to future changes, and accessible to more people, by showing how it is done in ODI.

Oracle Data Integrator Approach

The data in NoSQL has a shape, and we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load into whatever target we like. The SQL we saw above is represented in a logical mapping as below:


Users get the same design experience as with other data items and benefit from the mapping designer. They can join, map, and transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics - this is all about giving you the flexibility to change and adapt to new integration technologies and patterns.

In the physical design we can assign Knowledge Modules that take on the responsibility of building the integration objects that we manually coded above. These KMs are generic, so they support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle:

This KM takes the role of building the external table - you can take it, use it, customize it, and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate.

This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI, and used that as a source for the mapping; I want to show how the raw items can be integrated, because the more metadata we have and use at design time, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM:

  • create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
  •  <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%> 
  •           STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  •           TBLPROPERTIES  ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
  •                            "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
  •                            "oracle.kv.hadoop.hosts"="localhost",
  •                            "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");

You can see the templatized code versus literals. This still needs some work - can you spot some hard-wiring that needs to be fixed? ;-) This was built using the 12.1.3.0.1 Big Data option of ODI, so integration with Hive is much improved, and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI; you can do this too for anything you want, and I will post this technology on java.net and more so that as a community we can learn and share.

Summary 

Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!


Wednesday Apr 08, 2015

Oracle GoldenGate for DB2 z/OS Supports DB2 11

With the Oracle GoldenGate 12.1.2.1.4 release, Oracle GoldenGate for DB2 z/OS provides support for DB2 11. This release also includes the fix to make Oracle GoldenGate z/OS Extract compatible with IBM APAR PI12599 for DB2 z/OS.

Monday Apr 06, 2015

Announcing Oracle Data Integrator for Big Data

We are proud to announce the availability of Oracle Data Integrator for Big Data. This release is the latest in a series of advanced Big Data updates and features that Oracle Data Integration is rolling out to help customers take their Hadoop projects to the next level.

Increasing Big Data Heterogeneity and Transparency

This release brings significant additions in heterogeneity and governance for customers. Highlights include:

  • Support for Apache Spark,
  • Support for Apache Pig, and
  • Orchestration using Oozie.

Click here for a detailed list of what is new in Oracle Data Integrator (ODI).

Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate it. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface used to build logic from the physical implementation layer that runs the code. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin, or MapReduce.

Oracle Data Integrator for Big Data Webcast

We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions answered about Big Data Integration. We will discuss how the newly announced Oracle Data Integrator for Big Data:

  • Provides advanced scale and expanded heterogeneity for big data projects 
  • Uniquely complements Hadoop’s strengths to accelerate decision making, and 
  • Ensures sub-second latency with Oracle GoldenGate for Big Data.


Thursday Mar 26, 2015

Oracle Big Data Lite 4.1.0 is available with more on Oracle GoldenGate and Oracle Data Integrator

Oracle's big data team has announced the newest Oracle Big Data Lite Virtual Machine 4.1.0.  This newest Big Data Lite Virtual Machine contains great improvements from a data integration perspective with inclusion of the recently released Oracle GoldenGate for Big Data.  You will see this in an improved demonstration that highlights inserts, updates, and deletes into Hive using Oracle GoldenGate for Big Data with Oracle Data Integrator performing a merge of the new operations into a consolidated table.

Big Data Lite is a pre-built environment which includes many of the key capabilities for Oracle's big data platform.   The components have been configured to work together in this Virtual Machine, providing a simple way to get started in a big data environment.  The components include Oracle Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator, Oracle GoldenGate amongst others. 

Big Data Lite also contains hands-on labs and demonstrations to help you get started using the system.  Tame Big Data with Oracle Data Integration is a hands-on lab that teaches you how to design Hadoop data integration using Oracle Data Integrator and Oracle GoldenGate. 

                Start here to learn more!  Enjoy!

Wednesday Mar 11, 2015

Recap - How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Last week Oracle and Cloudera experts came together to discuss how Big Data is shaping Data Management.

Cloudera's Charles Zedlewski  spoke about  the Enterprise Data Hub, a Hadoop data store that can be used to store vast quantities of unstructured data. This data store, which is secure and governed, stores data in full fidelity, along with enriched and transformed data that can then be drawn upon to make critical business decisions.

Following Cloudera, Jeff Pollock emphasized the importance of Data Integration technologies that can take advantage of Hadoop clusters in their data processing methods. Offloading data queries into the cluster, real-time data ingestion into the data hub, and building an integrated Big Data reservoir were top of mind when designing Oracle's Data Management technologies.

Rounding out the webcast was Dain Hansen, who quizzed the two experts on some of the top-of-mind questions that customers have about big data.

If you missed the webcast do not worry. You can watch it here on demand.

Friday Feb 27, 2015

How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Cutting through the Big Data Clutter

The Big Data world is changing rapidly, giving rise to new standards, languages and architectures. Customers are unclear about which Big Data technology will benefit their business the most, and how to future proof their Big Data investments.

This webcast helps customers sift through the changing Big Data architectures and build their own resilient Big Data platform. Oracle and Cloudera experts discuss how enterprise platforms need to provide more flexibility to handle real-time and in-memory computations for Big Data.



The speakers introduce the 4th generation architecture for Big Data that allows expanded and critical capabilities to exist alongside each other. Customers can now see higher returns on their Big Data investment by ingesting real-time data and improving data transformation for their Big Data analytics solutions. By choosing Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Metadata Management, customers gain the ability to keep pace with changing Big Data technologies like Spark, Oozie, Pig and Flume without losing productivity, and to reduce risk through robust Big Data governance.

In this webcast we also discuss the newly announced Oracle GoldenGate for Big Data. With this release, customers can stream real-time data from their heterogeneous production systems into Hadoop and other Big Data systems like Apache Hive, HBase and Flume. This brings real-time capabilities to customers’ Big Data architectures, allowing them to enhance their big data analytics and ensure their Big Data reservoirs are up to date with production systems.

Click here to mark your calendars and join us for the webcast to understand Big Data Integration and ensure that you are investing in the right Big Data Integration solutions.

Thursday Feb 19, 2015

Introducing Oracle GoldenGate for Big Data!

Big data systems and big data analytics solutions are becoming critical components of modern information management architectures. Organizations realize that by combining structured transactional data with semi-structured and unstructured data they can realize the full potential value of their data assets and achieve enhanced business insight. Businesses also recognize that in today’s fast-paced, digital business environment, being agile and responding with immediacy requires access to data with low latency. Low-latency transactional data brings additional value that day-old data, structured or unstructured, cannot deliver, especially for dynamically changing operations.

Today we announced the general availability of the Oracle GoldenGate for Big Data product, which offers a platform for streaming real-time transactional data into big data systems. By providing easy-to-use, real-time data integration for big data systems, Oracle GoldenGate for Big Data facilitates improved business insight for a better customer experience. It also allows IT organizations to quickly move ahead with their big data projects without extensive training and management resources. Oracle GoldenGate for Big Data's real-time data streaming platform also allows customers to keep their big data reservoirs up to date with their production systems.

Oracle GoldenGate’s fault-tolerant, secure and flexible architecture shines in this new big data streaming offering as well. Customers can enjoy secure and reliable data streaming with subsecond latency. Oracle GoldenGate’s core log-based change data capture capabilities enable real-time streaming without degrading the performance of the source production systems.

The new offering, Oracle GoldenGate for Big Data, provides integration for Apache Flume, Apache HDFS, Apache Hive and Apache HBase. It also includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.

You can learn more about our new offering via Oracle GoldenGate for Big Data data sheet and by registering for our upcoming webcast:

How to Future-Proof your Big Data Integration Solution

March 5th, 2015 10am PT/ 1pm ET

I invite you to join this webcast to learn from Oracle and Cloudera executives how to future-proof your big data infrastructure. The webcast will discuss:

  • Selection criteria that will drive business results with Big Data Integration 
  • Oracle's new big data integration and governance offerings, including Oracle GoldenGate for Big Data
  • Oracle’s comprehensive big data features in a unified platform 
  • How Cloudera Enterprise Data Hub and Oracle Data Integration combine to offer complementary features to store data in full fidelity, to transform and enrich the data for increased business efficiency and insights.

Hope you can join us and ask your questions to the experts.

Hive, Pig, Spark - Choose your Big Data Language with Oracle Data Integrator

The strength of Oracle Data Integrator (ODI) has always been the separation of logical design and physical implementation. Users can define a logical transformation flow that maps any sources to targets without being concerned with what exact mechanisms will be used to realize such a job. In fact, ODI doesn’t have its own transformation engine but instead outsources all work to the native mechanisms of the underlying platforms, be it relational databases, data warehouse appliances, or Hadoop clusters.

In the case of Big Data this philosophy of ODI gains even more importance. New Hadoop projects are incubated and released on a constant basis and introduce exciting new capabilities; the combined brain trust of the big data community conceives new technology that outdoes any proprietary ETL engine. ODI’s ability to separate your design from the implementation enables you to pick the ideal environment for your use case; and if the Hadoop landscape evolves, it is easy to retool an existing mapping with a new physical implementation. This way you don’t have to tie yourself to one language that is hyped this year, but might be legacy in the next.

ODI enables the generation of executable code from a logical design through physical designs and Knowledge Modules. You can even define multiple physical designs for different languages based on the same logical design. For example, you could choose Hive as your transformation platform, and ODI would generate Hive SQL as the execution language. You could also pick Pig, and the generated code would be Pig Latin. If you choose Spark, ODI will generate PySpark code, which is Python with Spark APIs. Knowledge Modules orchestrate the generation of code for the different languages and can be further configured to optimize the execution of the different implementations, for example parallelism in Pig or in-memory caching for Spark.

The example below shows an ODI mapping that reads from a log file in HDFS, registered in HCatalog. It gets filtered, aggregated, and then joined with another table, before being written into another HCatalog-based table. ODI can generate code for Hive, Pig, or Spark based on the Knowledge Modules chosen. 
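
As a rough illustration, when Hive is chosen the flow described above (filter, then aggregate, then join and write) corresponds to a statement of the following shape. Note this is a hand-written sketch with hypothetical table and column names (web_logs, sessions, session_summary), not actual ODI-generated output:

  • INSERT INTO TABLE session_summary
  • SELECT agg.session_id, s.customer_id, agg.hits
  • FROM (SELECT session_id, COUNT(*) AS hits
  •         FROM web_logs
  •        WHERE status = 200
  •        GROUP BY session_id) agg
  • JOIN sessions s ON (agg.session_id = s.session_id);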

ODI provides developer productivity and can future-proof your investment by overcoming the need to manually code Hadoop transformations in a particular language. You can logically design your mapping and then choose the implementation that best suits your use case.

Monday Feb 16, 2015

The Data Governance Commandments

This is the second of our Data Governance Series. Read the first part here.

The Four Pillars of Data Governance

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

Data governance is a wide-reaching discipline, but as in all walks of life, there are a handful of essential elements you need in place before you can really start enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:

People

Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.

Enter the data steward: a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or just a role appended to an existing employee’s tasks.

But do you really need one? If you take your data seriously, then someone should certainly be taking on this role; even if they only do it part-time.

Processes

So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.

Whatever cleansing, cleaning and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.

Technology

No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to the task and able to hold all your data in an organized fashion that complies with all your regulatory obligations.

But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.

More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.

But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.

Best Practices

For your data governance to really deliver—and keep delivering—you need to follow best practices.

Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.

How Many Have You Got?

These four pillars are essential to holding up a great data governance strategy, and if you’re missing even one of them, you’re severely limiting the value and reliability of your data.

If you’re struggling to get all the pillars in place, you might want to read our short guide to data governance success.

Tuesday Feb 10, 2015

The Data Governance Commandments: Ignoring Your Data Challenges is Not an Option

This is the first of our Data Governance blog series. Read the next post in the series here.

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

All businesses are data businesses in the modern world, and if you’re collecting any information on employees, performance, operations, or your customers, your organization is swimming in data by now. Whether you’re using it, or just sitting on it, that data is there and it is most definitely your responsibility.

Even if you lock it in a vault and bury your head in the sand, that data will still be there, and it will still be:

  • Subject to changeable regulations and legislation
  • An appealing target for cybercriminals
  • An opportunity that you’re missing out on

Those are already three very good reasons to start working on your data strategy. But let’s break it down a bit more.

Regulations

Few things stand still in the world of business, but regulations in particular can move lightning-fast.

If your data is sitting in a data warehouse you built a few years ago, that data could now be stored in an insecure format, listed incorrectly, and violating new regulations you haven’t taken into account.

You may be ignoring the data, but regulatory bodies aren’t—and you don’t want to find yourself knee-deep in fines.

Security

Your network is like a big wall around your business. Cybercriminals only need to find one crack in the brickwork, and they’ll come flooding in.

Sure, you’ve kept firewalls, anti-virus software and your critical servers up to date, but what about that old data warehouse? How’s that looking?

If you’ve taken your eye off your DW for even a second, you’re putting all that data at risk. And if the cybercriminals establish a backdoor through the DW into the rest of the organization, who knows how far the damage could spread?

If you lose just consumer reputation and business following such a data breach, consider yourself lucky. The impact could be far worse for the organization that ignores its data security issues.

Potential

Even without the dangers of data neglect, ignoring your data means you’re ignoring fantastic business opportunities. The data you’re ignoring could be helping your business:

  • Better target marketing and sales activities
  • Make more informed business decisions
  • Get more from key business applications
  • Improve process efficiency

Can you afford to ignore all of these benefits, and risk the security and compliance of your data?

Thankfully, there are plenty of ways you can start tightening up your data strategy right away.

Check out our short guide to data governance, and discover the three principles you need to follow to take control of your data.

Thursday Jan 29, 2015

Oracle GoldenGate 12c for Oracle Database - Integrated Capture sharing capture session

Oracle GoldenGate for Oracle Database has introduced several features in Release 12.1.2.1.0. In this blog post I would like to explain one of the interesting features: “Integrated Capture (a.k.a. Integrated Extract) sharing capture session”. This feature makes the creation of additional Integrated Extracts faster by leveraging existing LogMiner dictionaries. Since Integrated Extract requires registering the Extract with the database, let’s first see what ‘registering the Extract’ means.

REGISTER EXTRACT EAMER DATABASE

The above command registers the Extract process with the database for what is called “Integrated Capture Mode”. In this mode the Extract interacts directly with the database LogMining server to receive data changes in the form of logical change records (LCRs).

When you created an Integrated Extract prior to Oracle GoldenGate release 12.1.2.1.0, you might have seen a delay in registering the Extract with the database. This is mainly because the creation of an Integrated Extract involves dumping the dictionary and then processing that dictionary to populate LogMiner tables for each session, which adds overhead to online systems and therefore requires extra startup time. The same process is followed when you create additional Integrated Extracts.

What if you could use the existing LogMiner dictionaries to create additional Integrated Extracts? This is what has been done in this release. The creation of additional Integrated Extracts can be made significantly faster by leveraging existing LogMiner dictionaries that have already been mined; a separate copy of the LogMiner dictionary no longer has to be dumped for each Integrated Extract. As a result, the creation of additional Integrated Extracts is much faster, and the significant database overhead caused by dumping and processing the dictionary is avoided.

In order to use the feature, you should have Oracle DB version 12.1.0.2 or higher, and Oracle GoldenGate for Oracle version 12.1.2.1.0 or higher. The feature is currently supported for non-CDB databases only.

Command Syntax:

REGISTER EXTRACT group_name DATABASE

..

{SHARE [AUTOMATIC | extract | NONE]}

There are primarily three options to choose from; NONE is the default if you don’t specify anything.

The AUTOMATIC option clones/shares the LogMiner dictionary from the closest existing capture. If no suitable clone candidate is found, then a new LogMiner dictionary is created.

The extract option clones/shares from the capture session associated with the specified Extract. If this is not possible, an error occurs and the registration does not complete.

The NONE option does not clone or create a new LogMiner dictionary; this is the default.

When you use this feature, the SHARE option should be used together with an SCN; the specified SCN must be greater than or equal to the First SCN of at least one existing capture and less than the current SCN.
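
For example, a registration that combines the SCN clause with a SHARE option (as in one of the scenarios discussed below) looks like this:

REGISTER EXTRACT EXT7 DATABASE SCN 61000 SHARE AUTOMATIC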

Let’s look at a few behaviors prior to the 12.1.2.1.0 release and with the SHARE options. ‘Current SCN’ indicates the current SCN value when the REGISTER EXTRACT command was executed in the following example scenarios.

Capture Name | LogMiner ID | First SCN | Start SCN | LogMiner Dictionary ID (LM-DID)
EXT1         | 1           | 60000     | 60000     | 1
EXT2         | 2           | 65000     | 65000     | 2
EXT3         | 3           | 60000     | 60000     | 3
EXT4         | 4           | 65000     | 66000     | 2
EXT5         | 5           | 60000     | 68000     | 1
EXT6         | 6           | 70000     | 70000     | 4
EXT7         | 7           | 60000     | 61000     | 1
EXT8         | 8           | 65000     | 68000     | 2

Behavior Prior to 12.1.2.1.0 – No Sharing

Register extract EXT1 with database (current SCN: 60000)

Register extract EXT2 with database (current SCN: 65000)

Register extract EXT3 with database SCN 60000 (current SCN: 65555)

Register extract EXT4 with database SCN 61000 -> Error!

Registration of Integrated Extracts EXT1, EXT2 and EXT3 completed successfully, whereas EXT4 fails because the LogMiner server does not exist at SCN 61000.

Also note that all of the Integrated Extracts (EXT1 - EXT3) created their dictionaries separately (their LogMiner Dictionary IDs are different; from now on I’ll call them LM-DIDs).

New behavior with different SHARE options

  • Register extract EXT4 with database SHARE AUTOMATIC (current SCN: 66000)

EXT4 automatically chose the capture session of EXT2, as its Start SCN of 65000 is nearest to the current SCN of 66000. Hence the EXT4 and EXT2 capture sessions share the same LM-DID 2.

  • Register extract EXT5 with database SHARE EXT1 (current SCN: 68000)

EXT5 shares the capture session of EXT1. Since EXT1 is up and running, this doesn’t give any error. LM-DID 1 is shared across the EXT5 and EXT1 capture sessions.

  • Register extract EXT6 with database SHARE NONE (current SCN: 70000)

EXT6 is registered with the SHARE NONE option; hence a new LogMiner dictionary is created (dumped). See the LM-DID column for EXT6 in the table above: it contains the value 4.

  • Register extract EXT7 with database SCN 61000 SHARE NONE (current SCN: 70000)

This generates an error similar to EXT4 @ SCN 61000: the LogMiner server doesn’t exist at SCN 61000, and since the SHARE option is NONE, the existing LogMiner dictionaries won’t be shared either. This is the same behavior as prior to the 12.1.2.1.0 release.

  • Register extract EXT7 with database SCN 61000 SHARE AUTOMATIC (current SCN: 72000)

EXT7 shares the capture session of EXT1, as it is the closest for SCN 61000. You may have noticed that the EXT7 @ SCN 61000 scenario passes with the SHARE AUTOMATIC option, which was not the case earlier (EXT4 @ SCN 61000).

  • Register extract EXT8 with database SCN 68000 SHARE EXT2 (current SCN: 76000)

EXT8 shares the EXT2 capture session. Hence the LogMiner dictionary is shared between EXT8 and EXT2.

This feature not only provides faster startup for additional Integrated Extracts, but also enables a few scenarios that weren’t possible earlier. If you are using this feature and have questions or comments, please let me know by leaving a comment below. I’ll reply as soon as possible.

Thursday Jan 22, 2015

OTN Virtual Technology Summit Data Integration Subtrack Features Big Data Integration and Governance

I am sure many of you have heard about the quarterly Oracle Technology Network (OTN) Virtual Technology Summits. These summits provide a hands-on learning experience on the latest offerings from Oracle by bringing together experts from our community and product management team.

The next OTN Virtual Technology Summit is scheduled for February 11th (9am-12:30pm PT) and will feature Oracle's big data integration and metadata management capabilities with hands-on-lab content.

The Data Integration and Data Warehousing sub-track includes the following sessions and speakers:

Feb 11th 9:30am PT -- HOL: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.

Speaker: Michael Rainey, Oracle ACE, Principal Consultant, Rittman Mead

Feb 11th 10:30am PT -- HOL: Bringing Oracle Big Data SQL to Oracle Data Integration 12c Mappings

Oracle Big Data SQL extends Oracle SQL and Oracle Exadata SmartScan technology to Hadoop, giving developers the ability to execute Oracle SQL transformations against Apache Hive tables and extending the Oracle Database data dictionary to the Hive metastore. In this session we'll look at how Oracle Big Data SQL can be used to create ODI12c mappings against both Oracle Database and Hive tables, to combine customer data held in Oracle tables with incoming purchase activities stored on a Hadoop cluster. We'll look at the new transformation capabilities this gives you over Hadoop data, and how you can use ODI12c's Sqoop integration to copy the combined dataset back into the Hadoop environment.

Speaker: Mark Rittman, Oracle ACE Director, CTO and Co-Founder, Rittman Mead

Feb 11th 11:30am PT -- An Introduction to Oracle Enterprise Metadata Manager

This session takes a deep technical dive into the recently released Oracle Enterprise Metadata Manager. You’ll see the standard features of data lineage, impact analysis and version management applied across a myriad of Oracle and non-Oracle technologies into a consistent metadata whole, including Oracle Database, Oracle Data Integrator, Oracle Business Intelligence and Hadoop. This session will examine the Oracle Enterprise Metadata Manager "bridge" architecture and how it is similar to the ODI knowledge module. You will learn how to harvest individual sources of metadata, such as OBIEE, ODI, the Oracle Database and Hadoop, and you will learn how to create OEMM configurations that contain multiple metadata stores as a single coherent metadata strategy.

Speaker: Stewart Bryson, Oracle ACE Director, Owner and Co-founder, Red Pill Analytics

I invite you to register now to this free event and enjoy this feast for big data integration and governance enthusiasts.

Americas -- February 11th/ 9am to 12:30pm PT- Register Now

Please note the same OTN Virtual Technology Summit content will be presented again to EMEA and APAC. You can register for them via the links below.

EMEA – February 25th / 9am to 12:30pm GMT* - Register Now

APAC – March 4th / 9:30am-1:00pm IST* - Register Now

Join us and let us know how you like the data integration sessions in this quarter's OTN event.

Monday Jan 12, 2015

ODI 12c - Eclipse and Updated Mapping Builder Example

The 12c ODI release offers a full SDK which allows great opportunities for automation. You can reduce development times and work smarter. Here I will show how to use Eclipse to configure and set up the SDK to start working smarter. In this example I will use an updated version of the Mapping Builder, a sample SDK illustration I blogged about previously. The builder has been updated to include auto mapping of attributes, multiple targets, loading modes and more. The updated OdiMappingBuilder.java code used in this example is here.

There is a viewlet illustrating the use of Eclipse and the ODI SDK here. The viewlet walks through the entire configuration of an Eclipse project using the 12.1.3 release of ODI.


The important areas concern the libraries you need to define for the project. Below you can see the libraries used from the ODI installation; in these examples I have installed my software in C:\Oracle\ODI_1213.

Once the libraries are configured you can import the mapping builder java source code...

Then you must define how the code is executed, specifying the parameters as arguments. This is of course just an example for illustration; you can define how you would like this to be done in whatever way you like. You specify all of the mapping builder parameters here. I also pass the control file as a parameter rather than via stdin (which is what I did in previous blog posts).

I mentioned that the mapping builder code now supports auto mapping, multiple targets and loading modes; you can see an example driver file below. The line starting with >mapping has two qualifiers; below I have used EQUALS and IGNORECASE. The first parameter supports the following values:

  • EQUALS - supports matching EMPNO with EMPNO (or empno with EMPNO when IGNORECASE used for example)
  • SRCENDSWITH - supports matching source XEMPNO with EMPNO (plus case option as above)
  • TGTENDSWITH - supports matching source EMPNO with XEMPNO (plus case option as above)

The second parameter supports;

  • MATCH - exact match, matches EMPNO with EMPNO or empno with empno
  • IGNORECASE - supports mismatched case matching EMPNO matches empno

As well as auto mapping you can also specify specific column level mapping explicitly as was previously supported, the viewlet has an example. 

You can also see the control file has a target_load directive which can be given a condition or an else in the final field; this lets you load multiple condition-based targets. This is a new addition to the existing target directive. So the example has three directives now:

  • target - load via insert/append the data
  • targetinc - incremental update / merge the data
  • target_load - insert multiple targets 

In the example above, the following mapping is created; you can see the multiple condition-based targets, and the data is all mapped from a simple driver file:

Using Eclipse you can quickly build and debug your utilities and applications using the SDK and get the benefits of using an IDE such as code insight and auto completion.

Check out the viewlet and the mapping builder example plus the blogs on the SDK such as the recent one on the SDK overview and work smarter. Also, the updated OdiMappingBuilder.java code used in this example is here - get an overview of the mapping builder here.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products -- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality.
