Thursday Apr 09, 2015

ODI, Big Data SQL and Oracle NoSQL

Back in January Anuj posted an article here on using Oracle NoSQL via the Oracle Database Big Data SQL feature. In this post - part 2 of Anuj's, I guess you could call it - I will follow up with how the Oracle external table is configured and how it all hangs together, both with manual code and via ODI. For this I used the Big Data Lite VM along with the newly released Oracle Data Integrator Big Data option. The Big Data Lite VM 4.1 release uses version 3.2.5 of Oracle NoSQL - from this release I used the new declarative DDL for Oracle NoSQL to project the shape from NoSQL, with some help from Anuj.

My goal for the integration design is to show a logical design in ODI and how KMs are used to realize the implementation and leverage Oracle Big Data SQL - this integration design supports predicate pushdown, so I minimize the data moved between my NoSQL store on Hadoop and the Oracle database - think speed and scalability! My NoSQL store contains user movie recommendations. I want to join this with reference data in Oracle, which includes the customer, movie and genre information, and store the result in a summary table.

Here is the code to create and load the recommendation data in NoSQL - this would normally be computed by another piece of application logic in a real world scenario;

  • export KVHOME=/u01/nosql/kv-3.2.5
  • cd /u01/nosql/scripts
  • ./admin.sh

  • connect store -name kvstore
  • EXEC "CREATE TABLE recommendation( \
  •          custid INTEGER, \
  •          sno INTEGER, \
  •          genreid INTEGER,\
  •          movieid INTEGER,\
  •          PRIMARY KEY (SHARD(custid), sno, genreid, movieid))"
  • PUT TABLE -name RECOMMENDATION  -file /home/oracle/movie/moviework/bigdatasql/nosqldb/user_movie.json

The Manual Approach

This example is using the new data definition language in NoSQL. To make this accessible via Hive, users can create Hive external tables that use the NoSQL Storage Handler provided by Oracle. If this were manually coded in Hive, we could define the table as follows;

  • CREATE EXTERNAL TABLE IF NOT EXISTS recommendation(
  •                  custid INT,
  •                  sno INT,
  •                  genreId INT,
  •                  movieId INT)
  •           STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  •           TBLPROPERTIES  ( "oracle.kv.kvstore"="kvstore",
  •                            "oracle.kv.hosts"="localhost:5000",
  •                            "oracle.kv.hadoop.hosts"="localhost",
  •                            "oracle.kv.tableName"="recommendation");
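
With the Hive table in place, a quick sanity check from the Hive shell might look like the following - an illustrative query only, using a customer id that appears in the sample data later in this post;

  • SELECT custid, sno, genreid, movieid
  • FROM recommendation
  • WHERE custid = 1255601;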

At this point we have made NoSQL accessible to many components in the Hadoop stack - pretty much every component in the Hadoop ecosystem can leverage the HCatalog entries defined, be they Hive, Pig, Spark and so on. We are looking at Oracle Big Data SQL though, so let's see how that is achieved. We must define an Oracle external table that uses either a SerDe directly or an existing Hive table; below you can see how the table has been defined in Oracle;

  • CREATE TABLE recommendation(
  •                  custid NUMBER,
  •                  sno NUMBER,
  •                  genreid NUMBER,
  •                  movieid NUMBER
  •          )
  •                  ORGANIZATION EXTERNAL
  •          (
  •                  TYPE ORACLE_HIVE
  •                  DEFAULT DIRECTORY DEFAULT_DIR
  •                  ACCESS PARAMETERS  (
  •                      com.oracle.bigdata.tablename=default.recommendation
  •                  )
  •          ) ;

Now we are ready to write SQL! Really!? Well, let's see - below is the type of query we can run to join the NoSQL data with our Oracle reference data;

  • SELECT m.title, g.name, c.first_name
  • FROM recommendation r, movie m, genre g, customer c
  • WHERE r.movieid=m.movie_id and r.genreid=g.genre_id and r.custid=c.cust_id and r.custid=1255601 and r.sno=1 
  • ORDER by r.sno, r.genreid;

Great, we can now access the data from Oracle - we benefit from the scalability of the solution and minimal data movement! Let's make it better: more maintainable, more flexible to future changes, and accessible to more people, by showing how it is done in ODI.

Oracle Data Integrator Approach

The data in NoSQL has a shape; we can capture that shape in ODI just as it is defined in NoSQL. We can then design mappings that manipulate the shape and load it into whatever target we like. The SQL we saw above is represented in a logical mapping as below;


Users get the same design experience as with other data stores and benefit from the mapping designer. They can join, map and transform just as normal. The ODI designer allows you to separate how you physically want this to happen from the logical semantics - this is all about giving you flexibility to change and adapt to new integration technologies and patterns.

In the physical design we can assign Knowledge Modules that take responsibility for building the integration objects we manually coded above. These KMs are generic, so they support all shapes and sizes of data items. Below you can see how the LKM is assigned for accessing Hive from Oracle;

This KM takes the role of building the external table - you can take it, use it, customize it, and the logical design stays the same. Why is that important? Integration recipes CHANGE as we learn more and developers build newer and better mechanisms to integrate.

This KM takes care of creating the external table in Hive that accesses our NoSQL system. You could also have manually built the external table, imported it into ODI and used that as the source for the mapping, but I want to show how the raw items can be integrated - the more metadata we have and use at design time, the greater the flexibility in the future. The LKM Oracle NoSQL to Hive uses regular KM APIs to build the access object; here is a snippet from the KM;

  • create table <%=odiRef.getObjectName("L", odiRef.getTableName("COLL_SHORT_NAME"), "W")%>
  •  <%=odiRef.getColList("(", "[COL_NAME] [DEST_CRE_DT]", ", ", ")", "")%> 
  •           STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  •           TBLPROPERTIES  ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
  •                            "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
  •                            "oracle.kv.hadoop.hosts"="localhost",
  •                            "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");

You can see the templatized code versus the literals. This was using the 12.1.3.0.1 Big Data option of ODI, so integration with Hive is much improved and it leverages the DataDirect driver, which is also a big improvement. In this post I created a new technology for Oracle NoSQL in ODI - you can do this too for anything you want, and I will post this technology on java.net and more so that as a community we can learn and share. The KM still needs some work as you can see - can you spot some hard-wiring that needs to be fixed? ;-)
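
One answer is the "oracle.kv.hadoop.hosts"="localhost" literal. A quick sketch of how it could be parameterized, assuming a KM option (a name of my own, not one shipped with the KM) called NOSQL_HADOOP_HOSTS is added to the KM;

  • -- NOSQL_HADOOP_HOSTS below is a hypothetical KM option added for this sketch
  •           TBLPROPERTIES  ( "oracle.kv.kvstore"="<%=odiRef.getInfo("SRC_SCHEMA")%>",
  •                            "oracle.kv.hosts"="<%=odiRef.getInfo("SRC_DSERV_NAME")%>",
  •                            "oracle.kv.hadoop.hosts"="<%=odiRef.getOption("NOSQL_HADOOP_HOSTS")%>",
  •                            "oracle.kv.tableName"="<%=odiRef.getSrcTablesList("", "[TABLE_NAME]", ", ", "")%>");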

Summary 

Here we have seen how we can make seemingly complex integration tasks quite simple and leverage the best of data integration technologies today and importantly in the future!


Wednesday Apr 08, 2015

Oracle GoldenGate for DB2 z/OS Supports DB2 11

With the Oracle GoldenGate 12.1.2.1.4 release, Oracle GoldenGate for DB2 z/OS provides support for DB2 11. This release also includes the fix to make Oracle GoldenGate z/OS Extract compatible with IBM APAR PI12599 for DB2 z/OS.

Monday Apr 06, 2015

Announcing Oracle Data Integrator for Big Data

Proudly announcing the availability of Oracle Data Integrator for Big Data. This release is the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level.

Increasing Big Data Heterogeneity and Transparency

This release sees significant additions in heterogeneity and governance for customers. Some highlights of this release include:

  • Support for Apache Spark,
  • Support for Apache Pig, and
  • Orchestration using Oozie.

Click here for a detailed list of what is new in Oracle Data Integrator (ODI).

Oracle Data Integrator for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate them. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface to build logic and the physical implementation layer to run the code. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin and Map Reduce.

Oracle Data Integrator for Big Data Webcast

We invite you to join us on the 30th of April for our webcast to learn more about Oracle Data Integrator for Big Data and to get your questions answered about Big Data Integration. We discuss how the newly announced Oracle Data Integrator for Big Data

  • Provides advanced scale and expanded heterogeneity for big data projects 
  • Uniquely complements Hadoop’s strengths to accelerate decision making, and
  • Ensures sub second latency with Oracle GoldenGate for Big Data.


Thursday Mar 26, 2015

Oracle Big Data Lite 4.1.0 is available with more on Oracle GoldenGate and Oracle Data Integrator

Oracle's big data team has announced the newest Oracle Big Data Lite Virtual Machine 4.1.0.  This newest Big Data Lite Virtual Machine contains great improvements from a data integration perspective with inclusion of the recently released Oracle GoldenGate for Big Data.  You will see this in an improved demonstration that highlights inserts, updates, and deletes into Hive using Oracle GoldenGate for Big Data with Oracle Data Integrator performing a merge of the new operations into a consolidated table.

Big Data Lite is a pre-built environment which includes many of the key capabilities for Oracle's big data platform.   The components have been configured to work together in this Virtual Machine, providing a simple way to get started in a big data environment.  The components include Oracle Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator, Oracle GoldenGate amongst others. 

Big Data Lite also contains hands-on labs and demonstrations to help you get started using the system.  Tame Big Data with Oracle Data Integration is a hands-on lab that teaches you how to design Hadoop data integration using Oracle Data Integrator and Oracle GoldenGate. 

Start here to learn more!  Enjoy!

Wednesday Mar 11, 2015

Recap - How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Last week Oracle and Cloudera experts came together to discuss how Big Data is shaping Data Management.

Cloudera's Charles Zedlewski spoke about the Enterprise Data Hub, a Hadoop data store that can hold vast quantities of unstructured data. This data store, which is secure and governed, keeps data in full fidelity, along with enriched and transformed data that can then be drawn upon to make critical business decisions.

Following Cloudera, Jeff Pollock emphasized the importance of Data Integration technologies that can take advantage of Hadoop clusters in their data processing methods. Offloading data queries into the cluster, real-time data ingestion into the data hub and building an integrated Big Data reservoir were top of mind when designing Oracle's Data Management technologies.

Rounding up the webcast was Dain Hansen, who quizzed the two experts on some of the top-of-mind questions that customers have about big data.

If you missed the webcast do not worry. You can watch it here on demand.

Friday Feb 27, 2015

How to Future Proof Your Big Data Investments - An Oracle webcast with Cloudera

Cutting through the Big Data Clutter

The Big Data world is changing rapidly, giving rise to new standards, languages and architectures. Customers are unclear about which Big Data technology will benefit their business the most, and how to future proof their Big Data investments.

This webcast helps customers sift through the changing Big Data architectures and build their own resilient Big Data platform. Oracle and Cloudera experts discuss how enterprise platforms need to provide more flexibility to handle real-time and in-memory computations for Big Data.



The speakers introduce the 4th generation architecture for Big Data that allows expanded and critical capabilities to exist alongside each other. Customers can now see higher returns on their Big Data investment by ingesting real-time data and improving data transformation for their Big Data analytics solutions. By choosing Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Metadata Management, customers gain the ability to keep pace with changing Big Data technologies like Spark, Oozie, Pig and Flume without losing productivity, while reducing risk through robust Big Data governance.

In this webcast we also discuss the newly announced Oracle GoldenGate for Big Data. With this release, customers can stream real time data from their heterogeneous production systems into Hadoop and other Big Data systems like Apache Hive, HBase and Flume. This brings real time capabilities to customer’s Big Data architecture allowing them to enhance their big data analytics and ensure their Big Data reservoirs are up-to-date with production systems.

Click here to mark your calendars and join us for the webcast to understand Big Data Integration and ensure that you are investing in the right Big Data Integration solutions.

Thursday Feb 19, 2015

Introducing Oracle GoldenGate for Big Data!

Big data systems and big data analytics solutions are becoming critical components of modern information management architectures. Organizations realize that by combining structured transactional data with semi-structured and unstructured data they can realize the full potential value of their data assets, and achieve enhanced business insight. Businesses also recognize that in today's fast-paced, digital business environment, being agile and responding with immediacy requires access to data with low latency. Low-latency transactional data brings additional value especially for dynamically changing operations that day-old data, structured or unstructured, cannot deliver.

Today we announced the general availability of the Oracle GoldenGate for Big Data product, which offers a real-time transactional data streaming platform into big data systems. By providing easy-to-use, real-time data integration for big data systems, Oracle GoldenGate for Big Data facilitates improved business insight for better customer experience. It also allows IT organizations to quickly move ahead with their big data projects without extensive training and management resources. Oracle GoldenGate for Big Data's real-time data streaming platform also allows customers to keep their big data reservoirs up to date with their production systems.

Oracle GoldenGate's fault-tolerant, secure and flexible architecture shines in this new big data streaming offering as well. Customers can enjoy secure and reliable data streaming with subsecond latency. Oracle GoldenGate's core log-based change data capture capabilities enable real-time streaming without degrading the performance of the source production systems.

The new offering, Oracle GoldenGate for Big Data, provides integration for Apache Flume, Apache HDFS, Apache Hive and Apache HBase. It also includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.

You can learn more about our new offering via Oracle GoldenGate for Big Data data sheet and by registering for our upcoming webcast:

How to Future-Proof your Big Data Integration Solution

March 5th, 2015 10am PT/ 1pm ET

I invite you to join this webcast to learn from Oracle and Cloudera executives how to future-proof your big data infrastructure. The webcast will discuss:

  • Selection criteria that will drive business results with Big Data Integration 
  • Oracle's new big data integration and governance offerings, including Oracle GoldenGate for Big Data
  • Oracle’s comprehensive big data features in a unified platform 
  • How Cloudera Enterprise Data Hub and Oracle Data Integration combine to offer complementary features to store data in full fidelity, to transform and enrich the data for increased business efficiency and insights.

Hope you can join us and ask your questions to the experts.

Hive, Pig, Spark - Choose your Big Data Language with Oracle Data Integrator

The strength of Oracle Data Integrator (ODI) has always been the separation of logical design and physical implementation. Users can define a logical transformation flow that maps any sources to targets without being concerned with what exact mechanisms will be used to realize such a job. In fact, ODI doesn't have its own transformation engine but instead outsources all work to the native mechanisms of the underlying platforms, be they relational databases, data warehouse appliances, or Hadoop clusters.

In the case of Big Data this philosophy of ODI gains even more importance. New Hadoop projects are incubated and released on a constant basis and introduce exciting new capabilities; the combined brain trust of the big data community conceives new technology that outdoes any proprietary ETL engine. ODI’s ability to separate your design from the implementation enables you to pick the ideal environment for your use case; and if the Hadoop landscape evolves, it is easy to retool an existing mapping with a new physical implementation. This way you don’t have to tie yourself to one language that is hyped this year, but might be legacy in the next.

ODI turns a logical design into executable code through physical designs and Knowledge Modules. You can even define multiple physical designs for different languages based on the same logical design. For example, you could choose Hive as your transformation platform, and ODI would generate Hive SQL as the execution language. You could also pick Pig, and the generated code would be Pig Latin. If you choose Spark, ODI will generate PySpark code, which is Python with Spark APIs. Knowledge Modules orchestrate the generation of code for the different languages and can be further configured to optimize the execution of the different implementations, for example parallelism in Pig or in-memory caching for Spark.

The example below shows an ODI mapping that reads from a log file in HDFS, registered in HCatalog. It gets filtered, aggregated, and then joined with another table, before being written into another HCatalog-based table. ODI can generate code for Hive, Pig, or Spark based on the Knowledge Modules chosen. 

ODI boosts developer productivity and can future-proof your investment by removing the need to manually code Hadoop transformations in a particular language. You can logically design your mapping and then choose the implementation that best suits your use case.

Monday Feb 16, 2015

The Data Governance Commandments

This is the second of our Data Governance Series. Read the first part here.

The Four Pillars of Data Governance

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

Data governance is a wide-reaching discipline, but as in all walks of life, there are a handful of essential elements you need in place before you can start really enjoying the benefits of a good data governance strategy. These are the four key pillars of data governance:

People

Data is like any other asset your business has: It needs to be properly managed and maintained to ensure it continues delivering the best results.

Enter the data steward: a role dedicated to managing, curating and monitoring the flow of data through your organization. This can be a dedicated individual managing data full-time, or just a role appended to an existing employee’s tasks.

But do you really need one? If you take your data seriously, then someone should certainly be taking on this role; even if they only do it part-time.

Processes

So what are these data stewards doing with your data exactly? That’s for you to decide, and it’s the quantity and quality of these processes that will determine just how successful your data governance program is.

Whatever cleansing, cleaning and data management processes you undertake, you need to make sure they’re linked to your organization’s key metrics. Data accuracy, accessibility, consistency and completeness all make fine starting metrics, but you should add to these based on your strategic goals.

Technology

No matter how ordered your data is, it still needs somewhere to go, so you need to make sure your data warehouse is up to the task, and is able to hold all your data in an organized fashion that complies with all your regulatory obligations.

But as data begins filling up your data warehouse, you’ll need to improve your level of data control and consider investing in a tool to better manage metadata: the data about other data. By managing metadata, you master the data itself, and can better anticipate data bottlenecks and discrepancies that could impact your data’s performance.

More importantly, metadata management allows you to better manage the flow of data—wherever it is going. You can manage and better control your data not just within the data warehouse or a business analytics tool, but across all systems, increasing transparency and minimizing security and compliance risks.

But even if you can control data across all your systems, you also need to ensure you have the analytics to put the data to use. Unless actionable insights are gleaned from your data, it’s just taking up space and gathering dust.

Best Practices

For your data governance to really deliver—and keep delivering—you need to follow best practices.

Stakeholders must be identified and held accountable, strategies must be in place to evolve your data workflows, and data KPIs must be measured and monitored. But that’s just the start. Data governance best practices are evolving rapidly, and only by keeping your finger on the pulse of the data industry can you prepare your governance strategy to succeed.

How Many Have You Got?

These four pillars are essential to holding up a great data governance strategy, and if you’re missing even one of them, you’re severely limiting the value and reliability of your data.

If you’re struggling to get all the pillars in place, you might want to read our short guide to data governance success.

Tuesday Feb 10, 2015

The Data Governance Commandments: Ignoring Your Data Challenges is Not an Option

This is the first of our Data Governance blog series. Read the next of the series here.

Our Data Governance Commandments are simple principles that can help your organization get its data story straight, and get more value from customer, performance or employee data.

All businesses are data businesses in the modern world, and if you’re collecting any information on employees, performance, operations, or your customers, your organization is swimming in data by now. Whether you’re using it, or just sitting on it, that data is there and it is most definitely your responsibility.

Even if you lock it in a vault and bury your head in the sand, that data will still be there, and it will still be:

  • Subject to changeable regulations and legislation
  • An appealing target for cybercriminals
  • An opportunity that you’re missing out on

Those are already three very good reasons to start working on your data strategy. But let’s break it down a bit more.

Regulations

Few things stand still in the world of business, but regulations in particular can move lightning-fast.

If your data is sitting in a data warehouse you built a few years ago, that data could now be stored in an insecure format, listed incorrectly, and violating new regulations you haven’t taken into account.

You may be ignoring the data, but regulatory bodies aren’t—and you don’t want to find yourself knee-deep in fines.

Security

Your network is like a big wall around your business. Cybercriminals only need to find one crack in the brickwork, and they’ll come flooding in.

Sure, you’ve kept firewalls, anti-virus software and your critical servers up to date, but what about that old data warehouse? How’s that looking?

If you’ve taken your eye off your DW for even a second, you’re putting all that data at risk. And if the cybercriminals establish a backdoor through the DW into the rest of the organization, who knows how far the damage could spread?

If you lose just consumer reputation and business following such a data breach, consider yourself lucky. The impact could be far worse for the organization that ignores its data security issues.

Potential

Even without the dangers of data neglect, ignoring your data means you’re ignoring fantastic business opportunities. The data you’re ignoring could be helping your business:

  • Better target marketing and sales activities
  • Make more informed business decisions
  • Get more from key business applications
  • Improve process efficiency

Can you afford to ignore all of these benefits, and risk the security and compliance of your data?

Thankfully, there are plenty of ways you can start tightening up your data strategy right away.

Check out our short guide to data governance, and discover the three principles you need to follow to take control of your data.

Thursday Jan 29, 2015

Oracle GoldenGate 12c for Oracle Database - Integrated Capture sharing capture session

Oracle GoldenGate for Oracle Database has introduced several features in Release 12.1.2.1.0. In this blog post I would like to explain one of the interesting features: "Integrated Capture (a.k.a. Integrated Extract) sharing capture session". This feature makes the creation of additional Integrated Extracts faster by leveraging existing LogMiner dictionaries. As Integrated Extract requires registering the Extract with the database, let's first see what 'registering the Extract' means.

REGISTER EXTRACT EAMER DATABASE

The above command registers the Extract process with the database for what is called “Integrated Capture Mode”. In this mode the Extract interacts directly with the database LogMining server to receive data changes in the form of logical change records (LCRs).

When you created an Integrated Extract prior to Oracle GoldenGate release 12.1.2.1.0, you might have seen a delay in registering the Extract with the database. This is mainly because creating an Integrated Extract involves dumping the dictionary and then processing that dictionary to populate the LogMiner tables for each session, which causes overhead on online systems and therefore requires extra startup time. The same process was followed when you created an additional Integrated Extract.

What if you could use the existing LogMiner dictionaries to create the additional Integrated Extract? That is exactly what has been done in this release. Creation of additional Integrated Extracts can be made significantly faster by leveraging LogMiner dictionaries that have already been mined, so no separate copy of the LogMiner dictionary has to be dumped for each Integrated Extract. As a result, the creation of additional Integrated Extracts is much faster and the significant database overhead caused by dumping and processing the dictionary is avoided.

In order to use the feature, you should have Oracle DB version 12.1.0.2 or higher, and Oracle GoldenGate for Oracle version 12.1.2.1.0 or higher. The feature is currently supported for non-CDB databases only.

Command Syntax:

REGISTER EXTRACT group_name DATABASE

..

{SHARE [AUTOMATIC | extract | NONE]}

There are primarily three options to select from; NONE is the default if you don't specify anything.

The AUTOMATIC option clones/shares the LogMiner dictionary from the closest existing capture. If no suitable clone candidate is found, then a new LogMiner dictionary is created.

The extract option clones/shares from the capture session associated with the specified Extract. If this is not possible, then an error occurs and the register does not complete.

The NONE option does not clone or share; a new LogMiner dictionary is created. This is the default.

When you use this feature together with an SCN, the specified SCN must be greater than or equal to the first SCN of at least one of the existing captures, and it must be less than the current SCN.
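
Putting the syntax together, the registrations from the scenarios below would look like the following as actual GGSCI commands - a sketch only, mirroring the example group names and SCN values used in this post;

REGISTER EXTRACT EXT6 DATABASE SHARE NONE
REGISTER EXTRACT EXT7 DATABASE SCN 61000 SHARE AUTOMATIC
REGISTER EXTRACT EXT8 DATABASE SCN 68000 SHARE EXT2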

Let's look at a few behaviors prior to the 12.1.2.1.0 release and with the SHARE options. 'Current SCN' indicates the current SCN value when the REGISTER EXTRACT command was executed in the following example scenarios.

Capture Name | LogMiner ID | First SCN | Start SCN | LogMiner Dictionary ID (LM-DID)
EXT1         | 1           | 60000     | 60000     | 1
EXT2         | 2           | 65000     | 65000     | 2
EXT3         | 3           | 60000     | 60000     | 3
EXT4         | 4           | 65000     | 66000     | 2
EXT5         | 5           | 60000     | 68000     | 1
EXT6         | 6           | 70000     | 70000     | 4
EXT7         | 7           | 60000     | 61000     | 1
EXT8         | 8           | 65000     | 68000     | 2

Behavior Prior to 12.1.2.1.0 – No Sharing

Register extract EXT1 with database (current SCN: 60000)

Register extract EXT2 with database (current SCN: 65000)

Register extract EXT3 with database SCN 60000 (current SCN: 65555)

Register extract EXT4 with database SCN 61000   -> Error!!

Registration of Integrated Extracts EXT1, EXT2 and EXT3 happened successfully, whereas EXT4 failed because the LogMiner server does not exist at SCN 61000.

Also note that all of the Integrated Extracts (EXT1 – EXT3) created dictionaries separately (their LogMiner Dictionary IDs are different; from now on I'll call them LM-DIDs).

New behavior with different SHARE options

  • Register extract EXT4 with database SHARE AUTOMATIC (current SCN: 66000)

EXT4 automatically chose the capture session EXT2, as its SCN 65000 is nearest to the current SCN 66000. Hence the EXT4 & EXT2 capture sessions share the same LM-DID 2.

  • Register extract EXT5 with database SHARE EXT1 (current SCN: 68000)

EXT5 shares the capture session EXT1. Since EXT1 is up and running, no error is raised. LM-DID 1 is shared across the EXT5 and EXT1 capture sessions.

  • Register extract EXT6 with database SHARE NONE (current SCN: 70000)

EXT6 is registered with the SHARE NONE option; hence a new LogMiner dictionary is created and dumped. See the LM-DID column for EXT6 in the table above; it contains the value 4.

  • Register extract EXT7 with database SCN 61000 SHARE NONE (current SCN: 70000)

This generates an error similar to EXT4 @SCN 61000. The LogMiner server doesn't exist at SCN 61000, and since the SHARE option is NONE, the existing LogMiner dictionaries aren't shared either. This is the same behavior as prior to the 12.1.2.1.0 release.

  • Register extract EXT7 with database SCN 61000 SHARE AUTOMATIC (current SCN: 72000)

EXT7 shares the capture session EXT1, as it is the closest for SCN 61000. Notice that the EXT7 @SCN 61000 scenario now succeeds with the SHARE AUTOMATIC option, which was not the case earlier (EXT4 @SCN 61000).

  • Register extract EXT8 with database SCN 68000 SHARE EXT2 (current SCN: 76000)

The EXT8 Extract shares the EXT2 capture session. Hence the LogMiner dictionary is shared between EXT8 & EXT2.

This feature not only provides faster startup for additional Integrated Extracts, but also enables a few scenarios that weren't possible earlier. If you are using this feature and have questions or comments, please let me know by leaving a comment below. I'll reply to you as soon as possible.

Thursday Jan 22, 2015

OTN Virtual Technology Summit Data Integration Subtrack Features Big Data Integration and Governance

I am sure many of you have heard about the quarterly Oracle Technology Network (OTN) Virtual Technology Summits. They provide a hands-on learning experience on the latest offerings from Oracle by bringing together experts from our community and the product management team.

The next OTN Virtual Technology Summit is scheduled for February 11th (9am-12:30pm PT) and will feature Oracle's big data integration and metadata management capabilities with hands-on-lab content.

The Data Integration and Data Warehousing sub-track includes the following sessions and speakers:

Feb 11th 9:30am PT -- HOL: Real-Time Data Replication to Hadoop using GoldenGate 12c Adaptors

Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.

Speaker: Michael Rainey, Oracle ACE, Principal Consultant, Rittman Mead

Feb 11th 10:30am PT -- HOL: Bringing Oracle Big Data SQL to Oracle Data Integration 12c Mappings

Oracle Big Data SQL extends Oracle SQL and Oracle Exadata SmartScan technology to Hadoop, giving developers the ability to execute Oracle SQL transformations against Apache Hive tables and extending the Oracle Database data dictionary to the Hive metastore. In this session we'll look at how Oracle Big Data SQL can be used to create ODI12c mappings against both Oracle Database and Hive tables, to combine customer data held in Oracle tables with incoming purchase activities stored on a Hadoop cluster. We'll look at the new transformation capabilities this gives you over Hadoop data, and how you can use ODI12c's Sqoop integration to copy the combined dataset back into the Hadoop environment.

Speaker: Mark Rittman, Oracle ACE Director, CTO and Co-Founder, Rittman Mead

Feb 11th 11:30am PT -- An Introduction to Oracle Enterprise Metadata Manager

This session takes a deep technical dive into the recently released Oracle Enterprise Metadata Manager. You’ll see the standard features of data lineage, impact analysis and version management applied across a myriad of Oracle and non-Oracle technologies into a consistent metadata whole, including Oracle Database, Oracle Data Integrator, Oracle Business Intelligence and Hadoop. This session will examine the Oracle Enterprise Metadata Manager "bridge" architecture and how it is similar to the ODI knowledge module. You will learn how to harvest individual sources of metadata, such as OBIEE, ODI, the Oracle Database and Hadoop, and you will learn how to create OEMM configurations that contain multiple metadata stores as a single coherent metadata strategy.

Speaker: Stewart Bryson, Oracle ACE Director, Owner and Co-founder, Red Pill Analytics

I invite you to register now for this free event and enjoy this feast for big data integration and governance enthusiasts.

Americas -- February 11th/ 9am to 12:30pm PT- Register Now

Please note the same OTN Virtual Technology Summit content will be presented again to EMEA and APAC. You can register for them via the links below.

EMEA – February 25th / 9am to 12:30pm GMT* - Register Now

APAC – March 4th / 9:30am-1:00pm IST* - Register Now

Join us and let us know how you like the data integration sessions in this quarter's OTN event.

Monday Jan 12, 2015

ODI 12c - Eclipse and Updated Mapping Builder Example

The ODI 12c release offers a full SDK which opens up great opportunities for automation. You can reduce development times and work smarter. Here I will show how to use Eclipse to configure and set up the SDK so you can start working smarter. In this example I will use an updated version of the Mapping Builder, a sample SDK illustration I blogged about previously. The builder has been updated to include auto mapping of attributes, multiple targets, load modes and more. The updated OdiMappingBuilder.java code used in this example is here

There is a viewlet illustrating the use of Eclipse and the ODI SDK here. The viewlet walks through the entire configuration of an Eclipse project using the 12.1.3 release of ODI.


The important areas concern the libraries you need to define for the project. Below you can see the libraries used from the ODI installation; I have installed my software in C:\Oracle\ODI_1213 in these examples.

Once the libraries are configured you can import the mapping builder java source code...

Then you must define how the code is executed; you specify the parameters as arguments. This is of course just an example for illustration - you can define how you would like this to be done in whatever way you like. You specify all of the mapping builder parameters here. I also pass the control file as a parameter rather than via stdin (which is what I did in previous blog posts).

I mentioned that the mapping builder code now supports auto mapping, multiple targets and load modes; you can see an example driver file below. The line starting with >mapping has 2 qualifiers - below I have used EQUALS and IGNORECASE (a small sketch after the two lists shows how the rules combine). The first parameter supports the following values;

  • EQUALS - supports matching EMPNO with EMPNO (or empno with EMPNO when IGNORECASE used for example)
  • SRCENDSWITH - supports matching source XEMPNO with EMPNO (plus case option as above)
  • TGTENDSWITH - supports matching source EMPNO with XEMPNO (plus case option as above)

The second parameter supports;

  • MATCH - exact match, matches EMPNO with EMPNO or empno with empno
  • IGNORECASE - supports mismatched case matching EMPNO matches empno
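
To make the matching rules concrete, here is a small hypothetical helper in Groovy - it is not code from OdiMappingBuilder.java, just a sketch of how the two qualifiers described above combine;

    // Hypothetical illustration of the auto-mapping qualifiers, not the sample's actual code
    def matches(String src, String tgt, String rule, String caseOpt) {
        def s = (caseOpt == "IGNORECASE") ? src.toUpperCase() : src
        def t = (caseOpt == "IGNORECASE") ? tgt.toUpperCase() : tgt
        switch (rule) {
            case "EQUALS":      return s == t          // EMPNO matches EMPNO (or empno when IGNORECASE)
            case "SRCENDSWITH": return s.endsWith(t)   // source XEMPNO matches target EMPNO
            case "TGTENDSWITH": return t.endsWith(s)   // source EMPNO matches target XEMPNO
            default:            return false
        }
    }
    assert matches("XEMPNO", "empno", "SRCENDSWITH", "IGNORECASE")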

As well as auto mapping, you can also specify column-level mappings explicitly, as was previously supported; the viewlet has an example.

You can also see the control file has a target_load directive which can be given a condition (or else) in the final field; this lets you load multiple condition-based targets. This is a new addition to the existing target directive. So the example has 3 directives now;

  • target - load via insert/append the data
  • targetinc - incremental update / merge the data
  • target_load - insert multiple targets 

In the example above, the following mapping is created; you can see the multiple condition-based targets, and the data is all mapped from a simple driver file;

Using Eclipse you can quickly build and debug your SDK utilities and applications and get the benefits of an IDE such as code insight and auto-completion.

Check out the viewlet and the mapping builder example, plus the blogs on the SDK such as the recent one giving an SDK overview and showing how to work smarter. Also, the updated OdiMappingBuilder.java code used in this example is here - get an overview of the mapping builder here.

Friday Jan 09, 2015

ODI 12c - Mapping SDK Overview

In this post I'll show some of the high-level concepts in the physical design and the SDKs that go with it. To do this I'll also cover some of the logical design area so that it all makes sense. The conceptual model for logical mapping in ODI 12c is shown below (it's quite a change from 11g); this model allows us to build arbitrary flows. Each entity below can be found in the 12c SDK. Many of these have specialized classes - for example MapComponent has specializations for the many mapping components available from the designer - these classes may have specific business logic or specialized behavior. You can use the strongly typed, highly specialized classes like DatastoreComponent or you can write applications in a generic manner using the conceptual high-level SDK - this is the technique I used in the mapping builder here.

The heart of the SDK for this area of the model can be found here;

If you need to see these types in action, take the mapping illustration below as an example; I have annotated the different items within the mapper. The connector points are shown in the property inspector; they are not shown in the graphical design. Some components have many input or output connector points (for example, the set component has many input connector points). Some components are simple expression-based components (such as join and filter) - we call these selector components; other components project a specific shape - we call those projectors. That's just how we classify them.

In 12c we clearly separated the physical design from the logical; in 11g much of this was blended together. In separating them we also allow many physical designs for one logical mapping design. We also had to change the physical SDK and model so that we could support multiple targets and arbitrary flows. 11g was fairly rigid - if you look at the 'limitations' sections of the KMs you can see some of that. KMs are assigned on map physical nodes in the physical design, and there are some helper methods on the execution unit so you can set/get KMs.

The heart of the SDK for this physical mapping area of the model can be found here;

If we use the logical mapping shown earlier and look at the physical design we have for it, we can annotate the items below so you can envisage how each of the classes above is used in the design;

The MapPhysicalDesign class has all of the physical related information such as the ODI Optimization Context and Default Staging Location (there also exists a Staging Location Hint on the logical mapping design) - these are items that existed in ODI 11g and are carried forward.

To take an example: if I want to change the LKMs or IKMs set on all physical designs, one approach would be to iterate through all of the nodes in a physical design and check whether an LKM or an IKM is assigned for each node - this then lets you do all sorts of things, from getting the current setting to setting it to a new value. The snippet below gives a small illustration using Groovy of the methods from the ODI SDK;

    // Iterate over every physical design of the mapping and inspect/assign KMs
    PhysicalDesignList = map.getPhysicalDesigns()
    for (pd in PhysicalDesignList) {
        PhysicalNodesList = pd.getPhysicalNodes()
        for (pn in PhysicalNodesList) {
            if (pn.isLKMNode()) {
                CurrentLKMName = pn.getLKMName()
                // ...
                pn.setLKM(myLKM)
            } else if (pn.isIKMNode()) {
                CurrentIKMName = pn.getIKMName()
                // ...
                pn.setIKM(myIKM)
            }
        }
    }

There are many other methods within the SDK to do all sorts of useful stuff. The first example is the getAllAPNodes method on a MapPhysicalDesign - this gives all of the nodes in a design which will have LKMs assigned, so you can quickly set or check them. The second example is the getTargetNodes method on MapPhysicalDesign - this is handy for getting all target nodes to set IKMs on. The final example is finding the AP node in the physical design for a logical component in your design - use the findNode method to achieve this.
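
As a companion to the snippet above, here is a small sketch (Groovy again) that uses getTargetNodes to report and assign the IKM on every target node of every physical design - it assumes 'map' is the mapping and 'myIKM' is an IKM object you have already looked up;

    // Sketch: check and set the IKM on all target nodes of every physical design
    for (pd in map.getPhysicalDesigns()) {
        for (tn in pd.getTargetNodes()) {
            println "Current IKM: " + tn.getIKMName()
            tn.setIKM(myIKM)   // myIKM assumed to have been looked up beforehand
        }
    }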

Hopefully there are some useful pointers here. It's also worth being aware of the ODI blog post 'Mapping SDK the ins and outs', which provides an overview and cross-reference to the primary ODI objects and the underpinning SDKs. If there are any other specific questions, let us know.

Monday Dec 29, 2014

Oracle Data Enrichment Cloud Service (ODECS) - Coming Soon

What are your plans around Big Data and Cloud?

If your organization has already begun to explore these topics, you might be interested in a new offering from Oracle that will dramatically simplify how you use your data in Hadoop and the Cloud:

Oracle Data Enrichment Cloud Service (ODECS)

There is a perception that most of the time spent in Big Data projects is dedicated to harvesting value. The reality is that 90% of the time in Big Data projects is really spent on data preparation. Data may be structured, but more often it will be semi-structured such as weblogs, or fully unstructured such as free-form text. The content is vast, inconsistent, and incomplete, often off topic, and from multiple differing formats and sources. In this environment each new dataset takes weeks or months of effort to process, frequently requiring programmers to write custom scripts. Minimizing data preparation time is the key to unlocking the potential of Big Data.

Oracle Data Enrichment Cloud Service (ODECS) addresses this very reality. ODECS is a non-technical, web-based tool that sets out to minimize data preparation time in an effort to quickly unlock the potential of your data. The ODECS tool provides an interactive set of services that automate, streamline, and guide the process of data ingestion, preparation, enrichment, and governance without costly manual intervention.

The technology behind this service is amazing; it intuitively guides the user with a machine learning driven recommendation engine based on semantic data classification and natural language processing algorithms. But the best part is that non-technical staff can use this tool as easily as they use Excel, resulting in a significant cost advantage for data intensive projects by reducing the amount of time and resources required to ingest and prepare new datasets for downstream IT processes.

Curious to find out more? We invite you to view a short demonstration of ODECS below:


Let us know what you think!

Stay tuned as we write more about this offering…

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality
