Tuesday Dec 22, 2015

Oracle GoldenGate for Big Data 12.2.0.1 is Generally Available Now!

The much-awaited Oracle GoldenGate for Big Data 12.2 was released today and is available for download on OTN.

Let me give you a quick recap on Oracle GoldenGate for Big Data. Oracle GoldenGate for Big Data streams transactional data into big data systems in real time, raising the quality and timeliness of business insights. It also provides a flexible and extensible solution to support all major big data systems.

Oracle GoldenGate for Big Data

  • Same trusted Oracle GoldenGate architecture used by 1000’s of customers
  • Data delivery to Big Data targets including NoSQL databases
  • Support for Polyglot, Lambda and Kappa architectures for streaming data

Key Benefits

  • Less invasive on source databases when compared to batch processing such as Sqoop or ETL processes
  • Simple ingestion for 1:1 data architecture for populating “raw data” zones
  • Real-time data delivery for streaming analytics/apps
  • Reliable, proven at scale with high performance


Architecture – GoldenGate for Big Data 12.2 versus 12.1

New Features in 12.2.0.1:



New Java-based Replicat Process

The advantages of using the Java-based Replicat process are the following (a minimal parameter file sketch follows the list):

    1. Improved performance with Java based adapters
    2. Declarative design and configurable mapping
    3. Transaction grouping based on Operation count & Message size
    4. Improved checkpointing functionality
      E.g.: CHECKPOINTSECS 1 (default 10 seconds)
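
For illustration, here is a minimal sketch of a Replicat parameter file for the Java-based Replicat; the group name, properties file path, and grouping value are my own assumptions rather than values from the release documentation:

REPLICAT rbigdata
-- Load the Java adapter; the properties file path is illustrative
TARGETDB LIBFILE libggjava.so SET property=dirprm/bigdata.props
-- Group up to 1000 source operations into a single target transaction
GROUPTRANSOPS 1000
-- Checkpoint every second instead of the 10-second default
CHECKPOINTSECS 1
MAP GG.*, TARGET GG.*;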

Dynamic Data Handling

You no longer need to define SOURCEDEFS. DDL changes are automatically replicated to the target. For example, if a new column named “mycolumn” is added on the source database, it is automatically replicated to the target without stopping and reconfiguring Oracle GoldenGate.

Pluggable Formatters

Oracle GoldenGate for Big Data can write to Big Data targets in a variety of data formats, such as delimited text, XML, JSON, Avro, or a custom format. This can save users the cost and time of staging data for ETL operations.

Example: gg.handler.name.format=<value>
Supported values are “delimitedtext”, “xml”, “json”, “avro”/“avro_row”, “avro_op”, or a custom format. For a custom format, specify the formatter’s fully qualified class name (for example, com.yourcompany.YourFormatter) and include its extended classpath in the configuration file.
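
As a rough sketch, assuming a handler named gghdfs and a hypothetical custom formatter jar, the configuration could look like this:

gg.handler.gghdfs.format=avro_row
# For a custom formatter, reference the class and add its jar to the adapter classpath
# gg.handler.gghdfs.format=com.yourcompany.YourFormatter
# gg.classpath=/path/to/your-formatter.jar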

Security Enhancement

Native Kerberos support is available in the 12.2.0.1 binaries.

Example of configuration:
gg.handler.gghdfs.authType=Kerberos
gg.handler.gghdfs.kerberosKeytabFile=/keytab/file/path
gg.handler.gghdfs.kerberosPrincipal=user/FQDN@MY.REALM

Declarative Design

Oracle GoldenGate for Big Data provides mapping functionality from source table to target table and from source field to target field for HDFS/Hive, HBase, Flume and Kafka. The metadata is also validated against Hive or an Avro schema to ensure data correctness.

Example:
MAP GG.TCUSTOMER, TARGET GG.TCUSTMER2, COLMAP (USEDEFAULTS, "cust_code2"=cust_code,"city2"=city);

Kafka as target

Oracle GoldenGate for Big Data can write logical change records to a Kafka topic. Operations such as insert, update, delete and primary key update are handled. It also supports Kafka’s native compression codecs such as GZIP and Snappy.

Example of defining Kafka Handler Properties:
gg.handlerlist=ggkafka
gg.handler.ggkafka.type=kafka
gg.handler.ggkafka.topicName=gg_kafka
gg.handler.ggkafka.mode=tx
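
The Kafka producer itself is typically configured through a separate producer properties file referenced from the handler configuration. The property name and file contents below are illustrative; check the Kafka Handler documentation for the exact spelling:

gg.handler.ggkafka.KafkaProducerConfigFile=custom_kafka_producer.properties

# contents of dirprm/custom_kafka_producer.properties
bootstrap.servers=localhost:9092
compression.type=snappy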

Other Enhancements

  • Partition data by Hive table and/or column, writing to a new file based on designated column values
    Example:
    • gg.handler.{name}.partitionByTable =true | false
    • gg.handler.{name}.partitioner.{fully qualified table name}={colname}
    • gg.handler.{name}.partitioner.{fully qualified table name}={colname1},{colname2}
    • gg.handler.<yourhandlername>.partitioner.dbo.TCUSTORD=region, rating
  • Configurable file rolling properties for HDFS (file size, duration, inactivity timer, metadata change); see the sketch after this list
  • Configurable file output encoding into HDFS
  • Automatically create HBase table if it does not exist
  • Ability to treat primary key updates as a delete followed by an insert in Flume and HBase
  • HBase row key generation
  • New timestamping functionality with microsecond precision in ISO 8601 format
  • Availability on additional OS platforms: Windows and Solaris
  • Certification for newer versions: Apache HDFS 2.7.x, Cloudera 5.4.x, Hortonworks 2.3, Kafka 0.8.2.0 and 0.8.2.1
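
As an illustrative sketch of the HDFS file rolling configuration mentioned above, the properties could look like the following; the handler name and values are assumptions, so consult the HDFS Handler documentation for the exact property names:

gg.handler.gghdfs.maxFileSize=1g
gg.handler.gghdfs.fileRollInterval=1h
gg.handler.gghdfs.inactivityRollInterval=30m
gg.handler.gghdfs.rollOnMetadataChange=true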


For more details about new product features, you may refer to Oracle GoldenGate for Big Data 12.2.0.1 Release Notes and User Documentation.

For more information, visit the Oracle GoldenGate for Big Data product page.

Feel free to reach out to me with your queries by posting a comment on this blog or tweeting @thomasvengal.

Happy Holidays!

Thursday Feb 19, 2015

Introducing Oracle GoldenGate for Big Data!

Big data systems and big data analytics solutions are becoming critical components of modern information management architectures. Organizations recognize that by combining structured transactional data with semi-structured and unstructured data they can realize the full potential value of their data assets, and achieve enhanced business insight. Businesses also recognize that in today’s fast-paced digital environment, low-latency access to data is essential to be agile and respond with immediacy. Low-latency transactional data brings additional value, especially for dynamically changing operations, that day-old data, structured or unstructured, cannot deliver.

Today we announced the general availability of the Oracle GoldenGate for Big Data product, which offers a real-time transactional data streaming platform into big data systems. By providing easy-to-use, real-time data integration for big data systems, Oracle GoldenGate for Big Data facilitates improved business insight for better customer experience. It also allows IT organizations to quickly move ahead with their big data projects without extensive training and management resources. Oracle GoldenGate for Big Data's real-time data streaming platform also allows customers to keep their big data reservoirs up to date with their production systems.

Oracle GoldenGate’s fault-tolerant, secure and flexible architecture shines in this new big data streaming offering as well. Customers can enjoy secure and reliable data streaming with subsecond latency. Oracle GoldenGate’s core log-based change data capture capabilities enable real-time streaming without degrading the performance of the source production systems.

The new offering, Oracle GoldenGate for Big Data, provides integration for Apache Flume, Apache HDFS, Apache Hive and Apache HBase. It also includes Oracle GoldenGate for Java, which enables customers to easily integrate to additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.

You can learn more about our new offering via the Oracle GoldenGate for Big Data data sheet and by registering for our upcoming webcast:

How to Future-Proof your Big Data Integration Solution

March 5th, 2015 10am PT/ 1pm ET

I invite you to join this webcast to learn from Oracle and Cloudera executives how to future-proof your big data infrastructure. The webcast will discuss:

  • Selection criteria that will drive business results with Big Data Integration 
  • Oracle's new big data integration and governance offerings, including Oracle GoldenGate for Big Data
  • Oracle’s comprehensive big data features in a unified platform 
  • How Cloudera Enterprise Data Hub and Oracle Data Integration combine to offer complementary features to store data in full fidelity, to transform and enrich the data for increased business efficiency and insights.

Hope you can join us and ask your questions to the experts.

Sunday Jul 13, 2014

New Big Data Features in ODI 12.1.3

Oracle Data Integrator (ODI) 12.1.3 extends its Hadoop support with a number of exciting new capabilities. The new features include:

  • Loading of RDBMS data from and to Hadoop using Sqoop
  • Support for Apache HBase databases
  • Support for Hive append functionality

With these new additions, ODI provides full connectivity to load, transform, and unload data in a Big Data environment.

The diagram below shows all ODI Hadoop knowledge modules with KMs added in ODI 12.1.3 in red. 

Sqoop support

Apache Sqoop is designed for efficiently transferring bulk amounts of data between Hadoop and relational databases such as Oracle, MySQL, Teradata, DB2, and others. Sqoop operates by creating multiple parallel map-reduce processes across a Hadoop cluster, connecting to an external database, and transferring data from or to Hadoop storage in a partitioned fashion. Data can be stored in Hadoop using HDFS, Hive, or HBase. ODI adds two knowledge modules: IKM SQL to Hive-HBase-File (SQOOP) and IKM File-Hive to SQL (SQOOP).

Loading from and to Sqoop in ODI is straightforward. Create a mapping with the database source and Hadoop target (or vice versa) and apply any necessary transformation expressions.

In the physical design of the map, make sure to set the LKM of the target to LKM SQL Multi-Connect.GLOBAL and choose a Sqoop IKM, such as IKM SQL to Hive-HBase-File (SQOOP). Change the MapReduce Output Directory IKM property MAPRED_OUTPUT_BASE_DIR to an appropriate HDFS directory. Review all other properties and tune as necessary. Using these simple steps you should be able to perform a quick Sqoop load.

For more information please review the great ODI Sqoop article from Benjamin Perez-Goytia, or read the ODI 12.1.3 documentation about Sqoop.

HBase support

ODI adds support for HBase as a source and target. HBase metadata can be reverse-engineered using the RKM HBase knowledge module, and HBase can be used as source and target of a Hive transformation using LKM HBase to Hive and IKM Hive to HBase. Sqoop KMs also support HBase as a target for loads from a database. 

For more information please read the ODI 12.1.3 documentation about HBase.

Hive Append support

Prior to Hive 0.8 there was no direct way to append data to an existing table. Earlier Hive KMs emulated such logic by renaming the existing table and concatenating old and new data into a new table with the prior name. This emulated append operation caused major data movement, particularly when the target table was large.

Starting with version 0.8, Hive supports appending. All ODI 12.1.3 Hive KMs have been updated to use the append capability by default, but provide backward compatibility with the old behavior through the KM property HIVE_COMPATIBLE=0.7.
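
To make the difference concrete, here is a rough HiveQL sketch of the two approaches; the table names are illustrative and this is not the exact code generated by the KMs:

-- Hive 0.8 and later: append rows directly
INSERT INTO TABLE target_table SELECT * FROM staging_table;

-- Pre-0.8 emulation: rename the table and rebuild it from old plus new data
ALTER TABLE target_table RENAME TO target_table_old;
CREATE TABLE target_table AS
SELECT * FROM (
  SELECT * FROM target_table_old
  UNION ALL
  SELECT * FROM staging_table
) merged;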

Conclusion

ODI 12.1.3 provides an optimal and easy-to-use way to perform data integration in a Big Data environment. ODI utilizes the processing power of the data storage and processing environment rather than relying on a proprietary transformation engine. This core "ELT" philosophy has its perfect match in a Hadoop environment, where ODI can provide unique value by providing a native and easy-to-use data integration environment.

Monday Oct 14, 2013

Streaming relational transactions to HBase

Following the introductory blog post on the topic, 'Stream your transactions into Big Data Systems', and the blog posts on "Streaming Relational Transactions to HDFS and Hive", in this blog post I focus on the architecture for streaming transactions into HBase.

As shown in the diagram below, integrating a relational database with HBase is accomplished by developing a custom handler using Oracle GoldenGate's Java API and HBase APIs.

The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions to HBase.
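
As a rough illustration of that wiring (this is not the configuration from the My Oracle Support sample; the group name, file paths, and handler class are my own assumptions), the Pump parameter file loads the Java user exit and points it at the adapter's properties file:

EXTRACT hbasepmp
SOURCEDEFS dirdef/source.def
-- Hand the captured operations to the Java user exit and its properties file
CUSEREXIT libggjava_ue.so ggjava PASSTHRU INCLUDEUPDATEBEFORES, PARAMS "dirprm/hbase.props"
TABLE GG.*;

The properties file, in turn, registers the custom handler class with the adapter:

gg.handlerlist=hbasehandler
gg.handler.hbasehandler.type=com.yourcompany.HBaseHandler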

A sample implementation of the HBase adapter is provided on My Oracle Support (Knowledge ID - 1586211.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding HBase table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.

The sample code and configuration may be extended to develop custom solutions, however, please note that Oracle will not provide support for the code and the configuration illustrated in the knowledge base paper.

As always, I would greatly appreciate it if you could share your use case for integrating Oracle GoldenGate into your Big Data strategy, as well as your feedback on using the custom handler to integrate relational databases with your Big Data systems. Please post your comments in this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate
