Friday Oct 24, 2014

Oracle 12c High Availability New Features: Table Recovery in a GoldenGate Environment

By Randy Richeson, Senior Principal Oracle University Instructor



Students often ask if a table can be recovered to a point in time and whether GoldenGate can capture this recovery. Oracle 12c enables you to recover one or more tables or table partitions to a specified point in time without affecting the remaining database objects. This new RMAN functionality reduces the recovery scope, and possibly the recovery time and disk space requirements, compared with database point-in-time recovery, tablespace point-in-time recovery, and the various flashback features introduced before 12c.

When combined with GoldenGate 12c configured to capture both DML and DDL from a source database, the table recovery can be replicated to a target database, effectively recovering two tables with one RMAN command. The following demo shows how table recovery can be used to recover a table that is part of a replication stream.

This GoldenGate integrated extract is configured to process DML and DDL for the source WEST schema tables.


GGSCI (eddnr2p0) 2> view param extwest

extract extwest

useridalias gguamer

statoptions resetreportstats

report at 00:01

reportrollover at 00:01

reportcount every 60 seconds, rate

reportcount every 1000 records

exttrail ./dirdat/ew

ddl include mapped objname west.*;

ddloptions addtrandata, report

table west.*;


There are 1063 rows in the source AMER database WEST.ACCOUNT table.


AMER_WEST_SQL>select name from v$database;

NAME

---------

AMER

AMER_WEST_SQL>select count(*) from account;

COUNT(*)

----------

1063

AMER_WEST_SQL>select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000


This GoldenGate integrated replicat is configured to apply DML and DDL to the target EURO database EAST.ACCOUNT table.


GGSCI (eddnr2p0) 2> view param reast

replicat reast

assumetargetdefs

discardfile ./dirrpt/reast.dsc, purge

useridalias ggueuro

ddl include mapped

map west.*, target east.*;


The target EAST.ACCOUNT table has 1063 rows matching the source WEST.ACCOUNT table.


EURO_DATABASE_SQL>select name from v$database;

NAME

---------

EURO

EURO_DATABASE_SQL>select count(*) from account;

COUNT(*)

----------

1063

EURO_DATABASE_SQL>select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000


A whole RMAN backup is taken of the source AMER database; the control file and SPFILE are included through the autobackup.


[oracle@eddnr2p0 gg_amer]$rman target /

Recovery Manager: Release 12.1.0.1.0 - Production on Fri Oct 3 17:50:04 2014

Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.

connected to target database: AMER (DBID=1282626787)

RMAN> backup database;

Starting backup at 03-OCT-14

…….

Finished Control File and SPFILE Autobackup at 03-OCT-14


The current SCN is queried so that it can be used as the recovery point.


RMAN> select timestamp_to_scn(current_timestamp) from v$database;

TIMESTAMP_TO_SCN(CURRENT_TIMESTAMP)

-----------------------------------

2527804
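
If you prefer to skip the timestamp conversion, the current SCN can also be read directly from V$DATABASE; a minimal equivalent query:

RMAN> select current_scn from v$database;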


Four transactions with negative account balances then occur, logically corrupting the source WEST.ACCOUNT table.


AMER_WEST_SQL> update account set account_balance = -7000 where account_number = 7000;

1 row updated.

AMER_WEST_SQL> commit;

Commit complete.

AMER_WEST_SQL> update account set account_balance = -8000 where account_number = 8000;

1 row updated.

AMER_WEST_SQL> commit;

Commit complete.

AMER_WEST_SQL> update account set account_balance = -9000 where account_number = 9000;

1 row updated.

AMER_WEST_SQL> commit;

Commit complete.

AMER_WEST_SQL> insert into account values (9100,-9100);

1 row created.

AMER_WEST_SQL> commit;

Commit complete.

AMER_WEST_SQL> select count(*) from account;

COUNT(*)

----------

1064

AMER_WEST_SQL> select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 -7000

8000 -8000

9000 -9000

9100 -9100


The extract captures the changes from the source WEST.ACCOUNT table and the replicat applies them to the target EAST.ACCOUNT table, resulting in negative balances in both tables.


EURO_EAST_SQL> select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 -7000

8000 -8000

9000 -9000

9100 -9100


The new 12c RECOVER TABLE command recovers the source table, as of the specified SCN, to its state before the corrupting updates, using the backups created earlier. In this example, UNTIL SCN is used; UNTIL TIME or UNTIL SEQUENCE can be used instead. RMAN determines which backup to use based on the SCN and creates an auxiliary instance in which it recovers the table. RMAN then creates a Data Pump export dump file that contains the recovered table and imports it into the database. The recovered table is remapped to WEST.ACCOUNT_R so that the recovered data can be compared against the original data in the WEST.ACCOUNT table. During the recovery, the source AMER and target EURO databases remain open.
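
As a sketch, the UNTIL TIME form of the same command would look like this (the timestamp is only illustrative; the demo below uses UNTIL SCN):

RMAN> recover table west.account until time "to_date('2014-10-03 17:50:00','yyyy-mm-dd hh24:mi:ss')" auxiliary destination '/u01/app/oracle/backup' remap table 'WEST'.'ACCOUNT':'ACCOUNT_R';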


RMAN> recover table west.account until scn 2527804 auxiliary destination '/u01/app/oracle/backup' remap table 'WEST'.'ACCOUNT':'ACCOUNT_R';

Starting recover at 03-OCT-14

….

Finished recover at 03-OCT-14


After the table recovery, the rows are counted and the data compared between the original and newly recovered table in the source database.


RMAN> select count(*) from west.account;

COUNT(*)

----------

1064

RMAN> select count(*) from west.account_r;

COUNT(*)

----------

1063

RMAN> select * from west.account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 -7000

8000 -8000

9000 -9000

9100 -9100

 

RMAN> select * from west.account_r where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000


The pump extract and integrated replicat RBAs advance as the recovery changes are captured from the source database and applied to the target database.


GGSCI (eddnr2p0) 3> info *west

EXTRACT EXTWEST Last Started 2014-10-03 16:52 Status RUNNING

Checkpoint Lag 00:00:06 (updated 00:00:08 ago)

Process ID 21186

Log Read Checkpoint Oracle Integrated Redo Logs

2014-10-03 18:17:47

SCN 0.2533283 (2533283)

EXTRACT PWEST Last Started 2014-10-03 12:26 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:04 ago)

Process ID 14346

Log Read Checkpoint File ./dirdat/ew000009

2014-10-03 18:13:33.000000 RBA 289288

GGSCI (eddnr2p0) 1> info reast

REPLICAT REAST Last Started 2014-10-03 16:51 Status RUNNING

INTEGRATED

Checkpoint Lag 00:00:00 (updated 00:00:08 ago)

Process ID 21155

Log Read Checkpoint File ./dirdat/pe000005

2014-10-03 18:03:43.896883 RBA 171400


The target EAST.ACCOUNT table remains at the original row count, while the EAST.ACCOUNT_R table is created with the recovered rows.


EURO_EAST_SQL> select count(*) from account;

COUNT(*)

----------

1064

EURO_EAST_SQL> select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 -7000

8000 -8000

9000 -9000

9100 -9100

EURO_EAST_SQL> select count(*) from account_r;

COUNT(*)

----------

1063

EURO_EAST_SQL> select * from account_r where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000


Supplemental logging on the new WEST.ACCOUNT_R table needs to be enabled so that subsequent changes to it can be applied by the replicat to the target database.


GGSCI (eddnr2p0) 2> dblogin useridalias gguamer

Successfully logged into database.

GGSCI (eddnr2p0) 3> info trandata west.account*

Logging of supplemental redo log data is enabled for table WEST.ACCOUNT.

Columns supplementally logged for table WEST.ACCOUNT: ACCOUNT_NUMBER.

Logging of supplemental redo log data is disabled for table WEST.ACCOUNT_R.

Logging of supplemental redo log data is enabled for table WEST.ACCOUNT_TRANS.

Columns supplementally logged for table WEST.ACCOUNT_TRANS: TRANS_NUMBER, ACCOUNT_NUMBER, ACCOUNT_TRANS_TS.
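
If changes to WEST.ACCOUNT_R were expected before the rename performed below, supplemental logging could be enabled on it directly; a minimal sketch (the prompt number will differ):

GGSCI (eddnr2p0) 4> add trandata west.account_r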


After the corrupted WEST.ACCOUNT table is dropped, the extract captures and the replicat applies the drop operation.


AMER_WEST_SQL> drop table account;

Table dropped.

EURO_EAST_SQL> select * from account;

select * from account

*

ERROR at line 1:

ORA-00942: table or view does not exist

After the recovered WEST.ACCOUNT_R is renamed, the extract captures the rename operation and the replicat applies it.

AMER_WEST_SQL> alter table account_r rename to account;

Table altered.


The RMAN table recovery and the subsequent rename leave the WEST.ACCOUNT table without a unique key; one may be added, since primary keys are recommended (a sketch follows the GGSCI output below). Supplemental logging is enabled to allow the replicat to deliver updates and deletes.


GGSCI (eddnr2p0) 5> info trandata west.account

Logging of supplemental redo log data is disabled for table WEST.ACCOUNT.

GGSCI (eddnr2p0) 6> add trandata west.account

2014-10-03 18:26:47 WARNING OGG-06439 No unique key is defined for table ACCOUNT. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.

Logging of supplemental redo data enabled for table WEST.ACCOUNT.

TRANDATA for scheduling columns has been added on table 'WEST.ACCOUNT'.
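
To address the warning, a primary key can be added to the recovered table. A minimal sketch, using the ACCOUNT_NUMBER column from this demo (the constraint name is illustrative, and the existing DDL configuration would replicate this change to EAST.ACCOUNT):

AMER_WEST_SQL> alter table account add constraint account_pk primary key (account_number);

After adding the key, you may want to drop and re-add trandata so that the key column, rather than all columns, is logged.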

Rows are counted to verify that the ACCOUNT table on each database now contains the recovered data.

EURO_EAST_SQL> select count(*) from account;

COUNT(*)

----------

1063

EURO_EAST_SQL> select count(*) from account where account_number > 6000;

COUNT(*)

----------

3

EURO_EAST_SQL> select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000

AMER_WEST_SQL> select * from account where account_number > 6000;

ACCOUNT_NUMBER ACCOUNT_BALANCE

-------------- ---------------

7000 7000

8000 8000

9000 9000


RMAN 12c table recovery, combined with GoldenGate 12c configured to capture DML and DDL from a source database, provides a powerful combination that allows a DBA to quickly recover two tables with one RMAN command.

Book a seat now in an upcoming Oracle Database 12c: New Features for Administrators class to learn much more about using Oracle 12c new features, including Multitenant features, Enterprise Manager 12c, Heat Map and Automatic Data Optimization, In-Database Archiving and Temporal Validity, Auditing and Privileges, Data Redaction, RMAN, Real-Time Database Monitoring, Schema and Data Change Management, SQL Tuning Enhancements, Emergency Monitoring, ADR, In-Memory Caching, Resource Manager, Partitioning, and JSON.

Explore Oracle University Database classes here, or send me an email at randy.richeson@oracle.com if you have other questions.


About the Author:


Randy Richeson joined Oracle University as a Senior Principal Instructor in March 2005. He is an Oracle Certified Professional (10g-12c) and a GoldenGate Certified Implementation Specialist (10g-11g). He has taught Oracle Database technology since 1997, along with other technical curricula including GoldenGate software, GoldenGate Management Pack, GoldenGate Director, GoldenGate Veridata, JD Edwards, PeopleSoft, and Oracle Application Server.

Friday Jun 06, 2014

Oracle GoldenGate 12c New Features: Trail Encryption and Credentials with Oracle Wallet


By Randy Richeson, Senior Principal Instructor for Oracle University

Students often ask if GoldenGate supports trail encryption with the Oracle Wallet. Yes, it does now! GoldenGate has supported encryption with KEYGEN and the ENCKEYS file for years; GoldenGate 12c now also supports encryption using the Oracle Wallet, which improves security and simplifies administration.


Two types of wallets can be configured in GoldenGate 12c:

  • The wallet that holds the master key, used with trail or TCP/IP encryption and decryption, stored in the new 12c dirwlt/cwallet.sso file.
  • The wallet that holds the user ID and password, used for authentication, stored in the new 12c credential store file, dircrd/cwallet.sso.

 

A wallet can be created using a ‘create wallet’ command. Once created, adding a master key to an existing wallet is easy using ‘open wallet’ and ‘add masterkey’ commands.
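
For completeness, a minimal sketch of creating the wallet in the first place (the prompt number will differ):

GGSCI (EDLVC3R27P0) 41> create wallet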

 

GGSCI (EDLVC3R27P0) 42> open wallet

Opened wallet at location 'dirwlt'.

GGSCI (EDLVC3R27P0) 43> add masterkey

Master key 'OGG_DEFAULT_MASTERKEY' added to wallet at location 'dirwlt'.

 

Existing GUI wallet utilities, such as the Oracle Database Oracle Wallet Manager, do not work with this version of the wallet. The default Oracle Wallet location can be changed.

 

GGSCI (EDLVC3R27P0) 44> sh ls -ltr ./dirwlt/*

-rw-r----- 1 oracle oinstall 685 May 30 05:24 ./dirwlt/cwallet.sso

GGSCI (EDLVC3R27P0) 45> info masterkey

Masterkey Name:                 OGG_DEFAULT_MASTERKEY

Creation Date:                  Fri May 30 05:24:04 2014

Version:        Creation Date:                  Status:

1               Fri May 30 05:24:04 2014        Current

 

The second wallet file stores the credential used to connect to a database, without exposing the UserId or Password in a parameter file or macro. Once configured, this file can be copied so that credentials are available to connect to the source or target database.
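
For reference, a minimal sketch of building the credential store and adding the credential behind the gguamer alias used in this demo, before it is copied; the database user, connect string, and password shown here are placeholders to replace with your own:

GGSCI (EDLVC3R27P0) 46> add credentialstore

GGSCI (EDLVC3R27P0) 47> alter credentialstore add user ggadmin@amer password ******** alias gguamer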

 

GGSCI (EDLVC3R27P0) 48> sh cp ./dircrd/cwallet.sso $GG_EURO_HOME/dircrd

GGSCI (EDLVC3R27P0) 49> sh ls -ltr ./dircrd/*

-rw-r----- 1 oracle oinstall 709 May 28 05:39 ./dircrd/cwallet.sso

 

The encryption wallet file can also be copied to the target machine so that the replicat has access to the master key when decrypting any encrypted records in the trail. As with the ENCKEYS file, the master key wallet created on the source host must either be stored on centrally available disk or copied to all GoldenGate target hosts. The wallet is in a platform-independent format, although it is not certified for the iSeries, z/OS, or NonStop platforms.

 

GGSCI (EDLVC3R27P0) 50> sh cp ./dirwlt/cwallet.sso $GG_EURO_HOME/dirwlt

 

The new 12c UserIdAlias parameter is used to locate the credential in the wallet.

 

GGSCI (EDLVC3R27P0) 52> view param extwest

Extract extwest

Exttrail ./dirdat/ew

Useridalias gguamer

Table west.*;


The EncryptTrail parameter is used to encrypt the trail using the FIPS-approved Advanced Encryption Standard (AES) and the encryption key in the wallet. EncryptTrail can be used with a primary extract or a pump extract.


GGSCI (EDLVC3R27P0) 54> view param pwest

Extract pwest

Encrypttrail AES256

Rmthost easthost, mgrport 15001

Rmttrail ./dirdat/pe

Passthru

Table west.*;
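
Similarly, to encrypt the local trail written by the primary extract, the EncryptTrail parameter would be placed before the ExtTrail entry. This is a hedged sketch of the parameter placement only; the demo here encrypts just the remote trail:

Extract extwest

Encrypttrail AES256

Exttrail ./dirdat/ew

Useridalias gguamer

Table west.*;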

Once the extracts are running, records can be encrypted using the wallet.

 

GGSCI (EDLVC3R27P0) 60> info extract *west

EXTRACT    EXTWEST   Last Started 2014-05-30 05:26   Status RUNNING

Checkpoint Lag       00:00:17 (updated 00:00:01 ago)

Process ID           24982

Log Read Checkpoint  Oracle Integrated Redo Logs

                     2014-05-30 05:25:53

                     SCN 0.0 (0)

EXTRACT    PWEST     Last Started 2014-05-30 05:26   Status RUNNING

Checkpoint Lag       24:02:32 (updated 00:00:05 ago)

Process ID           24983

Log Read Checkpoint  File ./dirdat/ew000004

                     2014-05-29 05:23:34.748949  RBA 1483

 

The ‘info masterkey’ command is used to confirm the wallet contains the key. The key is needed to decrypt the data read from the trail before the replicat applies changes to the target table.

 

GGSCI (EDLVC3R27P0) 41> open wallet

Opened wallet at location 'dirwlt'.

GGSCI (EDLVC3R27P0) 42> info masterkey

Masterkey Name:                 OGG_DEFAULT_MASTERKEY

Creation Date:                  Fri May 30 05:24:04 2014

Version:        Creation Date:                  Status:

1               Fri May 30 05:24:04 2014        Current

 

Once the replicat is running, records can be decrypted using the wallet.

 

GGSCI (EDLVC3R27P0) 44> info reast

REPLICAT   REAST     Last Started 2014-05-30 05:28   Status RUNNING

INTEGRATED

Checkpoint Lag       00:00:00 (updated 00:00:02 ago)

Process ID           25057

Log Read Checkpoint  File ./dirdat/pe000004

                     2014-05-30 05:28:16.000000  RBA 1546

 

There is no need for the DecryptTrail parameter when using the wallet, unlike when using the ENCKEYS file.

 

GGSCI (EDLVC3R27P0) 45> view params reast

Replicat reast

AssumeTargetDefs

Discardfile ./dirrpt/reast.dsc, purge

UserIdAlias ggueuro

Map west.*, target east.*;

 

Once a record is committed in the source table, the encryption can be verified using logdump and then querying the target table.

 

SOURCE_AMER_SQL>insert into west.branch values (50, 80071);

1 row created.

SOURCE_AMER_SQL>commit;

Commit complete.

 

The following encrypted record can be found using logdump.
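
Before browsing, the remote trail is opened in logdump and record detail is turned on. A sketch of the typical setup commands, assumed from standard logdump usage (the trail name follows the replicat checkpoint above):

Logdump 37 >open ./dirdat/pe000004

Logdump 38 >ghdr on

Logdump 39 >detail data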


Logdump 40 >n

2014/05/30 05:28:30.001.154 Insert               Len    28 RBA 1546

Name: WEST.BRANCH

After  Image:                                             Partition 4   G  s  

 0a3e 1ba3 d924 5c02 eade db3f 61a9 164d 8b53 4331 | .>...$\....?a..M.SC1 

 554f e65a 5185 0257                               | UO.ZQ..W 

Bad compressed block, found length of  7075 (x1ba3), RBA 1546

  GGS tokens:

TokenID x52 'R' ORAROWID         Info x00  Length   20

 4141 4157 7649 4141 4741 4141 4144 7541 4170 0001 | AAAWvIAAGAAAADuAAp.. 

TokenID x4c 'L' LOGCSN           Info x00  Length    7

 3231 3632 3934 33                                 | 2162943 

TokenID x36 '6' TRANID           Info x00  Length   10

 3130 2e31 372e 3135 3031                          | 10.17.1501 


The replicat automatically decrypts this record from the trail using the wallet and then inserts the row to the target table. This select verifies the row was committed in the target table and the data is not encrypted.


TARGET_EURO_SQL>select * from branch where branch_number=50;

BRANCH_NUMBER BRANCH_ZIP

------------- ----------

           50      80071

 

Book a seat in an upcoming Oracle GoldenGate 12c: Fundamentals for Oracle Ed 1 class to learn much more about using GoldenGate 12c new features with the Oracle wallet, credentials, integrated extracts, integrated replicats, coordinated replicats, the Oracle Universal Installer, a multi-tenant database, and other features.

Explore Oracle University GoldenGate classes here, or send me an email at randy.richeson[at]oracle.com if you have other questions.

About the Author:


Randy Richeson joined Oracle University as a Senior Principal Instructor in March 2005. He is an Oracle Certified Professional (10g-12c) and a GoldenGate Certified Implementation Specialist (10g-11g). He has taught GoldenGate since 2010, and has taught other technical curricula, including GoldenGate Management Pack, GoldenGate Director, GoldenGate Veridata, Oracle Database, JD Edwards, PeopleSoft, and Oracle Application Server, since 1997.

Friday Nov 01, 2013

Running a simple integration scenario using the Oracle Big Data Connectors on Hadoop/HDFS cluster

Between the elephant (the traditional image of the Hadoop framework) and the Oracle Iron Man (Big Data), an English setter could be seen as the link to the right data.


Data, data, data: we are living in a world where data technology, powering popular applications, search engines, web servers, rich SMS messages, email clients, weather forecasts, and so on, plays a predominant role in our lives.

More and more technologies are used to analyze and track our behavior, detect patterns, and propose "the best/right user experience", from Google ad services to telco companies and large consumer sites (like Amazon). The more we use these technologies, the more data we generate, and thus there is a need for huge data marts and specialized hardware/software servers (such as Exadata) to process, analyze, and understand the trends and offer new services to users.

Some of these "data feeds" are raw, unstructured data and cannot be processed effectively by normal SQL queries. Large-scale distributed processing was an emerging infrastructure need, and the solution seemed to be the co-location of compute nodes with the data, which in turn led to MapReduce parallel patterns and the development of the Hadoop framework, based on MapReduce and a distributed file system (HDFS) that runs on large clusters of rather inexpensive servers.

Several Oracle products use the distributed aggregation pattern for data calculation (Coherence, NoSQL Database, TimesTen), so once you are familiar with one of these technologies, let's say Coherence aggregators, you will find the whole Hadoop MapReduce concept very similar.

Oracle Big Data Appliance is based on the Cloudera Distribution (CDH), and the Oracle Big Data Connectors can be plugged into a Hadoop cluster running the CDH distribution or an equivalent Hadoop cluster.

In this post, a lab-like implementation of this concept is done on a single Linux x64 server, running Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 and a single-node Apache Hadoop 1.2.1 HDFS cluster, using the Oracle SQL Connector for HDFS.

The whole setup is fairly simple:

  1. Install Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 on a Linux x64 server (or a VirtualBox appliance).
  2. Get the Apache Hadoop distribution from: http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-1.2.1.
  3. Get the Oracle Big Data Connectors from: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html?ssSourceSiteId=ocomen.
  4. Check the java version of your Linux server with the command:
    java -version
     java version "1.7.0_40"
    Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
    Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
    
  5. Decompress the hadoop hadoop-1.2.1.tar.gz file to /u01/hadoop-1.2.1
  6. Modify your .bash_profile
    export HADOOP_HOME=/u01/hadoop-1.2.1
    export PATH=$PATH:$HADOOP_HOME/bin
    
    export HIVE_HOME=/u01/hive-0.11.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
    
    (also see my sample .bash_profile)
  7. Set up ssh trust for the Hadoop process. This is a mandatory step; in our case we establish a "local trust" because we are using a single-node configuration (the commands are sketched after this list).
  8. Copy the new public key to the list of authorized keys.
  9. Connect to localhost to test the ssh setup.

  10. We will run a single-node "pseudo-distributed" Hadoop cluster: all the Hadoop Java components run on this one machine, which is enough for our demo purposes. We need to fine-tune some Hadoop configuration files, so go to $HADOOP_HOME/conf and modify the following files (a hedged sketch of minimal single-node settings appears after this list):
    core-site.xml

    hdfs-site.xml

    mapred-site.xml

  11. check that the hadoop binaries are referenced correctly from the command line by executing:
    
    hadoop  -version
    
  12. As Hadoop manages our "clustered" HDFS file system, we have to create its storage directory (the "mount point") and format it. This directory is declared in core-site.xml (see the sketch after this list).

    The layout under /u01/hadoop-1.2.1/data will be created and used by the other Hadoop components (MapReduce uses the /mapred/... structure; HDFS uses the /dfs/... structure).

  13. Format the HDFS file system (the command is sketched after this list).
  14. Start the Java components for the HDFS system (also sketched after this list).
  15. As an additional check, you can use the Hadoop web UIs to browse the content and configuration of your HDFS setup:

    Once our Hadoop HDFS setup is done, you can use the HDFS file system to store data (big data!) and move it back and forth to Oracle databases by means of the Big Data Connectors (which is the next configuration step).

    You can create and use a Hive database, but in our case we will do a simple integration of raw data through the creation of an external table in a local Oracle instance (on the same Linux box, we run the single-node Hadoop HDFS cluster and one Oracle database).

  16. Download some public "big data". I use the site http://france.meteofrance.com/france/observations, from which I can get *.csv files for my big data simulations.

    Here is the data layout of my example file:

    Download the Oracle SQL Connector for HDFS from OTN (oraosch-2.2.0.zip) and unzip it to your local file system.

  17. Modify your environment in order to access the connector libraries, and run the following test:

    [oracle@dg1 bin]$./hdfs_stream
    Usage: hdfs_stream locationFile
    [oracle@dg1 bin]$
    
  18. Load the data to the Hadoop hdfs file system:
    hadoop fs  -mkdir bgtest_data
    hadoop  fs  -put obsFrance.txt bgtest_data/obsFrance.txt
    hadoop fs  -ls  /user/oracle/bgtest_data/obsFrance.txt       
    [oracle@dg1 bg-data-raw]$ hadoop fs -ls  /user/oracle/bgtest_data/obsFrance.txt
    
    Found 1 items
    -rw-r--r--   1 oracle supergroup      54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt
    
    [oracle@dg1 bg-data-raw]$hadoop fs -ls  hdfs:///user/oracle/bgtest_data/obsFrance.txt
    
    Found 1 items
    -rw-r--r--   1 oracle supergroup      54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt
    
  19. Check the content of the HDFS with the browser UI:
  20. Start the Oracle database and run the following script to create the Oracle database user and the Oracle directories for the Oracle Big Data Connector (dg1 is my own database SID; replace it with yours):
    #!/bin/bash
    export ORAENV_ASK=NO
    export ORACLE_SID=dg1
    . oraenv
    sqlplus /nolog <<EOF
    CONNECT / AS sysdba;
    CREATE OR REPLACE DIRECTORY osch_bin_path  AS  '/u01/orahdfs-2.2.0/bin';
    CREATE USER BGUSER IDENTIFIED BY oracle;
    GRANT CREATE SESSION, CREATE TABLE TO BGUSER;
    GRANT EXECUTE ON sys.utl_file TO BGUSER;
    GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO BGUSER;
    CREATE OR REPLACE DIRECTORY BGT_LOG_DIR as '/u01/BG_TEST/logs';
    GRANT READ, WRITE ON DIRECTORY BGT_LOG_DIR to BGUSER;
    CREATE OR REPLACE DIRECTORY BGT_DATA_DIR as '/u01/BG_TEST/data';
    GRANT READ, WRITE ON DIRECTORY BGT_DATA_DIR to BGUSER;
    EOF
    
  21. Put the following in a file named t3.sh and make it executable:
    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=BGTEST_DP_XTAB \
    -D oracle.hadoop.exttab.defaultDirectory=BGT_DATA_DIR \
    -D oracle.hadoop.exttab.dataPaths="hdfs:///user/oracle/bgtest_data/obsFrance.txt" \
    -D oracle.hadoop.exttab.columnCount=7 \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/dg1 \
    -D oracle.hadoop.connection.user=BGUSER \
    -D oracle.hadoop.exttab.printStackTrace=true \
    -createTable  --noexecute
    

    Then test the creation of the external table with it:

    [oracle@dg1 samples]$ ./t3.sh
    
    ./t3.sh: line 2: /u01/orahdfs-2.2.0: Is a directory
    Oracle SQL Connector for HDFS Release 2.2.0 - Production
    Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
    Enter Database Password:]
    The create table command was not executed.
    The following table would be created.
    CREATE TABLE "BGUSER"."BGTEST_DP_XTAB"
    (
     "C1"                             VARCHAR2(4000),
     "C2"                             VARCHAR2(4000),
     "C3"                             VARCHAR2(4000),
     "C4"                             VARCHAR2(4000),
     "C5"                             VARCHAR2(4000),
     "C6"                             VARCHAR2(4000),
     "C7"                             VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "BGT_DATA_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         STRING SIZES ARE IN CHARACTERS
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "C1" CHAR(4000),
           "C2" CHAR(4000),
           "C3" CHAR(4000),
           "C4" CHAR(4000),
           "C5" CHAR(4000),
           "C6" CHAR(4000),
           "C7" CHAR(4000)
         )
       )
       LOCATION
       (
         'osch-20131022081035-74-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;
    The following location files would be created.
    osch-20131022081035-74-1 contains 1 URI, 54103 bytes
           54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt
    
  22. Then remove the --noexecute flag and create the external Oracle table for the Hadoop data.

    Check the results:

    
    The create table command succeeded.
    
    CREATE TABLE "BGUSER"."BGTEST_DP_XTAB"
    (
     "C1"                             VARCHAR2(4000),
     "C2"                             VARCHAR2(4000),
     "C3"                             VARCHAR2(4000),
     "C4"                             VARCHAR2(4000),
     "C5"                             VARCHAR2(4000),
     "C6"                             VARCHAR2(4000),
     "C7"                             VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    ( 
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "BGT_DATA_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         STRING SIZES ARE IN CHARACTERS
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "C1" CHAR(4000),
           "C2" CHAR(4000),
           "C3" CHAR(4000),
           "C4" CHAR(4000),
           "C5" CHAR(4000),
           "C6" CHAR(4000),
           "C7" CHAR(4000)
         )
       )
       LOCATION
       (
         'osch-20131022081719-3239-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;
    
    The following location files were created.
    
    osch-20131022081719-3239-1 contains 1 URI, 54103 bytes
    
           54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt
    

    This is the view from SQL Developer:

    And finally, here is the number of rows in the Oracle external table, read from our Hadoop HDFS cluster:

    SQL> select count(*) from "BGUSER"."BGTEST_DP_XTAB";
                      
    COUNT(*)
    ----------
          1151
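
For reference, here is a hedged sketch of what the screenshots in steps 7 to 14 originally showed for a single-node setup. The HDFS port (19000) and the Hadoop data directory match values visible elsewhere in this walkthrough; the JobTracker port is an arbitrary choice, so adapt these settings to your own layout.

Steps 7-9 (local ssh trust):
  ssh-keygen -t rsa -P ""
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  ssh localhost

Step 10, core-site.xml:
  <configuration>
    <property><name>fs.default.name</name><value>hdfs://localhost:19000</value></property>
    <property><name>hadoop.tmp.dir</name><value>/u01/hadoop-1.2.1/data</value></property>
  </configuration>

Step 10, hdfs-site.xml:
  <configuration>
    <property><name>dfs.replication</name><value>1</value></property>
  </configuration>

Step 10, mapred-site.xml:
  <configuration>
    <property><name>mapred.job.tracker</name><value>localhost:19001</value></property>
  </configuration>

Steps 13-14 (format HDFS and start the daemons):
  hadoop namenode -format
  start-dfs.sh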
    
    

In a future post we will integrate data from a Hive database and try some integrations with the ODI Big Data connectors. Our simplistic approach is just a first step to show you how this unstructured data world can be integrated with the Oracle infrastructure.

Hadoop, Big Data, and NoSQL are great technologies; they are widely used, and Oracle offers a broad integration infrastructure based on them.

Oracle University presents a complete curriculum on all the Oracle-related technologies: NoSQL, Big Data, Oracle Data Integrator, and Oracle Coherence 12c.

Other Resources:

  • Apache Hadoop: http://hadoop.apache.org/ is the homepage for these technologies.
  • "Hadoop: The Definitive Guide, 3rd Edition" by Tom White is a classic reference for people who want to know more about Hadoop, and some active googling will also give you more references.

About the author:

Eugene Simos is based in France and joined Oracle through the BEA/WebLogic acquisition, where he worked in Professional Services, Support, and Education for major accounts across the EMEA region. He has worked in the banking sector and with AT&T and other telco companies, giving him extensive experience with production environments. Eugene currently specializes in Oracle Fusion Middleware, teaching an array of courses on WebLogic, WebCenter Content, BPM, SOA, Identity and Security, GoldenGate, Virtualization, and Unified Communications Suite throughout the EMEA region.

