Friday Nov 01, 2013

Running a simple integration scenario using the Oracle Big Data Connectors on Hadoop/HDFS cluster

Between the elephant (the traditional image of the Hadoop framework) and the Oracle Iron Man (Big Data...), an English setter could be seen as the link to the right data.


Data, data, data: we live in a world where data technology, embedded in popular applications, search engines, web servers, rich SMS messages, email clients, weather forecasts and so on, plays a predominant role in our lives.

More and more technologies are used to analyze and track our behavior, detect patterns, and propose "the best/right user experience", from the Google Ad services to telco companies and large consumer sites (like Amazon :) ). The more we use these technologies, the more data we generate, and thus there is a need for huge data marts and specific hardware/software servers (such as the Exadata servers) in order to process, analyze and understand the trends and offer new services to the users.

Some of these "data feeds" are raw, unstructured data and cannot be processed effectively by normal SQL queries. Large-scale distributed processing was an emerging infrastructure need, and the solution seemed to be the "collocation of compute nodes with the data", which in turn led to MapReduce parallel patterns and the development of the Hadoop framework, which is based on MapReduce and a distributed file system (HDFS) that runs on large clusters of rather inexpensive servers.

Several Oracle products use this distribute/aggregate pattern for data processing (Coherence, NoSQL Database, TimesTen), so once you are familiar with one of these technologies, let's say with Coherence aggregators, you will find the whole Hadoop/MapReduce concept very similar.

Oracle Big Data Appliance is based on the Cloudera Distribution (CDH), and the Oracle Big Data Connectors can be plugged on a Hadoop cluster running the CDH distribution or equivalent Hadoop clusters.

In this paper, a "lab-like" implementation of this concept is done on a single Linux x64 server, running an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 and a single-node Apache Hadoop 1.2.1 HDFS cluster, using the Oracle SQL Connector for HDFS.

The whole setup is fairly simple:

  1. Install an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 server on a Linux x64 machine (or a VirtualBox appliance)
  2. Get the Apache Hadoop distribution from: http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-1.2.1.
  3. Get the Oracle Big Data Connectors from: http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html?ssSourceSiteId=ocomen.
  4. Check the java version of your Linux server with the command:
    java -version
     java version "1.7.0_40"
    Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
    Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
    
  5. Decompress the hadoop-1.2.1.tar.gz file to /u01/hadoop-1.2.1
  6. Modify your .bash_profile:
    export HADOOP_HOME=/u01/hadoop-1.2.1
    export HIVE_HOME=/u01/hive-0.11.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
    
    (also see my sample .bash_profile)
  7. Set up SSH trust for the Hadoop processes. This is a mandatory step; in our case we have to establish a "local trust", as we are using a single-node configuration.
  8. Copy the new public key to the list of authorized keys.
  9. Connect and test the SSH setup to your localhost; a typical sequence for steps 7 to 9 is sketched below.
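
    As a sketch, assuming the default RSA key location and an empty passphrase (acceptable only for a local demo setup):

    # generate a key pair without a passphrase (demo only)
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    # append the new public key to the authorized keys
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    # connect once to localhost to test (and record the host key)
    ssh localhost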

  10. We will run a single-node "pseudo-distributed" Hadoop cluster: all the Hadoop Java daemons run on one machine, which is enough for our demo purposes. We need to fine-tune some Hadoop configuration files: go to $HADOOP_HOME/conf and modify the following files (a minimal example configuration is sketched below):
    core-site.xml
    hdfs-site.xml
    mapred-site.xml
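
    As a sketch (the exact values are assumptions, not the author's original files), a minimal pseudo-distributed configuration could look like this; the HDFS port 19000 matches the hdfs://localhost:19000 URIs shown later in this post, hadoop.tmp.dir matches the data directory created in step 12, and the JobTracker port 9001 is just a conventional choice for the demo:

    <!-- core-site.xml: HDFS NameNode URI and local storage root -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:19000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/u01/hadoop-1.2.1/data</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single replica is enough on one node -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

    <!-- mapred-site.xml: local JobTracker address -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>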
    

  11. Check that the Hadoop binaries are referenced correctly from the command line by executing:
    
    hadoop version
    
  12. As Hadoop manages our "clustered" HDFS file system, we have to create "the mount point" directory and format it; this local directory is the one declared to core-site.xml (hadoop.tmp.dir in the sketch of step 10). A one-line example is shown below.

    The layout under /u01/hadoop-1.2.1/data will be created and used by the other Hadoop components (MapReduce uses the /mapred/... structure, HDFS uses the /dfs/... structure).
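
    A sketch of creating this local data directory (Hadoop builds the /mapred and /dfs sub-layouts in it by itself):

    mkdir -p /u01/hadoop-1.2.1/data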

  13. Format the HDFS file system, as sketched below.
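
    Formatting initializes the HDFS metadata under the data directory; in Hadoop 1.x this is done with:

    hadoop namenode -format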
  14. Start the Java daemons for the HDFS system, as sketched below.
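
    The Hadoop 1.x start scripts live in $HADOOP_HOME/bin; for this HDFS-only demo start-dfs.sh is enough (start-all.sh would also start the MapReduce daemons), and jps gives a quick check that the NameNode, SecondaryNameNode and DataNode JVMs are running:

    start-dfs.sh
    jps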
  15. As an additional check, you can use the Hadoop web browser UIs to inspect your HDFS setup and content (for Hadoop 1.x the NameNode UI typically listens on http://localhost:50070/):

    Once our HDFS Hadoop setup is done, you can use the HDFS file system to store data (big data :) ) and move it back and forth to Oracle databases by means of the Big Data Connectors (which is the next configuration step).

    You could create and use a Hive database, but in our case we will do a simple integration of "raw data", through the creation of an external table in a local Oracle instance (on the same Linux box we run the single-node Hadoop HDFS cluster and one Oracle database).

  16. Download some public "big data". I use the site http://france.meteofrance.com/france/observations, from where I can get *.csv files for my big data simulations :).

    Here is the data layout of my example file:

    Download the Big Data Connector from OTN (oraosch-2.2.0.zip) and unzip it to your local file system; in this setup it is extracted to /u01/orahdfs-2.2.0.

  17. Modify your environment in order to access the connector libraries; a minimal setup is sketched below.
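
    As a sketch, assuming the connector was unzipped to /u01/orahdfs-2.2.0 (the same path used for the osch_bin_path directory object in step 20), the environment change could be:

    export OSCH_HOME=/u01/orahdfs-2.2.0
    export PATH=$PATH:$OSCH_HOME/bin

    Then make the following test: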

    [oracle@dg1 bin]$./hdfs_stream
    Usage: hdfs_stream locationFile
    [oracle@dg1 bin]$
    
  18. Load the data to the Hadoop hdfs file system:
    hadoop fs -mkdir bgtest_data
    hadoop fs -put obsFrance.txt bgtest_data/obsFrance.txt
    hadoop fs -ls /user/oracle/bgtest_data/obsFrance.txt
    [oracle@dg1 bg-data-raw]$ hadoop fs -ls  /user/oracle/bgtest_data/obsFrance.txt
    
    Found 1 items
    -rw-r--r--   1 oracle supergroup      54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt
    
    [oracle@dg1 bg-data-raw]$hadoop fs -ls  hdfs:///user/oracle/bgtest_data/obsFrance.txt
    
    Found 1 items
    -rw-r--r--   1 oracle supergroup      54103 2013-10-22 06:10 /user/oracle/bgtest_data/obsFrance.txt
    
  19. Check the content of the HDFS with the browser UI:
  20. Start the Oracle database and run the following script in order to create the Oracle database user and the Oracle directory objects for the Oracle Big Data Connector (dg1 is my database SID; replace it with yours):
    #!/bin/bash
    export ORAENV_ASK=NO
    export ORACLE_SID=dg1
    . oraenv
    sqlplus /nolog <<EOF
    CONNECT / AS sysdba;
    CREATE OR REPLACE DIRECTORY osch_bin_path  AS  '/u01/orahdfs-2.2.0/bin';
    CREATE USER BGUSER IDENTIFIED BY oracle;
    GRANT CREATE SESSION, CREATE TABLE TO BGUSER;
    GRANT EXECUTE ON sys.utl_file TO BGUSER;
    GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO BGUSER;
    CREATE OR REPLACE DIRECTORY BGT_LOG_DIR as '/u01/BG_TEST/logs';
    GRANT READ, WRITE ON DIRECTORY BGT_LOG_DIR to BGUSER;
    CREATE OR REPLACE DIRECTORY BGT_DATA_DIR as '/u01/BG_TEST/data';
    GRANT READ, WRITE ON DIRECTORY BGT_DATA_DIR to BGUSER;
    EOF
    
  21. Put the following in a file named t3.sh and make it executable:
    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -D oracle.hadoop.exttab.tableName=BGTEST_DP_XTAB \
    -D oracle.hadoop.exttab.defaultDirectory=BGT_DATA_DIR \
    -D oracle.hadoop.exttab.dataPaths="hdfs:///user/oracle/bgtest_data/obsFrance.txt" \
    -D oracle.hadoop.exttab.columnCount=7 \
    -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/dg1 \
    -D oracle.hadoop.connection.user=BGUSER \
    -D oracle.hadoop.exttab.printStackTrace=true \
    -createTable  --noexecute
    

    Then test the creation of the external table with it:

    [oracle@dg1 samples]$ ./t3.sh
    
    ./t3.sh: line 2: /u01/orahdfs-2.2.0: Is a directory
    Oracle SQL Connector for HDFS Release 2.2.0 - Production
    Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
    Enter Database Password:
    The create table command was not executed.
    The following table would be created.
    CREATE TABLE "BGUSER"."BGTEST_DP_XTAB"
    (
     "C1"                             VARCHAR2(4000),
     "C2"                             VARCHAR2(4000),
     "C3"                             VARCHAR2(4000),
     "C4"                             VARCHAR2(4000),
     "C5"                             VARCHAR2(4000),
     "C6"                             VARCHAR2(4000),
     "C7"                             VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "BGT_DATA_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         STRING SIZES ARE IN CHARACTERS
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "C1" CHAR(4000),
           "C2" CHAR(4000),
           "C3" CHAR(4000),
           "C4" CHAR(4000),
           "C5" CHAR(4000),
           "C6" CHAR(4000),
           "C7" CHAR(4000)
         )
       )
       LOCATION
       (
         'osch-20131022081035-74-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;
    The following location files would be created.
    osch-20131022081035-74-1 contains 1 URI, 54103 bytes
           54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt
    
  22. Then remove the --noexecute flag and create the external Oracle table for the Hadoop data.

    Check the results:

    
    The create table command succeeded.
    
    CREATE TABLE "BGUSER"."BGTEST_DP_XTAB"
    (
     "C1"                             VARCHAR2(4000),
     "C2"                             VARCHAR2(4000),
     "C3"                             VARCHAR2(4000),
     "C4"                             VARCHAR2(4000),
     "C5"                             VARCHAR2(4000),
     "C6"                             VARCHAR2(4000),
     "C7"                             VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    ( 
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "BGT_DATA_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         STRING SIZES ARE IN CHARACTERS
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "C1" CHAR(4000),
           "C2" CHAR(4000),
           "C3" CHAR(4000),
           "C4" CHAR(4000),
           "C5" CHAR(4000),
           "C6" CHAR(4000),
           "C7" CHAR(4000)
         )
       )
       LOCATION
       (
         'osch-20131022081719-3239-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;
    
    The following location files were created.
    
    osch-20131022081719-3239-1 contains 1 URI, 54103 bytes
    
           54103 hdfs://localhost:19000/user/oracle/bgtest_data/obsFrance.txt
    

    This is the view from SQL Developer:

    And finally, the number of rows in the Oracle external table, read from our Hadoop HDFS cluster:

    SQL> select count(*) from "BGUSER"."BGTEST_DP_XTAB";
                      
    COUNT(*)
    ----------
          1151
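
    From here the external table can be queried like any other Oracle table. As a purely illustrative sketch (the table name BGTEST_LOCAL is hypothetical and not part of this setup), the HDFS data could be copied into a regular heap table with a CTAS statement:

    -- materialize the HDFS-backed external table into a regular Oracle table
    CREATE TABLE BGUSER.BGTEST_LOCAL AS
      SELECT * FROM BGUSER.BGTEST_DP_XTAB;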
    
    

In a future post we will integrate data from a Hive database and try some ODI integrations with the ODI Big Data connector. Our simplistic approach is just a first step to show you how this unstructured data world can be integrated with the Oracle infrastructure.

Hadoop, Big Data and NoSQL are great technologies; they are widely used, and Oracle offers a large integration infrastructure based on these services.

Oracle University presents a complete curriculum on all the related Oracle technologies:

  • NoSQL
  • Big Data
  • Oracle Data Integrator
  • Oracle Coherence 12c

Other Resources:

  • Apache Hadoop: http://hadoop.apache.org/ is the homepage for these technologies.
  • "Hadoop: The Definitive Guide, 3rd Edition" by Tom White is a classic read for people who want to know more about Hadoop, and some active googling will also give you more references.

About the author:

Eugene Simos is based in France and joined Oracle through the BEA/WebLogic acquisition, where he worked in Professional Services, Support and Education for major accounts across the EMEA region. He has worked in the banking sector and with AT&T and other telco companies, which gave him extensive experience with production environments. Eugene currently specializes in Oracle Fusion Middleware, teaching an array of courses on WebLogic, WebCenter Content, BPM, SOA, Identity & Security, GoldenGate, Virtualization and Unified Communications Suite throughout the EMEA region.

Wednesday Aug 29, 2012

Integrating Oracle Hyperion Smart View Data Queries with MS Word and PowerPoint


Most Smart View users probably appreciate that they can use just one add-in to access data from the different sources they might work with, like Oracle Essbase, Oracle Hyperion Planning, Oracle Hyperion Financial Management and others. But not all of them are aware of the options to integrate data analyses not only into Excel, but also into MS Word or PowerPoint. While in the past, copying and pasting single numbers or tables from a recent analysis in Excel made the pasted content a static snapshot, copying so-called Data Points now creates dynamic, updatable references to the data source. It also provides additional nice features, which can make life easier and less stressful for Smart View users.

So, how does this option work? After building an ad-hoc analysis with Smart View as usual in an Excel worksheet, any area including data cells/numbers from the database can be highlighted in order to copy data points, even single data cells.

 

TIP

It is not necessary to highlight and copy the row or column descriptions.

 

Next from the Smart View ribbon select Copy Data Point.

Then switch to the Word or PowerPoint document into which the selected content should be copied. Note that in these Office programs you will find a Smart View menu item; from it, select the Paste Data Point icon.

The copied details from the Excel report will be pasted, but showing #NEED_REFRESH in the data cells instead of the original numbers.

After clicking the Refresh icon on the Smart View menu the data will be retrieved and displayed. (Maybe at that moment a login window pops up and you need to provide your credentials.)

It works in the same way if you just copy one single number without any row or column descriptions, for example in order to incorporate it into a continuous text:

Before refresh:

After refresh:

From now on (provided that you are connected online to your database or application) for any subsequent updates of the data shown in your documents you only need to refresh data by clicking the Refresh button on the Smart View menu, without copying and pasting the context or content again.

As you might realize when trying out this feature on your own, there won't be any Point of View shown in the Office document. You have also seen in the example, where only a single data cell was copied, that no member names or row/column descriptions are copied, although these are usually required in an ad-hoc report in order to define exactly where the data comes from or how it is queried from the source. Well, these definitions are not visible, but they are transferred to the Word or PowerPoint document as well. They are stored in the background for each individual data cell copied and can be made visible by double-clicking the data cell, as shown in the following screenshot (which is taken from another context).

 

So for each cell/number the complete connection information is stored along with the exact member/cell intersection from the database. And that’s not all: you have the chance now to exchange the members originally selected in the Point of View (POV) in the Excel report. Remember, at that time we had the following selection:

 

By selecting the Manage POV option from the Smart View menu in Word or PowerPoint…

 

… the following POV Manager – Queries window opens:

 

You can now change your selection for each dimension from the original POV, either by double-clicking the dimension member in the lower right box under POV: or by selecting the Member Selector icon on the top right-hand side of the window. After confirming your changes you need to refresh your document again. Be aware that this will update all (!) numbers taken from one and the same original Excel sheet, even if they appear in different locations in your Office document, reflecting your recent changes in the POV.

TIP

Build your original report in such a way that dimensions you might want to change later from within Word or PowerPoint are placed in the POV.

And there is another really nice feature worth mentioning: using Dynamic Data Points in the way described above, you will never again miss or need to search for the original Excel sheet from which values were taken and copied as data points into an Office document, because from even a single data cell Smart View is able to recreate the entire original report content with just a few clicks:

Select one of the numbers from within your Word or PowerPoint document by double-clicking it.

 

Then select the Visualize in Excel option from the Smart View menu.

Excel will open and Smart View will rebuild the entire original report, including POV settings, and retrieve all data from the most recent actual state of the database. (It might be necessary to provide your credentials before data is displayed.)

However, in order to make this work, an active online connection to your databases on the server and at least read access to the retrieved data are necessary. But apart from this, your newly built Excel report is fully functional for ad-hoc analysis and can be used in the common way for drilling, pivoting and all the other known functions and features.

So much for embedding Dynamic Data Points into Office documents and linking them back to Excel worksheets. You can apply this in the described way with ad-hoc analyses directly on Essbase databases or using Hyperion Planning and Hyperion Financial Management ad-hoc web forms.

If you are also interested in other new features and smart enhancements in Essbase or Hyperion Planning stay tuned for coming articles or check our training courses and web presentations.

You can find general information about offerings for the Essbase and Planning curriculum or other Oracle-Hyperion products here (please make sure to select your country/region at the top of this page) or in the OU Learning Paths section, where Planning, Essbase and other Hyperion products can be found under the Fusion Middleware heading (again, please select the right country/region). Or drop me a note directly: bernhard.kinkel@oracle.com.

About the Author:

Bernhard Kinkel started working for Hyperion Solutions as a Presales Consultant and Consultant in 1998 and moved to Hyperion Education Services in 1999. He joined Oracle University in 2007 where he is a Principal Education Consultant. Based on these many years of working with Hyperion products he has detailed product knowledge across several versions. He delivers both classroom and live virtual courses. His areas of expertise are Oracle/Hyperion Essbase, Oracle Hyperion Planning and Hyperion Web Analysis.

 
