Subscribe

Share

Application Developer

Big Data Meets Business Intelligence

Explore Oracle Big Data Lite Virtual Machine examples and tutorials to see how big data gets reported.

By Mark Rittman Oracle ACE Director

September/October 2014

If you’re new to big data and interested in learning more about technologies such as Apache Hadoop, Hadoop Distributed File System (HDFS), MapReduce, and NoSQL, the recently released Oracle Big Data Lite Virtual Machine (available for download from Oracle Technology Network) is a great place to get started. Designed for testing and educational purposes and based on the software used in Oracle Big Data Appliance, this virtual machine (VM) comes with examples and tutorials to get you started with key Oracle big data technologies.

Recent releases of Oracle Business Intelligence Enterprise Edition 11g have introduced new options for connecting to big data sources, including the ability for Oracle Business Intelligence Enterprise Edition 11g to connect to Apache Hive, a subproject within Hadoop that provides a SQL-like interface for Hadoop and NoSQL data. Using this new Apache Hive connectivity, you can include data from Oracle Big Data Appliance and other Hadoop clusters in your reports and dashboards, combining it with other datasources—such as Oracle Database and Oracle Essbase—to provide a complete enterprisewide view of your data.

Connecting to Oracle Big Data Lite Virtual Machine

To show how Oracle Business Intelligence Enterprise Edition 11g works with big data sources, I will use Oracle Business Intelligence Enterprise Edition 11g to connect to the Hadoop installation on Oracle Big Data Lite Virtual Machine, using Hive connectivity.

I’ll connect to a Hive database called moviedemo, which contains a single Hive table, which, in turn, maps to a web server log file containing details of customer movie recommendations. Then, because this Hive table contains only basic user interaction details, I’ll “blend” this Hadoop-sourced data with useful reference data from Oracle Database 12c, also located in Oracle Big Data Lite Virtual Machine.

To follow along with this example, you will need the following software and VMs:

  • The Oracle Big Data Lite Virtual Machine, along with Oracle VM VirtualBox if you do not already have it installed
  • An installation of Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7); for this example, I use a Microsoft Windows–based installation
  • Hive Open Database Connectivity (ODBC) drivers available for download at bit.ly/1j0mtXX (Note that the Hive ODBC drivers that ship with Oracle Business Intelligence Enterprise Edition 11g [11.1.1.7] are for the older HiveServer1 protocol whereas Oracle Big Data Lite Virtual Machine comes with the Cloudera distribution that includes Hadoop 4.6, which uses the newer HiveServer2 protocol.)

Your installations of Oracle Big Data Lite Virtual Machine and Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7) must be visible to each other on the same network. If you want to run the example in this article on your own laptop or desktop computer, the machine will need as much as 12 GB of RAM total for both Oracle Big Data Lite Virtual Machine and your installation of Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7), and you will need to use the “host-only networking” option in Oracle VM VirtualBox.

If your installation of Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7) is on your organization’s network, you will need to configure Oracle VM VirtualBox to work with bridged networking, so that it picks up an IP address from your organization’s DHCP server. See “Oracle Big Data Lite 3.0 Quick Deployment” for details on installing Oracle Big Data Lite Virtual Machine and choosing between the different networking options.

In addition, your installation of Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7) must resolve the DNS name bigdatalite.localdomain to the IP address that Oracle Big Data Lite Virtual Machine is using. In order to configure this, follow these steps:

  1. With Oracle Big Data Lite Virtual Machine open and the desktop visible, right-click in the desktop, and select Open in Terminal from the menu.
  2. At the command prompt, enter the command ifconfig. Then, for the first adapter (typically labeled eth2 or eth4) in the list of network adapters displayed, locate the IP address next to inet addr. In my case, the IP address listed is 172.16.107.221, but yours will be different. Note this IP address for later, and select File -> Close to close the Terminal window.
  3. In Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7), assuming that you are using Microsoft Windows 2008 R2 as your OS environment, click the Start button and right-click the notepad icon when it’s displayed, selecting Run as Administrator from the menu. Then, using the File dialog box, navigate to Windows -> System32 -> drivers -> and so on and open the file called hosts. With this file open, create an entry that resolves Oracle Big Data Lite Virtual Machine to its correct IP address, as shown below (note that your IP address will be different):
    # localhost name resolution is
    # handled within DNS itself.
    #	127.0.0.1       localhost
    #	::1            localhost
    172.16.107.221 bigdatalite.localdomain bigdatalite

    Then, select File -> Close to close the Notepad window.

  4. Next, still within the Oracle Business Intelligence Enterprise Edition 11g environment, install the Hive ODBC drivers, using the Windows program installer you downloaded earlier. Then create a new ODBC datasource using those drivers and pointing to Oracle Big Data Lite Virtual Machine by selecting Start -> Control Panel -> Administrative Tools -> Data Sources (ODBC), and then create a new system datasource name (DSN), using the following options:

    Driver: Cloudera ODBC Driver for Apache Hive
    Data Source Name: hive_bigdatalite
    Port: 10000
    Database: moviedemo
    Hive Server Type: Hive Server 2
    Mechanism: user name
    User Name: oracle

Click Test to check that the datasource connects successfully, click OK, and then click OK again to close the dialog box.

Connect to Hive and Hadoop Sources

At this point, all the connections are in place to start connecting your Oracle Business Intelligence Server repository to the Hive server on Oracle Big Data Lite Virtual Machine. In the following example, I’ll edit the Oracle Business Intelligence Server repository online to bring in the Hive table (but you could also create a new offline repository and upload that to your Oracle Business Intelligence Server, using Oracle Fusion Middleware Control).

  1. With your Oracle Business Intelligence Server started, select Start -> Oracle Business Intelligence -> BI Administration from the Windows desktop. When the Oracle BI Administration tool starts, select File -> Open -> Online and enter the details for your Oracle Business Intelligence server and repository. For example,
    Repository Password: Admin123
    User: weblogic
    Password: welcome1
  2. To import the table metadata for the log file table, select File -> Import Metadata from the BI Administration toolbar and then select and enter the following values to connect to the Hive server on Oracle Big Data Lite Virtual Machine:
    Connection Type: ODBC 3.5
    DSN: hive_bigdatalite
    Username: oracle
    Password: welcome1

    Click Next, and then click Next again.

  3. You will be presented with a list of the databases that are available from the Hive server. Figure 1 shows an example of this list, including any other Hive databases you may have created on Oracle Big Data Lite Virtual Machine prior to this example. Navigate to the moviedemo database, select the movieapp_log_stage Hive table, and then click Import Selected to move it to the Repository View panel.


    o54ba-f1

    Figure 1: Importing the Hive table metadata into the BI repository

  4. With the BI Administration tool still open, locate the hive_bigdatalite database within the Physical layer panel and double-click it to open it for editing. With the General tab selected, change Database Type to Apache Hadoop and click OK. Click No when asked if you want to edit the connection pool that accompanies the physical database. Then click the Save button on the application toolbar, and select No when asked whether you want to check the repository for consistency. (You will perform a consistency check of the repository after all edits are complete.)
  5. Now you can test the connection to the Hive table on Hadoop. Right-click the movieapp_log_stage physical table within the Moviedemo physical database, and select View Data. After a short while, you should see a table of data displayed from the Hive server; the delay in returning data is because Hive, in turn, runs a MapReduce job to return data from the HDFS log file, an approach designed more for batch processing but capable of handling very large sets of data. Figure 2 shows data returned from the Hive table.


    o54ba-f2

    Figure 2: Returning log data from the Hive table

Using Oracle Business Intelligence Enterprise Edition 11g to Blend Hadoop and Relational Data

You now have a successful connection to the Hive table that displays data from the log file in the Hadoop file system, but on its own, this file isn’t very useful. It includes customer IDs, for example, rather than the descriptive name of the customer or where that person is located.

There are various ways you could supplement this log data with customer and product reference data. You could upload customer and product files into HDFS and join that data with the log file data, using a HiveQL join, similar to a join in regular SQL. Alternatively, and as I’ll do now, you could use the data federation capabilities of Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7) to retrieve this data directly from Oracle Database and then have Oracle Business Intelligence Server join it to the Hive table dynamically when users query it. To set up this link to Oracle Database on Oracle Big Data Lite Virtual Machine and import customer and movie reference tables into the Oracle Business Intelligence Server repository, follow these steps:

  1. With the BI Administration tool still open and with your repository open online for editing, select File -> Import Metadata from the application menu. Then select and enter the following values to connect to Oracle Database in Oracle Big Data Lite Virtual Machine:
    Connection Type: OCI 10g/11g
    Data Source Name: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL = TCP)(HOST = bigdatalite .localdomain)(PORT = 1521)))(CONNECT_DATA = (SERVICE_NAME = orcl)))
    User Name: moviedemo
    Password: welcome1

    Click Next, and then click Next again.

  2. From the Data source view in the left-hand panel of the dialog box, locate the MOVIEDEMO schema and then the CUSTOMER and MOVIE tables within it. Use Ctrl-click to select both tables, click Import Selected, and then click OK to move the tables into the Repository view. Click Finish to close the dialog box, and then click Save in the application toolbar to save your repository, clicking No when asked if you want to check the repository for consistency.
  3. To create a physical join between the Hive log table and the two Oracle reference tables: within the Physical layer, use Ctrl-click to select the three tables, right-click, and then select Physical Diagram -> Selected Objects Only. Then, when the three tables appear in the Physical Diagram window, click the movieapp_log_stage table and then New Join. Using the displayed arrow, click the MOVIE table first and join it to the movieapp_log_stage table at MOVIE_ID = movieid. Then create another join, this time joining the movieapp_log_stage table to the CUSTOMER table at CUST_ID = custid. When you’re done, close the Physical Diagram window and return to the main BI Administration window.
  4. Now you will create a business model that brings in data from both Hive and Oracle Database. To do so, right-click anywhere in the Business Model and Mapping pane and select New Business Model from the menu. When prompted, name the model Movie Recommendation Analysis, and click OK to close the dialog box.
  5. Use Ctrl-click to select the three tables in the Physical layer, and drag and drop them on top of the new business model in the Business Model and Mapping layer. When you do so, an error message reports that the ADDRESS_POINT column in the CUSTOMER table has no datatype. You won’t need this for the example, so return to the CUSTOMER table in the Physical layer, double-click it to edit it, select the Columns tab, and click Delete to delete the ADDRESS_POINT column from that physical table. Then repeat the process of using Ctrl-click and dragging the three tables into the new business model, which should now work for you (without returning an error).
  6. Now, within the new business model, you need to set default aggregation types for the measure columns in the movieapp_log_stage logical table. To do so, locate the table within the business model, click the + button next to the table name, and then double-click the rating column within it. In the Logical Column dialog box, select the Aggregation tab and then choose Avg for Default Aggregation Rule. Repeat this step once again for the sales column, but set its Default Aggregation Rule to Sum, and then click OK.
  7. Finally, to create a Presentation layer subject area for this business model, click and drag the business model within the Business Model and Mapping layer, drop it into the Presentation layer, and then click Save. Click Yes when asked if you want to check the repository for consistency, and then click Yes and OK in the various dialog boxes as the repository is checked in and saved back to Oracle Business Intelligence Server. Your repository should then, in its final state, look like the one shown in Figure 3.


    o54ba-f3

    Figure 3: The final Oracle Business Intelligence Server repository

Easy Reporting Against Blended Datasources You can now open up the Oracle Business Intelligence website and create your first report using data from Hive and Oracle Database. To do this, navigate to the Oracle Business Intelligence website (for example, http://localhost:7001/analytics if you are working with a local VM) and enter your login credentials, such as weblogic/welcome1. Then, from the website menu, do the following to create a sample analysis:
  1. Select New -> Analysis, and then select your subject area from the list of subject areas, which—if you have followed the previous steps—will be called Movie Recommendation Analysis.
  2. With the Criteria tab selected, select the following three subject area columns as the analysis criteria:
    movieapp_log_stage.rating
    movieapp_log_stage.sales
    CUSTOMER.COUNTRY

    and use the Filter menu item for the CUSTOMER.COUNTRY column to create the following filter condition:
    COUNTRY is equal to / is in Canada; Netherlands; United Kingdom
  3. Click the Results tab to see the results of the query. The average ratings and total sales for the countries in the filter condition will be displayed, with measures from a log file on HDFS accessed through Hive, combined with reference data from Oracle Database. Figure 4 shows the query results.


    o54ba-f4

    Figure 4: The final analysis, combining Hive and Oracle Database data

Final Report

Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7) can report against a wide range of enterprise datasources and can bring those datasources together into a single analysis, with the underlying data extraction complexity hidden from the user. Recent releases of Oracle Business Intelligence Enterprise Edition 11g introduced the ability to connect to Hadoop as a datasource, and using Oracle Big Data Lite Virtual Machine is a great way to get familiar with big data technologies such as Hadoop, HDFS, Hive, and MapReduce.

Next Steps

 LEARN more about big data

READ more Rittman
 Rittman Mead blog
 Oracle Magazine business intelligence columns

DOWNLOAD
 Oracle Big Data Lite Virtual Machine
 Oracle Business Intelligence Enterprise Edition 11g (11.1.1.7)
 Hive ODBC drivers

Photography by Ricardo Gomez, Unsplash