Monday Feb 03, 2014

How to generate Scatterplot Matrices using R script in Data Miner

Data Miner provides the Explorer node, which produces descriptive statistics and histograms so that an analyst can examine input data columns individually. Often, an analyst is also interested in the relationships among the data columns, so that he can choose the columns that are closely correlated with the target column for model building. To examine relationships among data columns, he can create scatter plots using the Graph node.

For example, an analyst may want to build a regression model that predicts the customer LTV (long term value) using the INSUR_CUST_LTV_SAMPLE demo data. Before building the model, he can create the following workflow with the Graph node to examine the relationships between the data columns of interest and the LTV target column.

In the Graph node editor, create a scatter plot with a data column of interest on the X axis against the LTV target column on the Y axis. For the demo, let’s create three scatter plots using these data columns: HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT.

Here are the scatter plots generated by the Graph node. As you can see, HOUSE_OWNERSHIP and N_MORTGAGES are quite positively correlated with the LTV target column, whereas MORTGAGE_AMOUNT seems less correlated with it.
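The visual impression from the scatter plots can be cross-checked numerically. The following is a minimal sketch, assuming the demo table INSUR_CUST_LTV_SAMPLE is accessible in the current schema and that all four columns are numeric. It uses Oracle's CORR aggregate function to compute the Pearson correlation coefficient of each column with LTV. The output column aliases are illustrative names only.

```sql
-- Pearson correlation of each candidate column with the LTV target.
-- Assumption: all four columns are numeric in INSUR_CUST_LTV_SAMPLE.
SELECT CORR(HOUSE_OWNERSHIP, LTV) AS CORR_HOUSE_OWNERSHIP,
       CORR(N_MORTGAGES, LTV)     AS CORR_N_MORTGAGES,
       CORR(MORTGAGE_AMOUNT, LTV) AS CORR_MORTGAGE_AMOUNT
FROM INSUR_CUST_LTV_SAMPLE;
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. CORR measures only linear association, so the plots remain worth inspecting.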

The problem with this approach is that it is laborious to create scatter plots one by one, and you cannot examine the relationships among those data columns themselves. To solve the problem, we can create a scatterplot matrix like the following:

This is a 4 x 4 scatterplot matrix of the data columns LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT. In the top row, you can examine the relationships of HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT with the LTV target column. In the second row, you can examine the relationships of LTV, N_MORTGAGES, and MORTGAGE_AMOUNT with the HOUSE_OWNERSHIP column. In the third and fourth rows, you can examine the relationships of the other columns with N_MORTGAGES and MORTGAGE_AMOUNT, respectively.

To generate this scatterplot matrix, we need to invoke the readily available R script RQG$pairs (via the SQL Query node) in Oracle R Enterprise. Please refer to http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html?ssSourceSiteId=ocomen for Oracle R Enterprise installation information.

Let’s create the following workflow with the SQL Query node to invoke the R script. Note: for a large data set, a Sample node may be needed to sample the data down (e.g. to 1000 rows) before it is used for charting.
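If you prefer to sample in SQL rather than with the Sample node, one possible approach is sketched below. It assumes the demo table INSUR_CUST_LTV_SAMPLE is accessible in the current schema and uses DBMS_RANDOM.VALUE to shuffle the rows before keeping the first 1000. The row limit simply mirrors the example size mentioned in the note.

```sql
-- Keep at most 1000 randomly chosen rows for charting.
-- Assumption: INSUR_CUST_LTV_SAMPLE is accessible in the current schema.
SELECT LTV, HOUSE_OWNERSHIP, N_MORTGAGES, MORTGAGE_AMOUNT
FROM (
  SELECT LTV, HOUSE_OWNERSHIP, N_MORTGAGES, MORTGAGE_AMOUNT
  FROM INSUR_CUST_LTV_SAMPLE
  ORDER BY DBMS_RANDOM.VALUE   -- random ordering of all rows
)
WHERE ROWNUM <= 1000;          -- then take the first 1000
```

Sorting by a random value touches every row, so it may be slow on a very large table. In that case the SAMPLE clause (for example, FROM INSUR_CUST_LTV_SAMPLE SAMPLE (10) for roughly 10 percent of the rows) is a cheaper alternative that returns an approximate rather than exact row count.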

Enter the following SQL statement in the SQL Query editor. rqTableEval is an R SQL function that lets a user invoke an R script from SQL. The first SELECT statement within the function specifies the input data (LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT). The second SELECT statement specifies the optional parameters to the R script; here it defines the graph title “Scatterplot Matrices”. The output of the function is an XML document with the graph data embedded in it.

SELECT VALUE FROM TABLE
(
  rqTableEval(
    cursor(select "INSUR_CUST_LTV_SAMPLE_N$10001"."LTV",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."HOUSE_OWNERSHIP",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."N_MORTGAGES",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."MORTGAGE_AMOUNT"
           from "INSUR_CUST_LTV_SAMPLE_N$10001"), -- Input Cursor
    cursor(select 'Scatterplot Matrices' as MAIN from DUAL), -- Param Cursor
    'XML', -- Output Definition
    'RQG$pairs' -- R Script
  )
)

You can see what default R scripts are available in the R Scripts tab. This tab is visible only when the Oracle R Enterprise installation is detected.

Click the button in the toolbar to invoke the R script to produce the Scatterplot matrix below.

You can copy the Scatterplot matrix image to the clipboard or save it to an image file (PNG) for reporting purposes. To do so, right-click on the graph to bring up the pop-up menu below.

The Scatterplot matrix is also available in the Data Viewer of the SQL Query node. To open the Data Viewer, select the “View Data” item in the pop-up menu of the node.

The returned XML data appears in the Data Viewer, as shown below. To view the Scatterplot matrix embedded in the data, click on the XML data to bring up the icon at the far right of the cell, and then click on the icon to bring up the viewer.

Tuesday Nov 12, 2013

Oracle Big Data Learning Library

Click on LEARN BY PRODUCT to view all learning resources.

Oracle Big Data Essentials

Attend this Oracle University Course!

Using Oracle NoSQL Database

Attend this Oracle University class!

Oracle and Big Data on OTN

See the latest resource on OTN.

Monday Jul 15, 2013

Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1, is now available for download on OTN

The NEW Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1, is now available for download on OTN. See the link to SQL Developer 4.0 EA1.


The Oracle Data Miner 4.0 new features are applicable to Oracle Database 11g Release 2 and Oracle Database 12c. See the Oracle Data Miner Extension to SQL Developer 4.0 Release Notes for EA1 for additional information.

· Workflow SQL Script Deployment

o Generates SQL scripts to support full deployment of workflow contents

· SQL Query Node

o Integrate SQL queries to transform data or provide a new data source

o Supports the running of R Language Scripts and viewing of R generated data and graphics


· Graph Node

o Generate Line, Scatter, Bar, Histogram and Box Plots



· Model Build Node Improvements

o Node level data usage specification applied to underlying models

o Node level text specifications to govern text transformations

o Displays heuristic rules responsible for excluding predictor columns

o Ability to control the amount of Classification and Regression test results generated

· View Data

o Ability to drill in to view custom objects and nested tables

These new Oracle Data Miner GUI capabilities expose Oracle Database 12c and Oracle Advanced Analytics/Data Mining Release 1 features:

· Predictive Query Nodes

o Predictive results without the need to build models using Analytical Queries

o Refined predictions based on data partitions

· Clustering Node New Algorithm

o Added Expectation Maximization algorithm

· Feature Extraction Node New Algorithms

o Added Singular Value Decomposition and Principal Component Analysis algorithms

· Text Mining Enhancements

o Text transformations integrated as part of Model's Automatic Data Preparation

o Ability to import Build Text node specifications into a Model Build node

· Prediction Result Explanations

o Scoring details that explain predictive result

· Generalized Linear Model New Algorithm Settings

o New algorithm settings provide feature selection and generation

See the OAA pages on OTN (http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html) for more information on Oracle Advanced Analytics.

Friday Jun 08, 2012

New Oracle Advanced Analytics presentation

I recently updated my presentation on Oracle's new Advanced Analytics Option which bundles Oracle Data Mining with Oracle R Enterprise for maximum depth and breadth of data mining, statistics and advanced analytic functions from Oracle.  See New Oracle Advanced Analytics presentation.  

Wednesday Apr 04, 2012

Recorded YouTube-like presentation and "live" demos of Oracle Advanced Analytics/Oracle Data Mining

Ever want to just sit and watch a YouTube-like presentation and "live" demos of Oracle Advanced Analytics/Oracle Data Mining?  Then click here! (plays large MP4 file in a browser)

This 1+ hour long session focuses primarily on the Oracle Data Mining component of the Oracle Advanced Analytics Option and is tied to the Oracle SQL Developer Days virtual and onsite events.   I cover:

  • Big Data + Big Data Analytics
  • Competing on analytics & value proposition
  • What is data mining?
  • Typical use cases
  • Oracle Data Mining high performance in-database SQL based data mining functions
  • Exadata "smart scan" scoring
  • Oracle Data Miner GUI (an Extension that ships with SQL Developer)
  • Oracle Business Intelligence EE + Oracle Data Mining results/predictions in dashboards
  • Applications "powered by Oracle Data Mining" for factory-installed predictive analytics methodologies
  • Oracle R Enterprise

Please contact charlie.berger@oracle.com should you have any questions.  Hope you enjoy! 

Charlie Berger, Sr. Director of Product Management, Oracle Data Mining & Advanced Analytics, Oracle Corporation

Wednesday Feb 08, 2012

Oracle Announces Availability of Oracle Advanced Analytics for Big Data

Oracle Integrates R Statistical Programming Language into Oracle Database 11g

REDWOOD SHORES, Calif. - February 8, 2012

News Facts

  • Oracle today announced the availability of Oracle Advanced Analytics, a new option for Oracle Database 11g that bundles Oracle R Enterprise together with Oracle Data Mining.
  • Oracle R Enterprise delivers enterprise class performance for users of the R statistical programming language, increasing the scale of data that can be analyzed by orders of magnitude using Oracle Database 11g.
  • R has attracted over two million users since its introduction in 1995, and Oracle R Enterprise dramatically advances capability for R users. Their existing R development skills, tools, and scripts can now also run transparently, and scale against data stored in Oracle Database 11g.
  • Customer testing of Oracle R Enterprise for Big Data analytics on Oracle Exadata has shown up to 100x increase in performance in comparison to their current environment.
  • Oracle Data Mining, now part of Oracle Advanced Analytics, helps enable customers to easily build and deploy predictive analytic applications that help deliver new insights into business performance. Oracle Advanced Analytics, in conjunction with Oracle Big Data Appliance, Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine, delivers the industry’s most integrated and comprehensive platform for Big Data analytics.

Comprehensive In-Database Platform for Advanced Analytics

  • Oracle Advanced Analytics brings analytic algorithms to data stored in Oracle Database 11g and Oracle Exadata as opposed to the traditional approach of extracting data to laptops or specialized servers.
  • With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
  • By providing direct and controlled access to data stored in Oracle Database 11g, customers can accelerate data analyst productivity while maintaining data security throughout the enterprise.
  • Powered by decades of Oracle Database innovation, Oracle R Enterprise helps enable analysts to run a variety of sophisticated numerical techniques on billion row data sets in a matter of seconds making iterative, speed of thought, and high-quality numerical analysis on Big Data practical.
  • Oracle R Enterprise drastically reduces the time to deploy models by eliminating the need to translate the models to other languages before they can be deployed in production.
  • Oracle R Enterprise integrates the extensive set of Oracle Database data mining algorithms, analytics, and access to Oracle OLAP cubes into the R language for transparent use by R users.
  • Oracle Data Mining provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. These predictive models can be deployed in Oracle Database 11g and use Oracle Exadata Smart Scan to rapidly score huge volumes of data.
  • The tight integration between R, Oracle Database 11g, and Hadoop enables R users to write one R script that can run in three different environments: a laptop running open source R, Hadoop running with Oracle Big Data Connectors, and Oracle Database 11g.
  • Oracle provides single vendor support for the entire Big Data platform spanning the hardware stack, operating system, open source R, Oracle R Enterprise and Oracle Database 11g. To enable easy enterprise-wide Big Data analysis, results from Oracle Advanced Analytics can be viewed from Oracle Business Intelligence Foundation Suite and Oracle Exalytics In-Memory Machine.

Supporting Quotes

  • “Oracle is committed to meeting the challenges of Big Data analytics. By building upon the analytical depth of Oracle SQL, Oracle Data Mining and the R environment, Oracle is delivering a scalable and secure Big Data platform to help our customers solve the toughest analytics problems,” said Andrew Mendelsohn, senior vice president, Oracle Server Technologies.
  • “We work with leading edge customers who rely on us to deliver better BI from their Oracle Databases. The new Oracle R Enterprise functionality allows us to perform deep analytics on Big Data stored in Oracle Databases. By leveraging R and its library of open source contributed CRAN packages combined with the power and scalability of Oracle Database 11g, we can now do that,” said Mark Rittman, co-founder, Rittman Mead.

About Oracle

Oracle engineers hardware and software to work together in the cloud and in your data center. For more information about Oracle (NASDAQ: ORCL), visit http://www.oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Contact Info

Eloy Ontiveros
Oracle
+1.650.607.6458

eloy.ontiveros@oracle.com

Joan Levy
Blanc & Otus for Oracle
+1.415.856.5110

jlevy@blancandotus.com

About

Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place
