Tuesday Jul 12, 2016

Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c

Oracle Advanced Analytics (OAA), an Oracle Database option, leverages Oracle Text, a free feature of the Oracle Database, to pre-process (tokenize) unstructured data for ingestion by the OAA data mining algorithms.  By moving parallelized implementations of machine learning algorithms inside the Oracle Database, data movement is eliminated and we can leverage other strengths of the Database such as Oracle Text (not to mention security, scalability, auditing, encryption, backup, high availability, geospatial data, etc.).
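As a rough sketch of how the two features combine, the 12c PL/SQL API lets you point a model at a CLOB column via an Oracle Text policy and the documented ODMS_TEXT_POLICY_NAME setting.  The table, column, and model names below are invented for illustration; this assumes an Oracle Database 12c with the Advanced Analytics option:

```sql
-- Hypothetical sketch: text-enabled classification in 12c.
-- Table CUSTOMER_COMMENTS (CUST_ID, COMMENTS CLOB, BUY_FLAG) is illustrative.
exec ctx_ddl.create_policy('demo_txt_policy');

create table text_set (setting_name varchar2(30), setting_value varchar2(4000));
insert into text_set values ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
insert into text_set values ('ODMS_TEXT_POLICY_NAME', 'DEMO_TXT_POLICY');
insert into text_set values ('PREP_AUTO', 'ON');
commit;

begin
  dbms_data_mining.create_model(
    model_name          => 'TEXT_SVM',
    mining_function     => 'CLASSIFICATION',
    data_table_name     => 'CUSTOMER_COMMENTS',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_FLAG',
    settings_table_name => 'TEXT_SET');
end;
/
```

Oracle Text tokenizes the CLOB into terms behind the scenes, so the unstructured column participates in the model build just like the structured attributes.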

This Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c YouTube video presents an overview of the capabilities for combining and performing data mining on both structured and unstructured data.  The video includes several quick demonstrations on classification and clustering using unstructured data and provides instructions and links on how to get started--either on premises or on the Oracle Cloud.  

I hope you find this helpful and a pleasure to watch.   

Presentation Slides.

You can also access similar YouTube videos at this Oracle Data Mining at the Movies blog posting.

Follow CharlieDataMine on Twitter.

Thanks for watching!


Sr. Dir. of Product Management, Oracle Advanced Analytics and Data Mining


Tuesday Mar 29, 2016

My Favorite Oracle Data Miner Demo Workflows - Part 1 in a Series: CUST_INSUR_LTV

Part 1 (of a planned series of blog posts):  Here are a few of my favorite Oracle Data Miner demo workflows.  They all are simple, easy to create examples of data mining and predictive analytics using Oracle Advanced Analytics and SQL Developer's Oracle Data Miner extension.


Oracle Data Miner ships with some small datasets to get users started, including INSUR_CUST_LTV_SAMPLE (1,015 records).  While this tiny dataset doesn't bloat the SQL Developer download size and helps get Oracle Data Miner users quickly up and running, the data size is so small that the resulting predictive models and insights can seem at times a bit trivial.  Hence, I prefer to use larger files that ship with the Oracle 12c Database Sample Schemas (SH schema, MINING_DATA_BUILD, etc.) and this CUST_INSUR_LTV demo data:  

CUST_INSUR_LTV.DMP (~25K records, ~25 attributes)  

CUST_INSUR_LTV_APPLY.DMP (~25K records, ~25 attributes)  

Predicting Insurance Buyers ODMr workflow

You can import the workflow and datasets and everything should run.  

This workflow includes an Explore node and Graph node that are typically used to visualize the data before performing data mining.  The Explore node step is important to make sure the data you are about to analyze makes sense and seems accurate and reasonable.  For example, AGE should all be positive numbers and range from 0 to say 100+.

The Column Filter node performs data profiling and data quality checks on the data and is also used to perform an Attribute Importance analysis to determine which attributes (or input variables) have the largest correlation with the target attribute (Buy_Insurance).  Sometimes this step alone provides significant value to a company by surfacing the key factors, but here we're also using it to better understand which attributes have the most impact on our business problem--targeting customers who are likely to buy insurance.  Note:  Each of the OAA/ODM algorithms has its own embedded attribute importance/feature selection capabilities and each can handle hundreds to thousands of input attributes.  However, many times we want to get a feel for what's driving our business problem and learn where we could focus to pull in additional attributes and "engineered features", e.g. "AGE/INCOME ratio" or "Maximum_Amount", etc.  
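For readers who prefer SQL to the workflow UI, the same attribute-importance analysis can be sketched with the documented DBMS_PREDICTIVE_ANALYTICS.EXPLAIN procedure.  The table and result-table names below assume the CUST_INSUR_LTV demo data from this post:

```sql
-- Hedged sketch: rank attributes by their correlation with the target,
-- analogous to the Column Filter node's Attribute Importance step.
begin
  dbms_predictive_analytics.explain(
    data_table_name     => 'CUST_INSUR_LTV',
    explain_column_name => 'BUY_INSURANCE',
    result_table_name   => 'INSUR_EXPLAIN_RESULT');  -- assumed result table name
end;
/

-- Attributes ranked by explanatory power for BUY_INSURANCE:
select attribute_name, explanatory_value, rank
from   INSUR_EXPLAIN_RESULT
order  by rank;
```

EXPLAIN handles its own data preparation, so it is a quick first pass before committing to a full model build.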

We build four (4) Oracle Data Mining Classification models by default (Decision Tree, Naive Bayes, GLM Logistic Regression and Support Vector Machine (SVM)).  For simplicity, we accept the ODMr defaults for Data Preparation and Algorithm Settings; with Oracle Data Miner's default settings, we should achieve a "good predictive model".
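Because these four models are ordinary in-database objects, they can be listed from the data dictionary once the Build node finishes.  A minimal sketch (the exact model names are generated by the workflow and will differ):

```sql
-- Hedged sketch: list the classification models the workflow just built.
select model_name, mining_function, algorithm, creation_date
from   user_mining_models
where  mining_function = 'CLASSIFICATION'
order  by creation_date;
```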

Decision Trees generally produce good predictive models and have the added benefit of being easy to understand. Notice the IF.... THEN... rules.

Lastly, we use the Apply node and our Classification node to make predictions on our CUST_INSUR_LTV_APPLY table and get our predictions.  

The predictions and associated Prediction_Details are stored inside the Oracle Database and hence easily available for inclusion in any BI Dashboard or real-time application.  
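A hedged sketch of the kind of query a BI dashboard could run against those in-database scores, using the SQL prediction operators.  The model name CLAS_DT_MODEL is hypothetical (ODMr generates its own names); the apply table is the one from this post:

```sql
-- Hedged sketch: score the apply table directly in SQL so any tool
-- that can query the Oracle Database can consume the predictions.
select CUST_ID,
       prediction(CLAS_DT_MODEL using *)             pred_buy_insurance,
       prediction_probability(CLAS_DT_MODEL using *) pred_probability,
       prediction_details(CLAS_DT_MODEL using *)     pred_details
from   CUST_INSUR_LTV_APPLY
order  by pred_probability desc;
```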

Oracle Data Miner generates the PL/SQL and SQL scripts to accelerate deploying analytical methodologies that leverage the scalability and infrastructure of the Oracle Database.  See the Oracle Data Miner: Use Repository APIs to Manage and Schedule Workflows to Run white paper for more details on the many model deployment options. 

Hope you enjoy!


Monday Feb 29, 2016

Guest Lecture on Big Data & Analytics to U. Kansas Business School Students

Recently, I was asked by a friend and colleague, Chris Claterbos, Lecturer at University of Kansas' Business School, to deliver a guest lecture to his business analytics students.  

In preparation, so as not to make it an entirely Oracle "product" presentation, I tried to gather some general information on the big data + analytics market, job opportunities & careers, and future musings about where the industry is headed.  I liked the resulting presentation, so I am posting and sharing it here.  

U. Kansas Guest Lecture on Big Data Analytics with Oracle's Advanced Analytics, Big Data SQL and Cloud

  • Big Data + Analytics “Phenomenon”
  • Careers in Big Data Analytics
  • Product
    • Oracle Advanced Analytics Overview & Features/Benefits
    • Brief Demos
  • Example Customer References
  • Applications “Powered by OAA”
  • Getting Started
  • Q & A

Enjoy!  Hopefully you all become data'n'science stars!  


Wednesday Jul 15, 2015

Call for Abstracts at BIWA Summit'16 - The Oracle Big Data + Analytics User Conference

Please email shyamvaran@gmail.com with any questions regarding the submission process.

What Successes Can You Share?

We want to hear your story. Submit your proposal today for the Oracle BIWA Summit 2016.

Proposals will be accepted through Monday evening, November 2, 2015, at midnight, EST. Don’t wait, though—we’re accepting submissions on a rolling basis, so that selected sessions can be published early on our online agenda.

To submit your abstract, click here, select a track, and fill out the form.

Please note:

  • Presentations must be noncommercial.
  • Sales promotions for products or services disguised as proposals will be eliminated. 
  • Speakers whose abstracts are accepted will be expected to submit (at a later date) a PowerPoint presentation slide set. 
  • Accompanying technical and use case papers are encouraged, but not required.

Speakers whose abstracts are accepted will be given a complimentary registration to the conference. (Any additional co-presenters must register for the event separately and provide appropriate registration fees. It is up to the co-presenters’ discretion which presenter to designate for the complimentary registration.) 

This Year’s Tracks

Proposals can be submitted for the following tracks: 

More About the Conference

The Oracle BIWA Summit 2016 is organized and managed by the Oracle BIWA SIG, the Oracle Spatial SIG, and the Oracle Northern California User Group. The event attracts top BI, data warehousing, analytics, Spatial, IoT and Big Data experts.

The three-day event includes keynotes from industry experts, educational sessions, hands-on labs, and networking events.

Hot topics include: 

  • Database, data warehouse and cloud, Big Data architecture
  • Deep dives and hands-on labs on existing Oracle BI, data warehouse, and analytics products
  • Updates on the latest Oracle products and technologies (e.g. Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL)
  • Novel and interesting use cases on everything – Spatial, Graph, Text, Data Mining, IoT, ETL, Security, Cloud
  • Working with Big Data (e.g., Hadoop, "Internet of Things,” SQL, R, Sentiment Analysis)
  • Oracle Business Intelligence (OBIEE), Oracle Big Data Discovery, Oracle Spatial, and Oracle Advanced Analytics—Better Together

Hope to see you at BIWA'16 in January, 2016!


Monday May 04, 2015

Oracle Data Miner 4.1, SQL Developer 4.1 Extension Now Available!

To download, visit:  


New Data Miner Features in SQL Developer 4.1

These new Data Miner 4.1 features are supported for database versions supported by Oracle Data Miner: 
JSON Data Support for Oracle Database and above

In response to the growing popularity of JSON data and its use in Big Data configurations, Data Miner now provides an easy-to-use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands. The JSON Query node opens up all of the existing Data Miner features for use with JSON data. The enhancements include:

Data Source Node
o    Automatically identifies columns containing JSON data by identifying those with the IS_JSON constraint.
o    Generates JSON schema for any selected column that contains JSON data.
o    Imports a JSON schema for a given column.
o    JSON schema viewer.

Create Table Node
o    Ability to select a column to be typed as JSON.
o    Generates JSON schema in the same manner as the Data Source node.

JSON Data Type
o    Columns can be specifically typed as JSON data.

JSON Query Node (see related JSON node blog posting)
o    Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
o    Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
o    Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
o    Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes. 
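Under the covers, this style of selection maps naturally onto the standard SQL/JSON operators in Oracle Database 12c.  A hedged illustration of the kind of SQL the JSON Query node could generate (the table, document column, and JSON fields are invented for the sketch):

```sql
-- Hypothetical sketch: project JSON line items into relational columns
-- with JSON_TABLE, the kind of query the JSON Query node builds for you.
select s.sale_id, jt.item, jt.qty
from   SALES_JSON s,
       json_table(s.doc, '$.lines[*]'
         columns (item varchar2(40) path '$.item',
                  qty  number       path '$.qty')) jt;
```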

General Improvements
o    Improved database session management resulting in fewer database sessions being generated and a more responsive user interface.
o    Filter Columns Node - Combined primary Editor and associated advanced panel to improve usability.
o    Explore Data Node - Allows multiple row selection to provide group chart display.
o    Classification Build Node - Automatically filters out rows where the Target column contains NULLs or all spaces. Also, issues a warning to the user but continues with the model build.
o    Workflow - Enhanced workflows to ensure that Loading, Reloading, Stopping, Saving operations no longer block the UI.
o    Online Help - Revised the Online Help to adhere to topic-based framework.

Selected Bug Fixes (does not include 4.0 patch release fixes)
o    GLM Model Algorithm Settings: Added GLM feature identification sampling option (Oracle Database 12.1 and above).
o    Filter Rows Node: Custom Expression Editor not showing all possible available columns.
o    WebEx Display Issues: Fixed problems affecting the display of the Data Miner UI through WebEx conferencing.

For More Information and Support, please visit the Oracle Data Mining Discussion Forum on the Oracle Technology Network (OTN)

Return to Oracle Data Miner page on OTN

Thursday Mar 21, 2013

Recorded Webcast: Best Practices using Oracle Advanced Analytics with Oracle Exadata

Best Practices using Oracle Advanced Analytics with Oracle Exadata

 On Demand
Launch Presentation

Join us to learn how Oracle Advanced Analytics extends the Oracle database into a comprehensive advanced analytics platform through two major components, Oracle R Enterprise and Oracle Data Mining. Using these tools with Oracle Exadata Database Machine will allow organizations to perform at their peak and find real business value within their data.

You need to visit this Oracle Exadata Webcast main page first and submit your registration information.  Then you'll receive an email so you can view the Webcast.  This is external, so you can share it with anyone; they can download the presentation as well.  FYI.  Charlie


Sunday Oct 31, 2010

Effective Hypertensive Treatment using Oracle Data Mining in Saudi Arabia

[Read More]

Monday Mar 08, 2010

OpenWorld 2010 Call for Presentations is Now Open

[Read More]

Thursday Feb 18, 2010

Oracle Data Mining Races with America's Cup

Oracle Data Mining was used by the performance analysis team of the BMW/Oracle Racing team in their preparation to win the America's Cup race off the coast of Spain.
[Image: BMW Oracle Racing America's Cup logo]

The America's Cup has been away from U.S. shores for 15 years, the longest drought since 1851.  With the challenge of squeezing out every micro-joule of energy from the wind and with the goal of maximizing "velocity made good", the BMW Oracle Racing Team turned to Oracle Data Mining. 

"Imagine standing under an avalanche of data - 2500 variables, 10 times per second and a sailing team demanding answers to design and sailing variations immediately. This was the challenge facing the BMW ORACLE Racing Performance Analysis Team every sailing day as they refined and improved their giant 90 foot wide, 115 foot long trimaran sporting the largest hard-sail wing ever made. Using ORACLE DATA MINING accessing an ORACLE DATABASE and presenting results real time using ORACLE APPLICATION EXPRESS the performance team managed to provide the information required to optimise the giant multihull to the point that it not only beat the reigning America's Cup champions Alinghi in their giant Catamaran but resoundingly crushed them in a power display of high speed sailing. After two races - and two massive winning margins - the America's Cup was heading back to America - a triumph for the team, ORACLE and American technology."
--Ian Burns, Performance Director, BMW ORACLE Racing Team

[Image: BMW ORACLE Racing America's Cup boat]

Visit http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834 for pictures, videos and full information.

Wednesday Feb 10, 2010

Funny YouTube video that features Oracle Data Mining

[Read More]

Monday Jan 18, 2010

Fraud and Anomaly Detection Made Simple

Here is a quick and simple application for fraud and anomaly detection.  To replicate this on your own computer, download and install Oracle Database 11g Release 1 or 2.  (See http://www.oracle.com/technology/products/bi/odm/odm_education.html for more information.)  This small application uses the Automatic Data Preparation (ADP) feature that we added in Oracle Data Mining 11g.  Click here to download the CLAIMS data table.  [Download the .7z file and save it somewhere, unzip to a .csv file, and then use the SQL Developer data import wizard to import the claims.csv file into a table in the Oracle Database.]

First, we instantiate the ODM settings table to override the defaults.  The default algorithm for the Classification mining function is Naive Bayes, but since this is a different problem--looking for anomalous records amongst a larger data population--we want to change that to SUPPORT_VECTOR_MACHINES.  Also, as the 1-Class SVM does not rely on a Target field, we have to change that parameter to null.  See http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm for detailed documentation on ODM's anomaly detection.

drop table CLAIMS_SET;

exec dbms_data_mining.drop_model('CLAIMSMODEL');

create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));

insert into CLAIMS_SET values ('PREP_AUTO','ON');
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
commit;


Then, we run the dbms_data_mining.create_model function and let the in-database Oracle Data Mining algorithm run through the data, find patterns and relationships within the CLAIMS data, and infer a CLAIMS data mining model from the data.  


begin
  dbms_data_mining.create_model(
    model_name          => 'CLAIMSMODEL',
    mining_function     => 'CLASSIFICATION',
    data_table_name     => 'CLAIMS',
    case_id_column_name => 'POLICYNUMBER',
    target_column_name  => null,          -- 1-Class SVM: no target
    settings_table_name => 'CLAIMS_SET');
end;
/

After that, we can use the CLAIMS data mining model to "score" all customer auto insurance policies, sort them by our prediction_probability and select the top 5 most unusual claims.  

-- Top 5 most suspicious fraud policy holder claims

select * from
  (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
          rank() over (order by prob_fraud desc) rnk
   from
     (select POLICYNUMBER,
             prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
      from CLAIMS
      where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;

Leave these results inside the database and you can create powerful dashboards using Oracle Business Intelligence EE (or any reporting or dashboard tool that can query the Oracle Database) that multiply ODM's probability of the record being anomalous by (x) the dollar amount of the claim, and then use stoplight color coding (red, orange, yellow) to flag only the most suspicious claims.  Very automated, very easy, and all inside the Oracle Database!
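That dashboard query could be sketched roughly as follows.  The CLAIMAMOUNT column and the stoplight thresholds are assumptions for illustration, not part of the CLAIMS demo data:

```sql
-- Hedged sketch: expected-loss scoring with stoplight buckets,
-- all computed inside the database for a BI dashboard to consume.
select POLICYNUMBER,
       round(prob_fraud * CLAIMAMOUNT, 2) expected_loss,   -- CLAIMAMOUNT is assumed
       case
         when prob_fraud >= 0.75 then 'RED'
         when prob_fraud >= 0.50 then 'ORANGE'
         else 'YELLOW'
       end stoplight_flag
from  (select POLICYNUMBER, CLAIMAMOUNT,
              prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
       from CLAIMS)
order by expected_loss desc;
```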

Powerful, Yet Simple: In-Database SQL Data Mining Functions

[Read More]

Thursday Jan 07, 2010

Welcome to Oracle Data Mining!

[Read More]

Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place

