By Charlie Berger, Advanced Analytics-Oracle on Mar 29, 2016
Part 1 (of a planned series of blog posts): Here are a few of my favorite Oracle Data Miner demo workflows. They are all simple, easy-to-create examples of data mining and predictive analytics using Oracle Advanced Analytics and SQL Developer's Oracle Data Miner extension.
Oracle Data Miner ships with some small datasets to get users started, including INSUR_CUST_LTV_SAMPLE (1,015 records). While this tiny dataset keeps the SQL Developer download small and helps Oracle Data Miner users get up and running quickly, the data is so small that the resulting predictive models and insights can at times seem a bit trivial. Hence, I prefer to use the larger files that ship with the Oracle 12c Database Sample Examples (the SH schema, MINING_DATA_BUILD, etc.) and this CUST_INSUR_LTV demo data:
CUST_INSUR_LTV.DMP (~25K records, ~25 attributes)
CUST_INSUR_LTV_APPLY.DMP (~25K records, ~25 attributes)
You can import the workflow and datasets, and everything should run.
This workflow includes an Explore node and a Graph node, which are typically used to visualize the data before performing data mining. The Explore step is important for making sure the data you are about to analyze makes sense and seems accurate and reasonable. For example, AGE values should all be positive numbers, ranging from 0 to, say, 100+.
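The same kind of sanity check the Explore node gives you graphically can be approximated in plain SQL. A minimal sketch, assuming the demo CUST_INSUR_LTV table with its AGE column:

```sql
-- Profile the AGE column: the range should be plausible (roughly 0 to 100+).
SELECT MIN(age) AS min_age,
       MAX(age) AS max_age,
       COUNT(*) AS num_rows
  FROM cust_insur_ltv;

-- Flag any suspicious rows for review before modeling.
SELECT *
  FROM cust_insur_ltv
 WHERE age < 0 OR age > 110;
```

If the second query returns rows, you would want to investigate the data quality issue before building models on top of it.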
The Column Filter node performs data profiling and data quality checks on the data and is also used to perform an Attribute Importance analysis to determine which attributes (or input variables) have the largest correlation with the target attribute (Buy_Insurance). Sometimes this step alone provides significant value to a company by surfacing the key factors, but here we're also using it to better understand which attributes have the most impact on our business problem: targeting customers who are likely to buy insurance. Note: each of the OAA/ODM algorithms has its own embedded attribute importance/feature selection capabilities, and each can handle hundreds to thousands of input attributes. However, many times we want to get a feel for what's driving our business problem and learn where we could focus to pull in additional attributes and "engineered features", e.g., an "AGE/INCOME ratio" or "Maximum_Amount".
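Under the covers, Attribute Importance can also be run directly through the DBMS_DATA_MINING PL/SQL API. A hedged sketch (the model name and the CUSTOMER_ID case-id column are illustrative assumptions, not part of the workflow above):

```sql
-- Build an Attribute Importance model (MDL-based by default) ranking
-- the inputs against the BUY_INSURANCE target.
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'AI_BUY_INSURANCE',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'CUSTOMER_ID',   -- assumed case id column
    target_column_name  => 'BUY_INSURANCE');
END;
/

-- Rank the input attributes by their importance to the target.
SELECT attribute_name, importance_value, rank
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_AI('AI_BUY_INSURANCE'))
 ORDER BY rank;
```

The ranked output is essentially what the Column Filter node's Attribute Importance view displays in the GUI.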
We build four (4) Oracle Data Mining classification models by default (Decision Tree, Naive Bayes, GLM Logistic Regression, and Support Vector Machine (SVM)). For simplicity, we accept the ODMr defaults for Data Preparation and Algorithm Settings; with Oracle Data Miner's default settings, we can be confident of achieving a "good predictive model".
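For readers who prefer the SQL API to the GUI, here is a hedged sketch of building one of those four classifiers (the Decision Tree) with DBMS_DATA_MINING. The model, settings-table, and case-id names are illustrative assumptions:

```sql
-- A settings table tells CREATE_MODEL which algorithm to use; PREP_AUTO
-- mirrors the automatic data preparation the ODMr defaults apply.
CREATE TABLE dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO dt_settings VALUES
    (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);
  INSERT INTO dt_settings VALUES
    (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'DT_BUY_INSURANCE',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'CUSTOMER_ID',   -- assumed case id column
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'DT_SETTINGS');
END;
/
```

Swapping ALGO_DECISION_TREE for ALGO_NAIVE_BAYES, ALGO_GENERALIZED_LINEAR_MODEL, or ALGO_SUPPORT_VECTOR_MACHINES would build the other three models the workflow creates.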
Decision Trees generally produce good predictive models and have the added benefit of being easy to understand. Notice the IF... THEN... rules.
Lastly, we use the Apply node together with our Classification node to score the CUST_INSUR_LTV_APPLY table and generate our predictions.
The predictions and associated Prediction_Details are stored inside the Oracle Database and are therefore easily available for inclusion in any BI dashboard or real-time application.
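Because the model lives in the database, scoring is just a SQL query using the built-in PREDICTION functions. A sketch, assuming the illustrative DT_BUY_INSURANCE model name and CUSTOMER_ID column used above:

```sql
-- Score the apply table in place; PREDICTION_DETAILS returns the
-- per-row reasons behind each prediction (as XML) for drill-down.
SELECT customer_id,
       PREDICTION(DT_BUY_INSURANCE USING *)             AS predicted,
       PREDICTION_PROBABILITY(DT_BUY_INSURANCE USING *) AS probability,
       PREDICTION_DETAILS(DT_BUY_INSURANCE USING *)     AS details
  FROM cust_insur_ltv_apply
 ORDER BY probability DESC;
```

Since this is ordinary SQL, the same query can be embedded directly in a BI dashboard's dataset or called from a real-time application.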
Oracle Data Miner generates the PL/SQL and SQL scripts for accelerating the deployment of analytical methodologies that leverage the scalability and infrastructure of the Oracle Database. See the Oracle Data Miner: Use Repository APIs to Manage and Schedule Workflows to Run white paper for more details on the many model deployment options.
Hope you enjoy!