Tuesday Jul 12, 2016

Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c

The Oracle Advanced Analytics (OAA) Database Option leverages Oracle Text, a free feature of the Oracle Database, to pre-process (tokenize) unstructured data for ingestion by the OAA data mining algorithms.  By moving parallelized implementations of machine learning algorithms inside the Oracle Database, data movement is eliminated and we can leverage other strengths of the Database such as Oracle Text (not to mention security, scalability, auditing, encryption, backup, high availability, geospatial data, etc.).
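To make the idea of tokenizing text for a mining algorithm concrete, here is a minimal Python sketch. It is not how Oracle Text works internally; the function names, the `tok_` prefix, and the sample row are assumptions made up for illustration. It only shows the general pattern of turning a free-text column into token counts that sit alongside the structured attributes.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def to_features(record, text_field):
    """Merge structured attributes with token counts from one text field."""
    # Keep every structured attribute as-is (hypothetical column names).
    features = {k: v for k, v in record.items() if k != text_field}
    # Add one count feature per distinct token found in the text.
    for token, count in Counter(tokenize(record[text_field])).items():
        features[f"tok_{token}"] = count
    return features

# Hypothetical row: one structured column plus one free-text column.
row = {"AGE": 42, "COMMENTS": "Great service, great price"}
features = to_features(row, "COMMENTS")
```

A real text pipeline would typically also handle stop words, stemming, and term weighting; the sketch leaves those out on purpose.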

This Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c YouTube video presents an overview of the capabilities for combining and performing data mining on both structured and unstructured data.  The video includes several quick demonstrations of classification and clustering using unstructured data and provides instructions and links on how to get started--either on premises or on the Oracle Cloud.  

I hope you find this helpful and a pleasure to watch.   

Presentation Slides.

You can also access similar YouTube videos at this Oracle Data Mining at the Movies blog posting.

Follow CharlieDataMine on Twitter.

Thanks for watching!


Sr. Dir. of Product Management, Oracle Advanced Analytics and Data Mining


Tuesday Mar 29, 2016

My Favorite Oracle Data Miner Demo Workflows - Part 1 in a Series: CUST_INSUR_LTV

Part 1 (of a planned series of blog posts):  Here are a few of my favorite Oracle Data Miner demo workflows.  They all are simple, easy to create examples of data mining and predictive analytics using Oracle Advanced Analytics and SQL Developer's Oracle Data Miner extension.


Oracle Data Miner ships with some small datasets to get users started, including INSUR_CUST_LTV_SAMPLE (1,015 records).  While this tiny dataset doesn't bloat the SQL Developer download size and helps get Oracle Data Miner users quickly up and running, the data size is so small that the resulting predictive models and insights can at times seem a bit trivial.  Hence, I prefer to use larger files that ship with the Oracle 12c Database Sample Examples (SH.schema, MINING_DATA_BUILD, etc.) and this CUST_INSUR_LTV demo data:  

CUST_INSUR_LTV.DMP (~25K records, ~25 attributes)  

CUST_INSUR_LTV_APPLY.DMP (~25K records, ~25 attributes)  

Predicting Insurance Buyers ODMr workflow

You can import the workflow and datasets, and everything should run.  

This workflow includes an Explore node and a Graph node that are typically used to visualize the data before performing data mining.  The Explore node step is important to make sure the data you are about to analyze makes sense and seems accurate and reasonable.  For example, AGE values should all be non-negative and range from 0 to, say, 100+.
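The same kind of sanity check can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Explore node does internally; the `CUSTOMER_ID` column, the sample rows, and the upper bound of 120 are assumptions chosen for the example.

```python
def out_of_range(rows, column, low, high):
    """Return rows whose value in `column` is missing or outside [low, high]."""
    return [
        r for r in rows
        if r.get(column) is None or not (low <= r[column] <= high)
    ]

# Hypothetical sample rows; only AGE is a column named in the post.
sample = [
    {"CUSTOMER_ID": 1, "AGE": 34},
    {"CUSTOMER_ID": 2, "AGE": -5},    # negative age: clearly a data error
    {"CUSTOMER_ID": 3, "AGE": 230},   # implausibly large age
    {"CUSTOMER_ID": 4, "AGE": None},  # missing value
]

# The bound of 120 is an assumed plausibility limit, not an Oracle default.
suspect = out_of_range(sample, "AGE", 0, 120)
```

Rows that come back from a check like this are worth a look before any model is built.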

The Column Filter node performs data profiling and data quality checks on the data and is also used to perform an Attribute Importance analysis to determine which attributes (or input variables) have the largest correlation with the target attribute (Buy_Insurance).  Sometimes this step alone provides significant value to a company by helping it better understand the key factors, but here, we're also using it to better understand which attributes have the most impact on our business problem--targeting customers who are likely to buy insurance.  Note:  Each of the OAA/ODM algorithms has its own embedded attribute importance/feature selection capabilities and can handle hundreds to thousands of input attributes.  However, many times we want to get a feel for what's driving our business problem and learn where we could focus to pull in additional attributes and "engineered features", e.g. "AGE/INCOME ratio" or "Maximum_Amount".  
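As a rough illustration of an "engineered feature" such as the AGE/INCOME ratio, here is a small Python sketch. The column names `INCOME` and `AGE_INCOME_RATIO` and the sample values are assumptions; in Oracle Data Miner you would normally derive such a column in SQL or with a transformation node rather than in Python.

```python
def add_age_income_ratio(row):
    """Return a copy of `row` with a derived AGE_INCOME_RATIO column."""
    out = dict(row)
    age = row.get("AGE")
    income = row.get("INCOME")
    # Leave the ratio empty when an input is missing or income is zero,
    # so the derived column never raises or divides by zero.
    if age is None or not income:
        out["AGE_INCOME_RATIO"] = None
    else:
        out["AGE_INCOME_RATIO"] = age / income
    return out

# Hypothetical customer rows.
with_ratio = add_age_income_ratio({"AGE": 40, "INCOME": 80000})
no_income = add_age_income_ratio({"AGE": 40, "INCOME": 0})
```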

We build four (4) Oracle Data Mining Classification models by default (Decision Tree, Naive Bayes, GLM Logistic Regression and Support Vector Machine (SVM)).  For simplicity, we accept the ODMr defaults for Data Preparation and Algorithm Settings and can be assured that with Oracle Data Miner default settings, we should achieve a "good predictive model".

Decision Trees generally produce good predictive models and have the added benefit of being easy to understand. Notice the IF.... THEN... rules.

Lastly, we use the Apply node and our Classification node to make predictions on our CUST_INSUR_LTV_APPLY table and get our predictions.  

The predictions and associated Prediction_Details are stored inside the Oracle Database and hence easily available for inclusion in any BI Dashboard or real-time application.  

Oracle Data Miner generates the PL/SQL and SQL scripts for accelerating the deployment of analytical methodologies that leverage the scalability and infrastructure of the Oracle Database.  See the Oracle Data Miner: Use Repository APIs to Manage and Schedule Workflows to Run white paper for more details on the many model deployment options. 

Hope you enjoy!


Wednesday Dec 16, 2015

BIWA's Got Talent YouTube Demo Contest! - Enter and Win $500!!!

Best Oracle "Tech Stack" YouTube Demo Contest!

BIWA Wants YOU (Customers, Partners, Oracle Employees, whatever--everyone!) to post on YouTube one or multiple YouTube videos that highlight BIWA focused Oracle technologies/products/features or anything BIWA related!   See #BIWASGOTTALENT

Contest Details

Two categories

  • Customers, Partners, Students, Friends of BIWA--Anyone!
  • Oracle Employees--Note:  Any concerns about eligibility for Oracle employees are the responsibility of the employee

Judges will award points per the following scheme--MAX 100 points

  • Maximum of 40 points: Perception of usefulness and value added to the BIWA community, user or company
  • Maximum 25 points (5 points each): Each Oracle product or major feature highlighted e.g. 5 points for OAA, 5 points for Spatial, 5 points for OBIEE, BDA, BDD, etc.
  • Maximum of 10 points:  Completeness and clarity of associated documentation, reusable code, etc.
  • Maximum of 15 points:  Intangibles e.g. cleverness, sizzle, coolness, etc.--whatever excites and moves the judges
  • Maximum of 10 points: Most "likes" on YouTube
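For anyone who wants to check how an entry would add up under this scheme, here is a small Python sketch. The category caps and the 5 points per product come from the list above; the function name, the sample scores, and the idea that the "likes" points arrive as a number already assigned by the judges are assumptions, since the rules do not say how likes convert to points.

```python
# Category caps taken from the scoring list; they add up to 100.
CAPS = {
    "usefulness": 40,
    "products": 25,
    "documentation": 10,
    "intangibles": 15,
    "likes": 10,
}
POINTS_PER_PRODUCT = 5

def total_score(usefulness, products_highlighted, documentation, intangibles, likes):
    """Sum the category scores, clamping each one to the range [0, cap]."""
    raw = {
        "usefulness": usefulness,
        "products": products_highlighted * POINTS_PER_PRODUCT,
        "documentation": documentation,
        "intangibles": intangibles,
        "likes": likes,
    }
    return sum(min(max(raw[name], 0), cap) for name, cap in CAPS.items())

# Hypothetical entry: seven products are highlighted, but that category caps at 25.
example = total_score(30, 7, 8, 10, 10)
```

Note that highlighting more than five products does not earn extra points, because that category is capped at 25.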

Each YouTube recorded "live" entry must include:

  • BIWAS GOT TALENT with BIWA Summit 2016 Logo (above on this page)
  • Title of your YouTube Video
  • Author(s), titles and contact information
  • Include #BIWASGOTTALENT in the meta information on YouTube
  • When submitting on YouTube, send an email to biwasgottalent@gmail.com with a link
  • Presentations must not exceed 10 minutes of YouTube video. Submissions longer than 10 minutes will be severely penalized by the judges. :(

The top two presentations in each category will be shown at BIWA Summit 2016 in Redwood Shores, California, January 26-28, 2016

Winners will be chosen based on the total number of points received from the judges.  Submitters are encouraged to promote their #BIWASGOTTALENT video to accumulate "likes".  The prize can be taken as cash or as a donation to charity.

Rules, Regulations and Other Details:

  • By submitting your entry, you agree that BIWA may use your submission for marketing or other purposes
  • The winner will be notified by email by January 28th, 2016 and does not have to be present at BIWA Summit 2016 to win

For questions, please email biwasgottalent@gmail.com

Wednesday Jul 15, 2015

Call for Abstracts at BIWA Summit'16 - The Oracle Big Data + Analytics User Conference

Please email shyamvaran@gmail.com with any questions regarding the submission process.

What Successes Can You Share?

We want to hear your story. Submit your proposal today for the Oracle BIWA Summit 2016.

Proposals will be accepted through Monday evening, November 2, 2015, at midnight, EST. Don’t wait, though—we’re accepting submissions on a rolling basis, so that selected sessions can be published early on our online agenda.

To submit your abstract, click here, select a track, and fill out the form.

Please note:

  • Presentations must be noncommercial.
  • Sales promotions for products or services disguised as proposals will be eliminated. 
  • Speakers whose abstracts are accepted will be expected to submit (at a later date) a PowerPoint presentation slide set. 
  • Accompanying technical and use case papers are encouraged, but not required.

Speakers whose abstracts are accepted will be given a complimentary registration to the conference. (Any additional co-presenters must register for the event separately and provide appropriate registration fees. It is up to the co-presenters’ discretion which presenter to designate for the complimentary registration.) 

This Year’s Tracks

Proposals can be submitted for the following tracks: 

More About the Conference

The Oracle BIWA Summit 2016 is organized and managed by the Oracle BIWA SIG, the Oracle Spatial SIG, and the Oracle Northern California User Group. The event attracts top BI, data warehousing, analytics, Spatial, IoT and Big Data experts.

The three-day event includes keynotes from industry experts, educational sessions, hands-on labs, and networking events.

Hot topics include: 

  • Database, data warehouse and cloud, Big Data architecture
  • Deep dives and hands-on labs on existing Oracle BI, data warehouse, and analytics products
  • Updates on the latest Oracle products and technologies (e.g. Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL)
  • Novel and interesting use cases on everything – Spatial, Graph, Text, Data Mining, IoT, ETL, Security, Cloud
  • Working with Big Data (e.g., Hadoop, "Internet of Things," SQL, R, Sentiment Analysis)
  • Oracle Business Intelligence (OBIEE), Oracle Big Data Discovery, Oracle Spatial, and Oracle Advanced Analytics—Better Together

Hope to see you at BIWA'16 in January, 2016!


Monday May 04, 2015

Oracle Data Miner 4.1, SQL Developer 4.1 Extension Now Available!

To download, visit:  


New Data Miner Features in SQL Developer 4.1

These new Data Miner 4.1 features are supported for database versions supported by Oracle Data Miner: 
JSON Data Support for Oracle Database and above

In response to the growing popularity of JSON data and its use in Big Data configurations, Data Miner now provides an easy to use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands. The JSON Query node opens up using all of the existing Data Miner features with JSON data. The enhancements include:

Data Source Node
o    Automatically identifies columns containing JSON data by identifying those with the IS_JSON constraint.
o    Generates JSON schema for any selected column that contains JSON data.
o    Imports a JSON schema for a given column.
o    JSON schema viewer.

Create Table Node
o    Ability to select a column to be typed as JSON.
o    Generates JSON schema in the same manner as the Data Source node.

JSON Data Type
o    Columns can be specifically typed as JSON data.

JSON Query Node (see related JSON node blog posting)
o    Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
o    Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
o    Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
o    Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes. 
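To give a feel for what "selecting and aggregating JSON data" means, here is a plain Python sketch of the same idea. It is not the JSON Query node and uses no Oracle API; the document shape, the field names (`customer`, `orders`, `item`, `amount`), and the sample values are all assumptions invented for the example.

```python
import json
from collections import defaultdict

# Hypothetical JSON documents, one per row of a JSON-typed column.
docs = [
    '{"customer": "A", "orders": [{"item": "x", "amount": 10}, {"item": "y", "amount": 5}]}',
    '{"customer": "B", "orders": [{"item": "x", "amount": 7}]}',
    '{"customer": "A", "orders": [{"item": "z", "amount": 3}]}',
]

def flatten(doc_text):
    """Yield one flat, relational-style row per nested order in a document."""
    doc = json.loads(doc_text)
    for order in doc.get("orders", []):
        yield {
            "customer": doc["customer"],
            "item": order["item"],
            "amount": order["amount"],
        }

def total_by(rows, key, value):
    """Group rows by `key` and sum `value`, like a SQL GROUP BY with SUM."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

rows = [row for text in docs for row in flatten(text)]
totals = total_by(rows, "customer", "amount")
```

The flattened rows play the role of the relational scalar columns, and the grouped totals are the kind of aggregate that could be passed on to a model build step.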

General Improvements
o    Improved database session management resulting in fewer database sessions being generated and a more responsive user interface.
o    Filter Columns Node - Combined primary Editor and associated advanced panel to improve usability.
o    Explore Data Node - Allows multiple row selection to provide group chart display.
o    Classification Build Node - Automatically filters out rows where the Target column contains NULLs or all spaces. It also issues a warning to the user but continues with the model build.
o    Workflow - Enhanced workflows to ensure that Loading, Reloading, Stopping, Saving operations no longer block the UI.
o    Online Help - Revised the Online Help to adhere to topic-based framework.

Selected Bug Fixes (does not include 4.0 patch release fixes)
o    GLM Model Algorithm Settings: Added GLM feature identification sampling option (Oracle Database 12.1 and above).
o    Filter Rows Node: Custom Expression Editor not showing all possible available columns.
o    WebEx Display Issues: Fixed problems affecting the display of the Data Miner UI through WebEx conferencing.

For More Information and Support, please visit the Oracle Data Mining Discussion Forum on the Oracle Technology Network (OTN)

Return to Oracle Data Miner page on OTN

Tuesday Jan 01, 2013

Turkcell Combats Pre-Paid Calling Card Fraud Using In-Database Oracle Advanced Analytics

Turkcell İletişim Hizmetleri A.Ş. Successfully Combats Communications Fraud with Advanced In-Database Analytics

[Original link available on oracle.com http://www.oracle.com/us/corporate/customers/customersearch/turkcell-1-exadata-ss-1887967.html]

Turkcell İletişim Hizmetleri A.Ş. is a leading provider of mobile communications in Turkey with more than 34 million subscribers. Established in 1994, Turkcell created the first global system for mobile communications (GSM) network in Turkey. It was the first Turkish company listed on the New York Stock Exchange.

Communications fraud, or the  use of telecommunications products or services without intention to pay, is a major issue for the organization. The practice is fostered by prepaid card usage, which is growing rapidly. Anonymous network-branded prepaid cards are a tempting vehicle for money launderers, particularly since these cards can be used as cash vehicles—for example, to withdraw cash at ATMs. It is estimated that prepaid card fraud represents an average loss of US$5 per US$10,000 in transactions. For a communications company with billions of transactions, this could result in millions of dollars lost through fraud every year.
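The arithmetic behind that estimate is easy to sketch in Python. The US$5 per US$10,000 loss rate comes from the paragraph above; the US$10 billion transaction volume is a hypothetical figure picked only to show the scale, not a Turkcell number.

```python
# Loss rate quoted above: US$5 lost per US$10,000 in transactions.
LOSS_PER_BLOCK = 5
BLOCK_SIZE = 10_000

def estimated_fraud_loss(transaction_value):
    """Estimate fraud loss in whole dollars for a given transaction value."""
    # Integer arithmetic keeps the result exact for whole-dollar inputs.
    return transaction_value * LOSS_PER_BLOCK // BLOCK_SIZE

# Hypothetical annual prepaid transaction value of US$10 billion.
hypothetical_volume = 10_000_000_000
loss = estimated_fraud_loss(hypothetical_volume)
```

At that assumed volume the rate works out to US$5 million a year, which is the "millions of dollars" order of magnitude the paragraph describes.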

Consequently, Turkcell wanted to combat communications fraud and money laundering by introducing advanced analytical solutions to monitor key parameters of prepaid card usage and issue alerts or block fraudulent activity. This type of fraud prevention would require extremely fast analysis of the company’s one petabyte of uncompressed customer data to identify patterns and relationships, build predictive models, and apply those models to even larger data volumes to make accurate fraud predictions.

To achieve this, Turkcell deployed Oracle Exadata Database Machine X2-2 HC Full Rack, so that data analysts can build predictive antifraud models inside the Oracle Database and deploy them into Oracle Exadata for scoring, using Oracle Data Mining, a component of Oracle Advanced Analytics, leveraging Oracle Database 11g technology. This enabled the company to create predictive antifraud models faster than with any other machine, as models can be built using structured query language (SQL) inside the database, and Oracle Exadata can access raw data without summarized tables, thereby achieving extremely fast analyses.


A word from Turkcell İletişim Hizmetleri A.Ş.

“Turkcell manages 100 terabytes of compressed data—or one petabyte of uncompressed raw data—on Oracle Exadata. With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, we can analyze large volumes of customer data and call-data records easier and faster than with any other tool and rapidly detect and combat fraudulent phone use.” – Hasan Tonguç Yılmaz, Manager, Turkcell İletişim Hizmetleri A.Ş.

  • Combat communications fraud and money laundering by introducing advanced analytical solutions to monitor prepaid card usage and alert or block suspicious activity
  • Monitor numerous parameters for up to 10 billion daily call-data records and value-added service logs, including the number of accounts and cards per customer, number of card loads per day, number of account loads over time, and number of account loads on a subscriber identity module card at the same location
  • Enable extremely fast sifting through huge data volumes to identify patterns and relationships, build predictive antifraud models, and apply those models to even larger data volumes to make accurate fraud predictions
  • Detect fraud patterns as soon as possible and enable quick response to minimize the negative financial impact


Oracle Product and Services

  • Used Oracle Exadata Database Machine X2-2 HC Full Rack to create predictive antifraud models more quickly than with previous solutions by accessing raw data without summarized tables and providing unmatched query speed, which optimizes and shortens the project design phases for creating predictive antifraud models
  • Leveraged SQL for the preparation and transformation of one petabyte of uncompressed raw communications data, using Oracle Data Mining, a feature of Oracle Advanced Analytics to increase the performance of predictive antifraud models
  • Deployed Oracle Data Mining models on Oracle Exadata to identify actionable information in less time than traditional methods—which would require moving large volumes of customer data to a third-party analytics software—and achieve an average gain of four hours and more, taking into consideration the absence of any system crash (as occurred in the previous environment) during data import
  • Achieved extreme data analysis speed with in-database analytics performed inside Oracle Exadata, through a row-wise information search—including day, time, and duration of calls, as well as number of credit recharges on the same day or at the same location—and query language functions that enabled analysts to detect fraud patterns almost immediately
  • Implemented a future-proof solution that could support rapidly growing data volumes that tend to double each year with Oracle Exadata’s massively scalable data warehouse performance

Why Oracle

“We selected Oracle because in-database mining to support antifraud efforts will be a major focus for Turkcell in the future. With Oracle Exadata Database Machine and the analytics capabilities of Oracle Advanced Analytics, we can complete antifraud analysis for large amounts of call-data records in just a few hours. Further, we can scale the solution as needed to support rapid communications data growth,” said Hasan Tonguç Yılmaz, datawarehouse/data mining developer, Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.


Oracle Partner: Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.

All development and test processes were performed by Turkcell Teknoloji. The company also made significant contributions to the configuration of numerous technical analyses which are carried out regularly by Turkcell İletişim Hizmetleri's antifraud specialists.



Thursday May 10, 2012

Oracle Virtual SQL Developer Days DB May 15th - Session #3: 1Hr. Predictive Analytics and Data Mining Made Easy!


Oracle Data Mining's SQL Developer based ODM'r GUI + ODM is being featured in this upcoming Virtual SQL Developer Day online event next Tuesday, May 15th.  Several thousand people have already registered and registration is still growing.  We recorded and uploaded presentations/demos and then anyone can view them "on demand", but at the specified date/time per the SQL DD event agenda.  Anyone can also download a complete 11gR2 Database w/ SQL Developer 3.1 & Oracle Data Miner GUI extension VM installation for the Hands-on Labs and follow our 4 ODM Oracle by Examples e-training.  We moderators monitor the online chat and answer questions. 

Session #3: 1Hr. Predictive Analytics and Data Mining Made Easy!

Oracle Data Mining, a component of the Oracle Advanced Analytics database option, embeds powerful data mining algorithms in the SQL kernel of the Oracle Database for problems such as customer churn, predicting customer behavior, up-sell and cross-sell, detecting fraud, market basket analysis (e.g. beer & diapers), customer profiling and customer loyalty. Oracle Data Miner, SQL Developer 3.1 extension, provides data analysts a “workflow” paradigm to build analytical methodologies to explore data and build, evaluate and apply data mining models—all while keeping the data inside the Oracle Database. This workshop will teach the student the basics of getting started using Oracle Data Mining.

We're also included in the June 7th physical event in NYC and future virtual and physical events.  Great event(s) and great "viz" for OAA/ODM.


Tuesday Mar 22, 2011

OpenWorld 2011 Call for Papers: Deadline March 27


Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place

