Sunday Jul 26, 2015

Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple white paper

Big Data Analytics with Oracle Advanced Analytics:

Making Big Data and Analytics Simple

Oracle White Paper  |  July 2014 

Executive Summary:  Big Data Analytics with Oracle Advanced Analytics

(Click HERE to read entire Oracle white paper)   (Click HERE to watch YouTube video)

The eras of “big data” and the “cloud” are driving companies to change.  Just to keep pace, they must learn new skills and implement new practices that leverage those new data sources and technologies.  Customers, who share their digital exhaust with corporations in exchange for improved interactions and greater perceived value, expect ever more in return, and those rising expectations push companies forward.  Big data and analytics promise to satisfy these new requirements.  Cloud, competition, big data analytics and next-generation “predictive” applications are driving companies toward new goals: delivering improved “actionable insights” and better outcomes.  Traditional BI and analytics approaches don’t deliver these detailed predictive insights and simply can’t satisfy the emerging customer expectations in this new world order created by big data and the cloud.

Unfortunately, as big data grows and expands along the three V’s (volume, velocity and variety of data types), new problems emerge.  Data volumes grow until the data becomes unmanageable and immovable.  Scalability, security, and information latency become new issues.  Unstructured data, sensor data and spatial data all introduce new data type complexities.  

Traditional advanced analytics suffers from several inherent information technology weak points: data extracts and data movement; data duplication, resulting in no single source of truth; data security exposures; and separate, often multiple, analytical tools (commercial and open source) and languages (SAS, R, SQL, Python, SPSS, etc.), depending on the skills of the data analysts/scientists involved.  Problems become particularly egregious during the deployment phase, when the worlds of data analysis and information management collide.   

Traditional data analysis typically starts with a representative sample or subset of the data that is exported to separate analytical servers and tools (SAS, R, Python, SPSS, etc.) that have been especially designed for statisticians and data scientists to analyze data.  The analytics they perform range from simple descriptive statistical analysis to advanced, predictive and prescriptive analytics.  If a data scientist builds a predictive model that proves useful and valuable, IT must get involved, and enterprise deployment and application integration become the next big challenge.  The predictive model(s), along with all the associated data preparation and transformation steps, must somehow be translated to SQL and recreated inside the database in order to apply the models and make predictions on the larger datasets maintained inside the data warehouse.  This model translation phase introduces tedious, time-consuming and expensive manual coding from the original statistical language (SAS, R, Python) into SQL.  DBAs and IT must somehow “productionize” these separate statistical models inside the database and/or data warehouse for distribution throughout the enterprise.  Some vendors even charge for specialized products and options just for predictive model deployment.  This is where many advanced analytics projects fail.  Add Hadoop, sensor data, tweets, and expanding big data reservoirs, and the entire “data to actionable insights” process becomes more challenging.  

Not with Oracle.  Oracle delivers a big data and analytics platform that eliminates the traditional extract, move, load, analyze, export, move, load paradigm.  With Oracle Database 12c and the Oracle Advanced Analytics Option, big data management and big data analytics are designed into the data management platform from the beginning.  Oracle’s multiple decades of R&D investment in developing the industry’s leading data management platform, Oracle SQL, Big Data SQL, Oracle Exadata, Oracle Big Data Appliance and integration with open source R are seamlessly combined and integrated into a single platform—the Oracle Database.  

Oracle’s vision is a big data and analytic platform for the era of big data and cloud to:

  • Make big data and analytics simple (for any data size, on any computer infrastructure and any variety of data, in any combination) and

  • Make big data and analytics deployment simple (as a service, as a platform, as an application)

Oracle Advanced Analytics offers a wide library of powerful in-database algorithms and integration with open source R that together can solve a wide variety of business problems and can be accessed via SQL, R or GUI.  Oracle Advanced Analytics, an option to the Oracle Database Enterprise Edition 12c, extends the database into an enterprise-wide analytical platform for data-driven problems such as churn prediction, customer segmentation, fraud and anomaly detection, identifying cross-sell and up-sell opportunities, market basket analysis, and text mining and sentiment analysis.  Oracle Advanced Analytics empowers data analysts, data scientists and business analysts to extract more knowledge, discover new insights and make informed predictions—working directly with large data volumes in the Oracle Database.   
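To give a flavor of what “accessed via SQL” means in practice, here is a minimal sketch of in-database scoring with the standard Oracle SQL scoring functions (the model name churn_model and the customers table are hypothetical, not from the white paper):

SELECT cust_id,
       PREDICTION(churn_model USING *)             likely_to_churn,
       PREDICTION_PROBABILITY(churn_model USING *) churn_probability
FROM   customers
WHERE  PREDICTION_PROBABILITY(churn_model USING *) > 0.75
ORDER  BY churn_probability DESC;

Because the scoring functions are ordinary SQL, the results can feed views, BI dashboards, or applications with no data movement.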

Data analysts/scientists have choice and flexibility in how they interact with Oracle Advanced Analytics.  Oracle Data Miner is an Oracle SQL Developer extension designed for data analysts that provides an easy to use “drag and drop” workflow GUI to the Oracle Advanced Analytics SQL data mining functions (Oracle Data Mining).  Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. When Oracle Data Miner users are satisfied with their analytical methodologies, they can share their workflows with other analysts and/or generate SQL scripts to hand to their DBAs to accelerate model deployment.  Oracle Data Miner also provides a PL/SQL API for workflow scheduling and automation.  

R programmers and data scientists can use the familiar open source R statistical programming language console, RStudio or any IDE to work directly with data inside the database and leverage Oracle Advanced Analytics’ R integration with the database (Oracle R Enterprise).  Oracle R Enterprise transparently translates R to equivalent SQL and Oracle Data Mining functions for in-database performance, parallelism, and scalability, thus making R ready for the enterprise.  

Application developers, using the ODM SQL data mining functions and ORE R integration, can build completely automated predictive analytics solutions that leverage the strengths of the database and the flexibility of R, and integrate Oracle Advanced Analytics solutions into BI dashboards and enterprise applications.

By integrating big data management and big data analytics into the same powerful Oracle Database 12c data management platform, Oracle eliminates data movement, reduces total cost of ownership and provides the fastest way to deliver enterprise-wide predictive analytics solutions and applications.  

(Click HERE to read entire Oracle white paper)

Friday Jul 24, 2015

2015 BIWA SIG Virtual Conference - Two Days of "Live" Talks by Experts - FREE

2015 BIWA SIG Virtual Conference

July 30-31, 2015 9:00 a.m. - 1:00 p.m. CDT

Join us for two full days where you will hear about the latest Business Intelligence trends. 

Day One:

  • 9:00 a.m. - 10:00 a.m.: What’s new in Oracle EPM and BI Infrastructure - Eric Helmer, ADI Strategies

Hyperion EPM and BI Fusion edition is a dramatic change under the covers. Corporations must consider more global approaches to infrastructure to maintain availability and performance while reducing footprint and cost. Technologies such as Exalytics, Oracle virtualization, cloud computing, software as a service, and open source operating systems (Linux) are becoming more commonplace. Join Oracle ACE Director Eric Helmer as he covers what’s new, what’s supported, and what options you have when implementing your EPM/BI project.

  • 10:00 a.m. - 11:00 a.m.: Italian Ministry of Labor & Social Policy -- A Journey to Digital Government - Nicola Sandoli, ICONSULTING

The Italian Ministry of Labor and Social Policy (MLPS) is a branch of the Italian government responsible for all labor matters, including employment policies, promotions, worker protection, and social security. In its evolution towards a digital government, MLPS is streamlining and simplifying its administrative processes. MLPS has embarked on a data-driven journey to redefine business models and interactions with citizens, and to optimize and transform government services. MLPS is focusing on four areas:

  • Information delivery: transitioning its data warehouse platform from reporting to centralizing and certifying data
  • Business Intelligence: monitoring activities, web publishing, and analyzing socio-political impact
  • Web analytics and semantic intelligence: interacting more efficiently with citizens
  • Job-hunting online guidance services: real-time answers to young people looking for jobs

MLPS is using a wide range of Oracle technologies to manage large amounts of diverse data and apply advanced analytics, including:

  • Oracle Exalytics for daily updates of 5TB of data
  • Oracle Spatial and Graph and MapViewer 11g for location intelligence capabilities
  • Oracle Business Intelligence for desktop and mobile reporting
  • Oracle Endeca Information Discovery for web analytics, data discovery, and data analysis using social and semantic intelligence
  • Oracle Real-Time Decisions
  • Oracle Service-Oriented Architecture Suite: a central point for accessing and managing information made available through the Ministry web portal Cliclavoro

Learn more about MLPS and its innovative platform that is delivering better information and services to their constituents.

  • 11:00 a.m. - 12:00 p.m.: Exadata: Elastic Configurations and IaaS – Private Cloud - Amit Kanda, Oracle

Customers face business challenges that include making real-time, data-driven decisions and reducing costs. Exadata’s extreme performance combined with Database In-Memory addresses the need for real-time, data-driven decisions. Elastic configurations and an updated subscription model (IaaS – Private Cloud) for Exadata hardware and software accompanied the launch of Exadata X5-2. This presentation will describe these updates and how customers can start small with Exadata and grow it with their business, making it easier to reach business objectives.

  • 12:00 p.m. - 1:00 p.m.: The State of Internet of Things (IoT) - Shyam Varan Nath, GE

The Internet of Things (IoT) is poised to have a tremendous impact around us. This session will look at the industry landscape of IoT. The different flavors of IoT will be discussed with use cases from the consumer, commercial and industrial sectors. Learn about the edge and cloud computing platforms that power IoT solutions. Finally, walk through use cases that show how machine/sensor data is being monetized through analytics. Such use cases will span aviation and other industries.


Day Two:

  • 9:00 a.m. - 10:00 a.m.: Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL - Charlie Berger, Oracle

Oracle Advanced Analytics 12c delivers parallelized in-database implementations of data mining algorithms and integration with R. Data analysts use the Oracle Data Miner GUI and R to build and evaluate predictive models and leverage R packages and graphs. Application developers deploy OAA models using SQL data mining functions and R. Oracle extends the database into an analytical platform that mines more data and more data types, eliminates data movement, and preserves security to automatically detect patterns, anticipate customer behavior and deliver actionable insights. Oracle Big Data SQL adds new big data sources, and ORAAH provides algorithms that run on Hadoop. Come learn what’s new and best practices, and hear customer examples.

  • 10:00 a.m. - 11:00 a.m.: Graph Data Management and Analytics for Big Data - Bill Beauregard, Oracle & Zhe Wu, Oracle

The newest Oracle big data product, Oracle Big Data Spatial and Graph, offers a set of spatial analytic services, and a graph database with rich graph analytics that support big data workloads on Apache Hadoop and NoSQL technologies. Oracle is applying over a decade of expertise with spatial and graph analytic technologies to big data architectures. Graphs are an important data model for big data systems. Property graphs can be used for discovery, for instance, to discover underlying communities and influencers within a social graph, relationships and connections in cyber security networks, and to generate recommendations based on interests, profiles, and past behaviors. Oracle Big Data Spatial and Graph provides optimized storage, search and querying in Oracle NoSQL Database and Apache HBase for distributed property graphs. It offers 35 built-in, in-memory, parallel property graph analytic functions. We will discuss use cases, features, architecture, and show a demo. Learn how developers and data scientists can manage their most challenging graph data processing in a single enterprise-class Big Data platform.

  • 11:00 a.m. - 12:00 p.m.: Why Oracle Database In-Memory?  Use Cases and Overview - Andy Rivenes, Oracle

Oracle recently announced the availability of the Oracle Database In-Memory option, a memory-optimized database technology that transparently adds real-time analytics to applications. Because the In-Memory option is 100% compatible with existing Oracle Database applications, it’s easy to integrate it into your environment and to begin reaping the benefits. But how do you get started with it? What do you need to know to take full advantage of this new functionality? This session will give an overview of what Oracle Database In-Memory is and then discuss some use cases to highlight how it can be used.

| Register Here |


Wednesday Jul 15, 2015

Call for Abstracts at BIWA Summit'16 - The Oracle Big Data + Analytics User Conference


Please email shyamvaran@gmail.com with any questions regarding the submission process.

What Successes Can You Share?

We want to hear your story. Submit your proposal today for the Oracle BIWA Summit 2016.

Proposals will be accepted through Monday evening, November 2, 2015, at midnight, EST. Don’t wait, though—we’re accepting submissions on a rolling basis, so that selected sessions can be published early on our online agenda.

To submit your abstract, click here, select a track, and fill out the form.

Please note:

  • Presentations must be noncommercial.
  • Sales promotions for products or services disguised as proposals will be eliminated. 
  • Speakers whose abstracts are accepted will be expected to submit (at a later date) a PowerPoint presentation slide set. 
  • Accompanying technical and use case papers are encouraged, but not required.

Speakers whose abstracts are accepted will be given a complimentary registration to the conference. (Any additional co-presenters must register for the event separately and provide appropriate registration fees. It is up to the co-presenters’ discretion which presenter to designate for the complimentary registration.) 

This Year’s Tracks

Proposals can be submitted for the following tracks: 

More About the Conference

The Oracle BIWA Summit 2016 is organized and managed by the Oracle BIWA SIG, the Oracle Spatial SIG, and the Oracle Northern California User Group. The event attracts top BI, data warehousing, analytics, Spatial, IoT and Big Data experts.

The three-day event includes keynotes from industry experts, educational sessions, hands-on labs, and networking events.

Hot topics include: 

  • Database, data warehouse and cloud, Big Data architecture
  • Deep dives and hands-on labs on existing Oracle BI, data warehouse, and analytics products
  • Updates on the latest Oracle products and technologies (e.g. Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL)
  • Novel and interesting use cases on everything – Spatial, Graph, Text, Data Mining, IoT, ETL, Security, Cloud
  • Working with Big Data (e.g., Hadoop, "Internet of Things,” SQL, R, Sentiment Analysis)
  • Oracle Business Intelligence (OBIEE), Oracle Big Data Discovery, Oracle Spatial, and Oracle Advanced Analytics—Better Together

Hope to see you at BIWA'16 in January, 2016!

Charlie

Monday May 04, 2015

Oracle Data Miner 4.1, SQL Developer 4.1 Extension Now Available!

To download, visit:  

http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index-097090.html

New Data Miner Features in SQL Developer 4.1

These new Data Miner 4.1 features are available for the database versions supported by Oracle Data Miner: 
JSON Data Support for Oracle Database 12.1.0.2 and above

In response to the growing popularity of JSON data and its use in big data configurations, Data Miner now provides an easy-to-use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands, and it opens up all of the existing Data Miner features for use with JSON data. The enhancements include:

Data Source Node
o    Automatically identifies columns containing JSON data by detecting the IS JSON constraint (see the sketches after the JSON Query Node list below).
o    Generates a JSON schema for any selected column that contains JSON data.
o    Imports a JSON schema for a given column.
o    JSON schema viewer.

Create Table Node
o    Ability to select a column to be typed as JSON.
o    Generates JSON schema in the same manner as the Data Source node.

JSON Data Type
o    Columns can be specifically typed as JSON data.

JSON Query Node (see related JSON node blog posting)
o    Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
o    Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
o    Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
o    Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes. 
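For reference, here are two hedged sketches of the plumbing behind these features; the customer_events table and the JSON paths are hypothetical. A column the Data Source node can auto-detect as JSON simply carries an IS JSON check constraint (Oracle Database 12.1.0.2 and above):

CREATE TABLE customer_events (
  event_id  NUMBER PRIMARY KEY,
  event_doc CLOB CHECK (event_doc IS JSON)  -- the constraint the Data Source node detects
);

And the selection the JSON Query node performs for you corresponds roughly to the JSON_TABLE SQL you would otherwise write by hand:

SELECT jt.cust_id, jt.channel
FROM   customer_events t,
       JSON_TABLE(t.event_doc, '$'
         COLUMNS (cust_id NUMBER       PATH '$.custId',
                  channel VARCHAR2(20) PATH '$.channel')) jt;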

General Improvements
o    Improved database session management, resulting in fewer database sessions being generated and a more responsive user interface.
o    Filter Columns Node - Combined primary Editor and associated advanced panel to improve usability.
o    Explore Data Node - Allows multiple row selection to provide group chart display.
o    Classification Build Node - Automatically filters out rows where the Target column contains NULLs or all spaces. Also issues a warning to the user but continues with the model build.
o    Workflow - Enhanced workflows to ensure that Loading, Reloading, Stopping, Saving operations no longer block the UI.
o    Online Help - Revised the Online Help to adhere to a topic-based framework.

Selected Bug Fixes (does not include 4.0 patch release fixes)
o    GLM Model Algorithm Settings: Added GLM feature identification sampling option (Oracle Database 12.1 and above).
o    Filter Rows Node: Custom Expression Editor not showing all possible available columns.
o    WebEx Display Issues: Fixed problems affecting the display of the Data Miner UI through WebEx conferencing.


For More Information and Support, please visit the Oracle Data Mining Discussion Forum on the Oracle Technology Network (OTN)

Return to Oracle Data Miner page on OTN

Wednesday Oct 08, 2014

2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014

2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014.   We had a number of customer, partner and Oracle talks that focused on the Oracle Advanced Analytics Database Option.    See below with links to presentations.  Check back later in the OOW Sessions Content Catalog, as not all presentations have been uploaded yet.  :-(

Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631]

Moving data mining algorithms to run as native data mining SQL functions eliminates data movement, automates knowledge discovery, and accelerates the transformation of large-scale data to actionable insights from days/weeks to minutes/hours. In this session, Fiserv, a leading global provider of electronic commerce systems for the financial services industry, shares best practices for turning in-database predictive models into actionable policies and illustrates the use of Oracle Data Miner for fraud prevention in online payments. Attendees will learn how businesses that implement predictive analytics in their production processes significantly improve profitability and maximize their ROI.

Developing Relevant Dining Visits with Oracle Advanced Analytics at Olive Garden [CON2898]

Olive Garden, traditionally managing its 830 restaurants nationally, transitioned to a localized approach with the help of predictive analytics. Using k-means clustering and logistic classification algorithms, it divided its stores into five behavioral segments. The analysis leveraged Oracle SQL Developer 4.0 and Oracle R Enterprise 1.3 to evaluate 115 million transactions in just 5 percent the time required by the company’s BI tool. While saving both time and money by making it possible to develop the solution internally, this analysis has informed Olive Garden’s latest remodel campaign and continues to uncover millions in profits by optimizing pricing and menu assortment. This session illustrates how Oracle Advanced Analytics solutions directly affect the bottom line.

A Perfect Storm: Oracle Big Data Science for Enterprise R and SAS Users [CON8331]

With the advent of R and a rich ecosystem of users and developers, a myriad of bloggers, and thousands of packages with functionality ranging from social network analysis and spatial data analysis to empirical finance and phylogenetics, use of R is on a steep uptrend. With new R tools from Oracle, including Oracle R Enterprise, Oracle R Distribution, and Oracle R Advanced Analytics for Hadoop, users can scale and integrate R for their enterprise big data needs. Come to this session to learn about Oracle’s R technologies and what data scientists from smart companies around the world are doing with R.

Extending the Power of In-Database Analytics with Oracle Big Data Appliance [CON2452]

The need for speed could not be greater—not speed of processing but time to market. The problem is driven by the long journey data takes before evolving into insight. Insight, however, is always relative to assumption. In fact, analytics is often seen as a battle between assumption and data. Assumptions can be classified into three types: related to distributions, ratios, and relations. In this session, you will see how the most-valuable business insights can come in the matter of hours, not months, when assumptions are challenged with data. This is made possible by the integration of Oracle Big Data Appliance, enabling transparent access to in-database analytics from the data warehouse and avoiding the traditional long journey of data to insight.

Market Basket Analysis at Dunkin’ Brands [CON6545]

With almost 120 years of franchising experience, Dunkin’ Brands owns two of the world’s most recognized, beloved franchises: Dunkin’ Donuts and Baskin-Robbins. This session describes a market basket analysis solution built from scratch on the Oracle Advanced Analytics platform at Dunkin’ Brands. This solution enables Dunkin’ to look at product affinity and a host of associated sales metrics with a view to improving promotional effectiveness and cross-sell/up-sell to increase customer loyalty. The presentation discusses the business value achieved and technical challenges faced in scaling the solution to Dunkin’ Brands’ transaction volumes, including engineered systems (Oracle Exadata) hardware and parallel processing at the core of the implementation.

Predictive Analytics with Oracle Data Mining [CON8596]

This session presents three case studies related to predictive analytics with the Oracle Data Mining feature of Oracle Advanced Analytics. Service contracts cancellation avoidance with Oracle Data Mining is about predicting the contracts at risk of cancellation at least nine months in advance. Predicting hardware opportunities that have a high likelihood of being won means identifying such opportunities at least four months in advance to provide visibility into suppliers of required materials. Finally, predicting cloud customer churn involves identifying the customers that are not as likely to renew subscriptions as others.

SQL Is the Best Development Language for Big Data [CON7439]

SQL has a long and storied history. From the early 1980s till today, data processing has been dominated by this language. It has changed and evolved greatly over time, gaining features such as analytic windowing functions, model clauses, and row-pattern matching. This session explores what's new in SQL and Oracle Database for exploiting big data. You'll see how to use SQL to efficiently and effectively process data that is not stored directly in Oracle Database.

Advanced Predictive Analytics for Database Developers on Oracle [CON7977]

Traditional database applications use SQL queries to filter, aggregate, and summarize data. This is called descriptive analytics. The next level is predictive analytics, where hidden patterns are discovered to answer questions that give unique insights that cannot be derived with descriptive analytics. Businesses are increasingly using machine learning techniques to perform predictive analytics, which helps them better understand past data, predict future trends, and enable better decision-making. This session discusses how to use machine learning algorithms such as regression, classification, and clustering to solve a few selected business use cases.

What Are They Thinking? With Oracle Application Express and Oracle Data Miner [UGF2861]

Have you ever wanted to add some data science to your Oracle Application Express applications? This session shows you how you can combine predictive analytics from Oracle Data Miner into your Oracle Application Express application to monitor sentiment analysis. Using Oracle Data Miner features, you can build data mining models of your data and apply them to your new data. The presentation uses Twitter feeds from conference events to demonstrate how this data can be fed into your Oracle Application Express application and how you can monitor sentiment with the native SQL and PL/SQL functions of Oracle Data Miner. Oracle Application Express comes with several graphical techniques, and the presentation uses them to create a sentiment dashboard.

Transforming Customer Experience with Big Data and Predictive Analytics [CON8148]

Delivering a high-quality customer experience is essential for long-term profitability and customer retention in the communications industry. Although service providers own a wealth of customer data within their systems, the sheer volume and complexity of the data structures inhibit their ability to extract the full value of the information. To change this situation, service providers are increasingly turning to a new generation of business intelligence tools. This session begins by discussing the key market challenges for business analytics and continues by exploring Oracle’s approach to meeting these challenges, including the use of predictive analytics, big data, and social network analytics.

There are a few other sessions where Oracle Advanced Analytics is included (e.g. Retail GBU, Big Data Strategy, etc.), but they are typically more broadly focused.  If you search the Content Catalog for “Advanced Analytics,” you can find other related presentations that involve OAA.

Hope this helps.  Enjoy!

cb

Sunday May 18, 2014

Oracle Data Miner and Oracle R Enterprise Integration - Watch Demo

Oracle Data Miner and Oracle R Enterprise Integration - Watch Demo

The Oracle Advanced Analytics (Database EE) Option turns the database into an enterprise-wide analytical platform that can quickly deliver enterprise-wide predictive analytics and actionable insights.  Oracle Advanced Analytics comprises Oracle Data Mining (the SQL data mining functions), Oracle Data Miner (an extension to SQL Developer that exposes the data mining SQL functions for data analysts), and Oracle R Enterprise (which integrates the R statistical programming language with SQL).  With 15 powerful in-database SQL data mining functions, the SQL Developer/Oracle Data Miner workflow GUI and the ability to integrate open source R within an analytical methodology, the Oracle Database + Oracle Advanced Analytics Option is the ideal platform for building and deploying enterprise-wide predictive analytics applications/solutions.  

In Oracle Data Miner 4.0 we added a new SQL Query node to allow users to insert arbitrary SQL scripts within an ODMr analytical workflow. Additionally, the SQL Query node allows users to leverage registered R scripts to extend Oracle Data Miner's analytical capabilities.  For applications that are mostly based on OAA/Oracle Data Mining SQL data mining functions but require additional analytical techniques found in the R community, this is an ideal method for integrating the power of in-database SQL analytical and data mining functions with the flexibility of open source R.  For applications built entirely in the R statistical programming language, it may be more practical to stay within the R console or RStudio environments, but for SQL-centric in-database predictive methodologies, this integration may be just what satisfies your needs.
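As a hedged sketch of what “leveraging registered R scripts” looks like at the SQL level (the RowCount script name and its body are hypothetical; INSUR_CUST_LTV_SAMPLE is the demo table used elsewhere on this blog), Oracle R Enterprise lets you register an R function and invoke it from SQL via rqTableEval:

-- Register a trivial R function that counts the rows it receives
begin
  sys.rqScriptCreate('RowCount',
    'function(dat) { data.frame(n = nrow(dat)) }');
end;
/

-- Invoke it from SQL; the third argument describes the output shape
select *
from table(rqTableEval(
       cursor(select * from INSUR_CUST_LTV_SAMPLE),
       NULL,
       'select 1 n from dual',
       'RowCount'));

A SELECT of this kind can be embedded in the SQL Query node, which is how R results flow back into an ODMr workflow.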

Watch this Oracle Data Miner and Oracle R Enterprise Integration YouTube video to see the demo. 

There is an excellent related white paper on this topic, Oracle Data Miner: Integrate Oracle R Enterprise Algorithms into Workflow Using the SQL Query Node (pdf, companion files), which includes examples and is available on the Oracle Technology Network in the Oracle Data Mining pages.  

Wednesday Feb 26, 2014

How to generate training and test dataset using SQL Query node in Data Miner

Overview

In Data Miner, the Classification and Regression Build nodes include a process that internally splits the input dataset into training and test datasets, which are then used by the model build and test processes within the nodes. This internal data split feature relieves the user from performing an external data split and then tying the split datasets into separate build and test processes, as is required in other competitive products. However, there are times when a user may want to perform an external data split. For example, a user may want to generate a single pair of training and test datasets and reuse them in multiple workflows. The generation of training and test datasets can be done easily via the SQL Query node.

Stratified Split

The stratified split is used internally by the Classification Build node, because this technique can preserve the categorical target distribution in the resulting training and test dataset, which is important for the classification model build. The following shows the SQL statements that are essentially used by the Classification Build node to produce the training and test dataset internally:

SQL statement for Training dataset

SELECT v1.*
FROM
(
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"
    FROM {input data} a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT {target column},
           COUNT(*) "_partition_target_cnt"
    FROM {input data} GROUP BY {target column}
) v2
WHERE v1.{target column} = v2.{target column}
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * {percent of training dataset} / 100)
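To see how the ORA_HASH filter realizes the split: for a target class with, say, 1,000 rows, "_partition_caseid" numbers them 1..1000 and "_partition_target_cnt" is 1,000, so ORA_HASH maps each row into buckets 0..999 roughly uniformly. Keeping buckets at or below 1,000 * 60 / 100 = 600 retains roughly 60% of that class, and the test statement's > comparison takes exactly the complementary rows.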


SQL statement for Test dataset

SELECT v1.*
FROM
(
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"
    FROM {input data} a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT {target column},
           COUNT(*) "_partition_target_cnt"
    FROM {input data} GROUP BY {target column}
) v2
WHERE v1.{target column} = v2.{target column}
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * {percent of training dataset} / 100)

The following describes the placeholders used in the SQL statements:

{target column} - target column. It must be categorical type.

{case id column} - case id column. It must contain unique numbers that identify the rows.

{input data} - input data set.

{percent of training dataset} - percent of training dataset. For example, if you want to split 60% of the input dataset into the training dataset, use the value 60. The test dataset will then contain 100% - 60% = 40% of the input dataset. The training and test datasets are mutually exclusive.

Random Split

The random split is used internally by the Regression Build node because the target is usually a numerical type. The following shows the SQL statements that are essentially used by the Regression Build node to produce the training and test dataset:

SQL statement for Training dataset

SELECT v1.*
FROM {input data} v1
WHERE ORA_HASH({case id column}, 99, 0) <= {percent of training dataset}

SQL statement for Test dataset

SELECT v1.*
FROM {input data} v1
WHERE ORA_HASH({case id column}, 99, 0) > {percent of training dataset}

The following describes the placeholders used in the SQL statements:

{case id column} - case id column. It must contain unique numbers that identify the rows.

{input data} - input data set.

{percent of training dataset} - percent of training dataset. For example, if you want to split 60% of the input dataset into the training dataset, use the value 60. The test dataset will then contain 100% - 60% = 40% of the input dataset. The training and test datasets are mutually exclusive.
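For example, using the same demo table and case id as the stratified example below, a 60% random-split training query would look like the following sketch (your generated input table name may differ), with the test query simply flipping <= to >:

SELECT v1.*
FROM "INSUR_CUST_LTV_SAMPLE_N$10009" v1
WHERE ORA_HASH("CUSTOMER_ID", 99, 0) <= 60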

Use SQL Query node to create training and test dataset

Assume you want to create the training and test datasets out of the demo INSUR_CUST_LTV_SAMPLE dataset using the stratified split technique. You can create the following workflow, which uses SQL Query nodes to execute the split SQL statements above and Create Table nodes to persist the resulting datasets.

Assume the case id is CUSTOMER_ID, the target is BUY_INSURANCE, and the training dataset is 60% of the input dataset. You can enter the following SQL statement in the “SQL Query Stratified Training” SQL Query node to create the training dataset:

SELECT v1.*
FROM
(
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT "BUY_INSURANCE",
           COUNT(*) "_partition_target_cnt"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" GROUP BY "BUY_INSURANCE"
) v2
WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * 60 / 100)



Likewise, you can enter the following SQL statement to create the test dataset in the “SQL Query Stratified Test” SQL Query node:

SELECT v1.*
FROM
(
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT "BUY_INSURANCE",
           COUNT(*) "_partition_target_cnt"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" GROUP BY "BUY_INSURANCE"
) v2
WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * 60 / 100)

Now run the workflow to create the training and test dataset. You can find the table names of the persisted dataset in the associated Create Table nodes.


Conclusion

This blog shows how easy it is to create training and test datasets using the stratified split SQL statements via SQL Query nodes. Similarly, you can generate training and test datasets using the random split technique by substituting the random split SQL statements in the SQL Query nodes in the above workflow. If a large dataset (tens of millions of rows) is used in multiple model build nodes, it may be a good idea to split the data ahead of time to optimize the overall processing time (avoiding multiple internal data splits inside the model build nodes).

Tuesday Jan 14, 2014

How to export data from the Explore Node using Data Miner and SQL Developer

Blog posting by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Development

The Explore node generates descriptive statistical data and histogram data for all input table columns.  These statistical and histogram data may help the user analyze the input data to determine whether any action (e.g. transformation) is needed before using it for data mining purposes.  An analyst may want to export this data to a file for offline analysis (e.g. in Excel) or for reporting purposes.  The Explore node generates this data to a database table specified in the Output tab of the Property Inspector.  In this case, the data is generated to a table named “OUTPUT_1_2”.


To export the table to a file, we can use the SQL Developer Export wizard. Go to the Connections tab in the Navigator Window, search for the table “OUTPUT_1_2” within the proper connection, then bring up the pop-up menu off the table. Click on the Export menu to launch the Export Wizard.


In the wizard, uncheck “Export DDL” and select the “Export Data” option, since we are only interested in the data itself. For the Format option, select “excel” in this example (a dozen output formats are supported) and specify the output file name. When the wizard finishes, an Excel file is generated.


Let’s open the file to examine what is in it. As expected, it contains all the statistical data for all input columns. The histogram data is listed in the last column (HISTOGRAMS), and it has the ODMRSYS.ODMR_HISTOGRAMS structure.


For example, let’s take a closer look at the histogram data for the BUY_INSURANCE column:

ODMRSYS.ODMR_HISTOGRAMS(ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''No'',NULL,NULL,73.1),ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''Yes'',NULL,NULL,26.9))

This column contains an ODMRSYS.ODMR_HISTOGRAMS object, which is an array of the ODMRSYS.ODMR_HISTOGRAM_POINT structure. We can describe the structure to see what is in it.
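In SQL Developer or SQL*Plus, a one-line describe of the point type lists its attributes (a minimal sketch):

DESCRIBE ODMRSYS.ODMR_HISTOGRAM_POINT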


The ODMRSYS.ODMR_HISTOGRAM_POINT contains five attributes, which represent the histogram data. ATTRIBUTE_NAME contains the attribute name (e.g. BUY_INSURANCE), ATTRIBUTE_VALUE contains the attribute values (e.g. No, Yes), GROUPING_ATTRIBUTE_NAME and GROUPING_ATTRIBUTE_VALUE are not used (these fields apply when the Group By option is specified), and ATTRIBUTE_PERCENT contains the percentages (e.g. 73.1, 26.9) for the respective attribute values.


As you can see, the complex ODMRSYS.ODMR_HISTOGRAMS output format may be difficult to read, and it may require some processing before the data can be used. Alternatively, we can “unnest” the histogram data to a transactional data format before exporting it. This way we don’t have to deal with the complex array structure, and the data is more consumable. To do that, we can write a simple SQL query to “unnest” the data and use the new SQL Query node (Extract histogram data) to run this query (see below). We then use a Create Table node (Explorer output table) to persist the “unnested” histogram data along with the statistical data.

1. Create a SQL Query node

Create a SQL Query node and connect the “Explore Data” node to it. You may rename the SQL Query node to “Extract histogram data” to make it clear it is used to “unnest” the histogram data.

2. Specify a SQL query to “unnest” histogram data

Double click the “Extract histogram data” node to bring up the editor, enter the following SELECT statement in the editor:

SELECT
    "Explore Data_N$10002"."ATTR",
    "Explore Data_N$10002"."AVG",
    "Explore Data_N$10002"."DATA_TYPE",
    "Explore Data_N$10002"."DISTINCT_CNT",
    "Explore Data_N$10002"."DISTINCT_PERCENT",
    "Explore Data_N$10002"."MAX",
    "Explore Data_N$10002"."MEDIAN_VAL",
    "Explore Data_N$10002"."MIN",
    "Explore Data_N$10002"."MODE_VALUE",
    "Explore Data_N$10002"."NULL_PERCENT",
    "Explore Data_N$10002"."STD",
    "Explore Data_N$10002"."VAR",
    h.ATTRIBUTE_VALUE,
    h.ATTRIBUTE_PERCENT
FROM
    "Explore Data_N$10002", TABLE("Explore Data_N$10002"."HISTOGRAMS") h

Click OK to close the editor. This query is used to extract out the ATTRIBUTE_VALUE and ATTRIBUTE_PERCENT fields from the ODMRSYS.ODMR_HISTOGRAMS nested object.

Note: you may select only the columns that contain the statistics you are interested in.  The "Explore Data_N$10002" is a generated unique name reference to the Explore node; you may see a slightly different name ending with some other unique number. 

The query produces the following output.  The last two columns are the histogram data in transactional format.

3. Create a Create Table node to persist the “unnested” histogram data

Create a Create Table node and connect the “Extract histogram data” node to it. You may rename the Create Table node to “Explorer output table” to make it clear it is used to persist the “unnested” histogram data.


4. Export “unnested” histogram data to Excel file

Run the “Explorer output table” node to persist the “unnested” histogram data to a table. The name of the output table (OUTPUT_3_4) can be found in the Property Inspector below.


Next, we can use the SQL Developer Export wizard as described above to export the table to an Excel file. As you can see the histogram data are now in transactional format; they are more readable and can readily be consumed.


Tuesday Nov 12, 2013

Oracle Big Data Learning Library

Click on LEARN BY PRODUCT to view all learning resources.

Oracle Big Data Essentials

Attend this Oracle University Course!

Using Oracle NoSQL Database

Attend this Oracle University class!

Oracle and Big Data on OTN

See the latest resource on OTN.

<script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-46756583-1']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script>

Tuesday Jan 01, 2013

Turkcell Combats Pre-Paid Calling Card Fraud Using In-Database Oracle Advanced Analytics

Turkcell İletişim Hizmetleri A.S. Successfully Combats Communications Fraud with Advanced In-Database Analytics

[Original link available on oracle.com http://www.oracle.com/us/corporate/customers/customersearch/turkcell-1-exadata-ss-1887967.html]

Turkcell İletişim Hizmetleri A.Ş. is a leading provider of mobile communications in Turkey with more than 34 million subscribers. Established in 1994, Turkcell created the first global system for mobile communications (GSM) network in Turkey. It was the first Turkish company listed on the New York Stock Exchange.

Communications fraud, or the  use of telecommunications products or services without intention to pay, is a major issue for the organization. The practice is fostered by prepaid card usage, which is growing rapidly. Anonymous network-branded prepaid cards are a tempting vehicle for money launderers, particularly since these cards can be used as cash vehicles—for example, to withdraw cash at ATMs. It is estimated that prepaid card fraud represents an average loss of US$5 per US$10,000 in transactions. For a communications company with billions of transactions, this could result in millions of dollars lost through fraud every year.

Consequently, Turkcell wanted to combat communications fraud and money laundering by introducing advanced analytical solutions to monitor key parameters of prepaid card usage and issue alerts or block fraudulent activity. This type of fraud prevention would require extremely fast analysis of the company’s one petabyte of uncompressed customer data to identify patterns and relationships, build predictive models, and apply those models to even larger data volumes to make accurate fraud predictions.

To achieve this, Turkcell deployed Oracle Exadata Database Machine X2-2 HC Full Rack, so that data analysts can build predictive antifraud models inside the Oracle Database and deploy them into Oracle Exadata for scoring, using Oracle Data Mining, a component of Oracle Advanced Analytics, leveraging Oracle Database 11g technology. This enabled the company to create predictive antifraud models faster than with any other machine, as models can be built using Structured Query Language (SQL) inside the database, and Oracle Exadata can access raw data without summarized tables, thereby achieving extremely fast analyses.

Challenges

A word from Turkcell İletişim Hizmetleri A.Ş.

“Turkcell manages 100 terabytes of compressed data—or one petabyte of uncompressed raw data—on Oracle Exadata. With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, we can analyze large volumes of customer data and call-data records easier and faster than with any other tool and rapidly detect and combat fraudulent phone use.” – Hasan Tonguç Yılmaz, Manager, Turkcell İletişim Hizmetleri A.Ş.

  • Combat communications fraud and money laundering by introducing advanced analytical solutions to monitor prepaid card usage and alert or block suspicious activity
  • Monitor numerous parameters for up to 10 billion daily call-data records and value-added service logs, including the number of accounts and cards per customer, number of card loads per day, number of account loads over time, and number of account loads on a subscriber identity module card at the same location
  • Enable extremely fast sifting through huge data volumes to identify patterns and relationships, build predictive antifraud models, and apply those models to even larger data volumes to make accurate fraud predictions
  • Detect fraud patterns as soon as possible and enable quick response to minimize the negative financial impact

Solutions

Oracle Product and Services

  • Used Oracle Exadata Database Machine X2-2 HC Full Rack to create predictive antifraud models more quickly than with previous solutions by accessing raw data without summarized tables and providing unmatched query speed, which optimizes and shortens the project design phases for creating predictive antifraud models
  • Leveraged SQL for the preparation and transformation of one petabyte of uncompressed raw communications data, using Oracle Data Mining, a feature of Oracle Advanced Analytics to increase the performance of predictive antifraud models
  • Deployed Oracle Data Mining models on Oracle Exadata to identify actionable information in less time than traditional methods—which would require moving large volumes of customer data to a third-party analytics software—and achieve an average gain of four hours and more, taking into consideration the absence of any system crash (as occurred in the previous environment) during data import
  • Achieved extreme data analysis speed with in-database analytics performed inside Oracle Exadata, through a row-wise information search—including day, time, and duration of calls, as well as number of credit recharges on the same day or at the same location—and query language functions that enabled analysts to detect fraud patterns almost immediately
  • Implemented a future-proof solution that could support rapidly growing data volumes that tend to double each year with Oracle Exadata’s massively scalable data warehouse performance

Why Oracle

“We selected Oracle because in-database mining to support antifraud efforts will be a major focus for Turkcell in the future. With Oracle Exadata Database Machine and the analytics capabilities of Oracle Advanced Analytics, we can complete antifraud analysis for large amounts of call-data records in just a few hours. Further, we can scale the solution as needed to support rapid communications data growth,” said Hasan Tonguç Yılmaz, data warehouse/data mining developer, Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.

Partner

Oracle Partner: Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.

All development and test processes were performed by Turkcell Teknoloji. The company also made significant contributions to the configuration of numerous technical analyses which are carried out regularly by Turkcell İletişim Hizmetleri's antifraud specialists.

Resources

<script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-46756583-1']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script>

Thursday May 10, 2012

Oracle Virtual SQL Developer Days DB May 15th - Session #3: 1Hr. Predictive Analytics and Data Mining Made Easy!

All,

Oracle Data Mining's SQL Developer-based Oracle Data Miner GUI + ODM is being featured in this upcoming Virtual SQL Developer Day online event next Tuesday, May 15th.  Several thousand people have already registered, and registration is still growing.  We recorded and uploaded presentations/demos that anyone can view "on demand" at the specified date/time per the SQL Developer Day event agenda.  Anyone can also download a complete 11gR2 Database w/ SQL Developer 3.1 & Oracle Data Miner GUI extension VM installation for the Hands-on Labs and follow our 4 ODM Oracle by Example e-training lessons.  We moderators monitor the online chat and answer questions. 
Session #3: 1Hr. Predictive Analytics and Data Mining Made Easy!
Oracle Data Mining, a component of the Oracle Advanced Analytics database option, embeds powerful data mining algorithms in the SQL kernel of the Oracle Database for problems such as customer churn, predicting customer behavior, up-sell and cross-sell, detecting fraud, market basket analysis (e.g. beer & diapers), customer profiling and customer loyalty. Oracle Data Miner, a SQL Developer 3.1 extension, provides data analysts a “workflow” paradigm to build analytical methodologies to explore data and build, evaluate and apply data mining models—all while keeping the data inside the Oracle Database. This workshop will teach the student the basics of getting started with Oracle Data Mining.
We're also included in the June 7th physical event in NYC and future virtual and physical events.  Great event(s) and great "viz" for OAA/ODM.

 Charlie

Wednesday Apr 04, 2012

Recorded YouTube-like presentation and "live" demos of Oracle Advanced Analytics/Oracle Data Mining

Ever want to just sit and watch a YouTube-like presentation and "live" demos of Oracle Advanced Analytics/Oracle Data Mining?  Then click here! (plays large MP4 file in a browser)

This 1+ hour long session focuses primarily on the Oracle Data Mining component of the Oracle Advanced Analytics Option and is tied to the Oracle SQL Developer Days virtual and onsite events.   I cover:

  • Big Data + Big Data Analytics
  • Competing on analytics & value proposition
  • What is data mining?
  • Typical use cases
  • Oracle Data Mining high performance in-database SQL based data mining functions
  • Exadata "smart scan" scoring
  • Oracle Data Miner GUI (an Extension that ships with SQL Developer)
  • Oracle Business Intelligence EE + Oracle Data Mining results/predictions in dashboards
  • Applications "powered by Oracle Data Mining for factory installed predictive analytics methodologies
  • Oracle R Enterprise

Please contact charlie.berger@oracle.com should you have any questions.  Hope you enjoy! 

Charlie Berger, Sr. Director of Product Management, Oracle Data Mining & Advanced Analytics, Oracle Corporation

Monday Jan 18, 2010

Fraud and Anomaly Detection Made Simple

Here is a quick and simple application for fraud and anomaly detection.  To replicate this on your own computer, download and install Oracle Database 11g Release 1 or 2.  (See http://www.oracle.com/technology/products/bi/odm/odm_education.html for more information.)  This small application uses the Automatic Data Preparation (ADP) feature that we added in Oracle Data Mining 11g.  Click here to download the CLAIMS data table.  [Download the .7z file and save it somewhere, unzip it to a .csv file, and then use the SQL Developer data import wizard to import the claims.csv file into a table in the Oracle Database.]


First, we instantiate the ODM settings table to override the defaults.  The default algorithm for the Classification data mining function is Naive Bayes, but since this is a different problem (looking for anomalous records amongst a larger data population), we want to change that to SUPPORT_VECTOR_MACHINES.  Also, as the one-class SVM does not rely on a target field, we have to set that parameter to null.  See http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm for detailed documentation on ODM's anomaly detection.

drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');

create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;


Then, we run the dbms_data_mining.create_model procedure and let the in-database Oracle Data Mining algorithm run through the data, find patterns and relationships within the CLAIMS data, and infer a CLAIMS data mining model from the data.  

begin
  dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
    'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/


After that, we can use the CLAIMS data mining model to "score" all customer auto insurance policies, sort them by prediction_probability, and select the top 5 most unusual claims.  

-- Top 5 most suspicious fraud policy holder claims
select * from
  (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
          rank() over (order by prob_fraud desc) rnk
   from
     (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
      from CLAIMS
      where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;


Leave these results inside the database and you can create powerful dashboards using Oracle Business Intelligence EE (or any reporting or dashboard tool that can query the Oracle Database) that multiply ODM's probability of the record being anomalous by the dollar amount of the claim, and then use stoplight color coding (red, orange, yellow) to flag only the more suspicious claims.  Very automated, very easy, and all inside the Oracle Database!
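A sketch of such a risk-weighted query, assuming a hypothetical CLAIM_AMOUNT column on the CLAIMS table (the demo table may name the claim dollar amount differently):

select POLICYNUMBER,
       round(prediction_probability(CLAIMSMODEL, '0' using *) * CLAIM_AMOUNT, 2) weighted_risk
from   CLAIMS
order  by weighted_risk desc;

Feed that straight into a BI dashboard and the stoplight coding described above does the rest.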