Sunday Jul 26, 2015

Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple white paper

Big Data Analytics with Oracle Advanced Analytics:

Making Big Data and Analytics Simple

Oracle White Paper  |  July 2014 

Executive Summary:  Big Data Analytics with Oracle Advanced Analytics

(Click HERE to read entire Oracle white paper)

The era of “big data” and the “cloud” is driving companies to change. Just to keep pace, they must learn new skills and implement new practices that leverage those new data sources and technologies. Customers who share their digital exhaust with corporations increasingly expect improved interactions and greater perceived value in exchange, and those expectations are pushing companies forward. Big data and analytics offer the promise to satisfy these new requirements. Cloud, competition, big data analytics and next-generation “predictive” applications are driving companies towards new goals of delivering improved “actionable insights” and better outcomes. Traditional BI & Analytics approaches don’t deliver these detailed predictive insights and simply can’t satisfy the emerging customer expectations in this new world order created by big data and the cloud.

Unfortunately, as big data grows and expands in the three V’s (velocity, volume and variety of data types), new problems emerge. Data volumes grow and data becomes unmanageable and immovable. Scalability, security, and information latency become new issues. Dealing with unstructured data, sensor data and spatial data introduces new data type complexities.

Traditional advanced analytics has several inherent information technology weak points: data extracts and data movement; data duplication that leaves no single source of truth; data security exposures; and separate, often multiple, analytical tools (commercial and open source) and languages (SAS, R, SQL, Python, SPSS, etc.), depending on the skills of the data analysts and scientists involved. Problems become particularly egregious during the deployment phase, when the worlds of data analysis and information management collide.

Traditional data analysis typically starts with a representative sample or subset of the data that is exported to separate analytical servers and tools (SAS, R, Python, SPSS, etc.) that have been designed specifically for statisticians and data scientists. The analytics they perform range from simple descriptive statistical analysis to advanced predictive and prescriptive analytics. If a data scientist builds a predictive model that proves useful and valuable, IT must then get involved, and enterprise deployment and application integration become the next big challenge. The predictive models, along with all of their associated data preparation and transformation steps, have to be translated to SQL and recreated inside the database in order to apply the models and make predictions on the larger datasets maintained inside the data warehouse. This model translation phase introduces tedious, time-consuming and expensive manual coding steps from the original statistical language (SAS, R, Python, etc.) into SQL. DBAs and IT must somehow “productionize” these separate statistical models inside the database and/or data warehouse for distribution throughout the enterprise. Some vendors charge for specialized products and options just for predictive model deployment. This is where many advanced analytics projects fail. Add Hadoop, sensor data, tweets, and expanding big data reservoirs, and the entire “data to actionable insights” process becomes more challenging.

Not with Oracle. Oracle delivers a big data and analytics platform that eliminates the traditional extract, move, load, analyze, export, move, load paradigm. With Oracle Database 12c and the Oracle Advanced Analytics Option, big data management and big data analytics are designed into the data management platform from the beginning. Oracle’s multiple decades of R&D investment in developing the industry’s leading data management platform, together with Oracle SQL, Big Data SQL, Oracle Exadata, Oracle Big Data Appliance and integration with open source R, are seamlessly combined and integrated into a single platform: the Oracle Database.

Oracle’s vision is a big data and analytic platform for the era of big data and cloud to:

  • Make big data and analytics simple (for any data size, on any computer infrastructure and any variety of data, in any combination) and

  • Make big data and analytics deployment simple (as a service, as a platform, as an application)

Oracle Advanced Analytics offers a wide library of powerful in-database algorithms and integration with open source R that together can solve a wide variety of business problems and can be accessed via SQL, R or GUI. Oracle Advanced Analytics, an option to the Oracle Database Enterprise Edition 12c, extends the database into an enterprise-wide analytical platform for data-driven problems such as churn prediction, customer segmentation, fraud and anomaly detection, identifying cross-sell and up-sell opportunities, market basket analysis, and text mining and sentiment analysis. Oracle Advanced Analytics empowers data analysts, data scientists and business analysts to extract more knowledge, discover new insights and make informed predictions, working directly with large data volumes in the Oracle Database.

Data analysts/scientists have choice and flexibility in how they interact with Oracle Advanced Analytics.  Oracle Data Miner is an Oracle SQL Developer extension designed for data analysts that provides an easy to use “drag and drop” workflow GUI to the Oracle Advanced Analytics SQL data mining functions (Oracle Data Mining).  Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. When Oracle Data Miner users are satisfied with their analytical methodologies, they can share their workflows with other analysts and/or generate SQL scripts to hand to their DBAs to accelerate model deployment.  Oracle Data Miner also provides a PL/SQL API for workflow scheduling and automation.  

R programmers and data scientists can use the familiar open source R statistical programming language console, RStudio or any IDE to work directly with data inside the database and leverage Oracle Advanced Analytics’ R integration with the database (Oracle R Enterprise). Oracle R Enterprise transparently translates R to equivalent SQL and Oracle Data Mining functions for in-database performance, parallelism, and scalability, thus making R ready for the enterprise.

Application developers, using the ODM SQL data mining functions and ORE R integration, can build completely automated predictive analytic solutions that leverage the strengths of the database and the flexibility of R to integrate Oracle Advanced Analytics analytical solutions into BI dashboards and enterprise applications.

By integrating big data management and big data analytics into the same powerful Oracle Database 12c data management platform, Oracle eliminates data movement, reduces total cost of ownership and provides the fastest way to deliver enterprise-wide predictive analytics solutions and applications.


Friday Jul 24, 2015

2015 BIWA SIG Virtual Conference - Two Days of "Live" Talks by Experts - FREE

2015 BIWA SIG Virtual Conference

July 30-31, 2015 9:00 a.m. - 1:00 p.m. CDT

Join us for two full days where you will hear about the latest Business Intelligence trends. 

Day One:

  • 9:00 a.m. - 10:00 a.m.: What’s new in Oracle EPM and BI Infrastructure - Eric Helmer, ADI Strategies

Hyperion EPM and BI Fusion edition is a dramatic change under the covers. Corporations must consider more global approaches to infrastructure to maintain availability and performance while reducing footprint and cost. Technologies such as Exalytics, Oracle virtualization, cloud computing, software as a service, and open source operating systems (Linux) are more commonplace. Join Oracle ACE Director Eric Helmer as he covers what’s new, what’s supported, and what options you have when implementing your EPM/BI project.

  • 10:00 a.m. - 11:00 a.m.: Italian Ministry of Labor & Social Policy -- A Journey to Digital Government - Nicola Sandoli, ICONSULTING

The Italian Ministry of Labor and Social Policy (MLPS) is a branch of the Italian government responsible for all labor matters, including employment policies, promotions, worker protection, and social security. In its evolution towards a digital government, MLPS is streamlining and simplifying its administrative processes. MLPS has embarked on a data-driven journey to redefine business models and interactions with citizens, and to optimize and transform government services.

MLPS is focusing on four areas:

    • Information delivery: transitioning its data warehouse platform from reporting to centralizing and certifying data
    • Business Intelligence: monitoring activities, web publishing, and analyzing socio-political impact
    • Web analytics and semantic intelligence: interacting more efficiently with citizens
    • Job-hunting online guidance services: real time answers to young people looking for jobs

MLPS is using a wide range of Oracle technologies to manage large amounts of diverse data and apply advanced analytics, including:

    • Oracle Exalytics for daily updates of 5TB of data
    • Oracle Spatial and Graph and MapViewer 11g for location intelligence capabilities
    • Oracle Business Intelligence for desktop and mobile reporting
    • Oracle Endeca Information Discovery for web analytics, data discovery, and data analysis using social and semantic intelligence
    • Oracle Real-Time Decisions
    • Oracle Service-Oriented Architecture Suite: central point for accessing and managing information made available through the Ministry web portal Cliclavoro

Learn more about MLPS and its innovative platform that is delivering better information and services to their constituents.

  • 11:00 a.m. - 12:00 p.m.: Exadata: Elastic Configurations and IaaS – Private Cloud - Amit Kanda, Oracle

Customers face business challenges that include making real-time, data-driven decisions and reducing costs. Exadata’s extreme performance combined with Database In-Memory addresses the need for real-time, data-driven decisions. Elastic configurations and an updated subscription model (IaaS – Private Cloud) for Exadata hardware and software accompanied the launch of Exadata X5-2. This presentation will describe these updates and how customers can start small with Exadata and grow Exadata with their business, making it easier to reach business objectives.

  • 12:00 p.m. - 1:00 p.m.: The State of Internet of Things (IoT) - Shyam Varan Nath, GE

The Internet of Things (IoT) is poised to have a tremendous impact around us. This session will look at the industry landscape of IoT. The different flavors of IoT will be discussed with use cases from the consumer, commercial and industrial sectors. Learn about the edge and cloud computing platforms that power IoT solutions. Finally, we will walk through use cases that show how machine and sensor data is being monetized through analytics. Such use cases will span aviation and other industries.


Day Two:

  • 9:00 a.m. - 10:00 a.m.: Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL - Charlie Berger, Oracle

Oracle Advanced Analytics 12c delivers parallelized in-database implementations of data mining algorithms and integration with R. Data analysts use Oracle Data Miner GUI and R to build and evaluate predictive models and leverage R packages and graphs. Application developers deploy OAA models using SQL data mining functions and R. Oracle extends the Database to an analytical platform that mines more data and more data types, eliminates data movement and preserves security to automatically detect patterns, anticipate customer behavior and deliver actionable insights. Oracle Big Data SQL adds new big data sources and ORAAH provides algorithms that run on Hadoop. Come learn what’s new, best practices, and hear customer examples.

  • 10:00 a.m. - 11:00 a.m.: Graph Data Management and Analytics for Big Data - Bill Beauregard, Oracle & Zhe Wu, Oracle

The newest Oracle big data product, Oracle Big Data Spatial and Graph, offers a set of spatial analytic services, and a graph database with rich graph analytics that support big data workloads on Apache Hadoop and NoSQL technologies. Oracle is applying over a decade of expertise with spatial and graph analytic technologies to big data architectures. Graphs are an important data model for big data systems. Property graphs can be used for discovery, for instance, to discover underlying communities and influencers within a social graph, relationships and connections in cyber security networks, and to generate recommendations based on interests, profiles, and past behaviors. Oracle Big Data Spatial and Graph provides optimized storage, search and querying in Oracle NoSQL Database and Apache HBase for distributed property graphs. It offers 35 built-in, in-memory, parallel property graph analytic functions. We will discuss use cases, features, architecture, and show a demo. Learn how developers and data scientists can manage their most challenging graph data processing in a single enterprise-class Big Data platform.

  • 11:00 a.m. - 12:00 p.m.: Why Oracle Database In-Memory?  Use Cases and Overview - Andy Rivenes, Oracle

Oracle recently announced the availability of the Oracle Database In-Memory option, a memory-optimized database technology that transparently adds real-time analytics to applications. Because the In-Memory option is 100% compatible with existing Oracle Database applications, it’s easy to integrate it into your environment and to begin reaping the benefits. But how do you get started with it? What do you need to know to take full advantage of this new functionality? This session will give an overview of what Oracle Database In-Memory is and then discuss some use cases to highlight how it can be used.

| Register Here |


Wednesday Jul 15, 2015

Call for Abstracts at BIWA Summit'16 - The Oracle Big Data + Analytics User Conference


Please email shyamvaran@gmail.com with any questions regarding the submission process.

What Successes Can You Share?

We want to hear your story. Submit your proposal today for the Oracle BIWA Summit 2016.

Proposals will be accepted through Monday evening, November 2, 2015, at midnight, EST. Don’t wait, though—we’re accepting submissions on a rolling basis, so that selected sessions can be published early on our online agenda.

To submit your abstract, click here, select a track, and fill out the form.

Please note:

  • Presentations must be noncommercial.
  • Sales promotions for products or services disguised as proposals will be eliminated. 
  • Speakers whose abstracts are accepted will be expected to submit (at a later date) a PowerPoint presentation slide set. 
  • Accompanying technical and use case papers are encouraged, but not required.

Speakers whose abstracts are accepted will be given a complimentary registration to the conference. (Any additional co-presenters must register for the event separately and provide appropriate registration fees. It is up to the co-presenters’ discretion which presenter to designate for the complimentary registration.) 

This Year’s Tracks

Proposals can be submitted for the following tracks: 

More About the Conference

The Oracle BIWA Summit 2016 is organized and managed by the Oracle BIWA SIG, the Oracle Spatial SIG, and the Oracle Northern California User Group. The event attracts top BI, data warehousing, analytics, Spatial, IoT and Big Data experts.

The three-day event includes keynotes from industry experts, educational sessions, hands-on labs, and networking events.

Hot topics include: 

  • Database, data warehouse and cloud, Big Data architecture
  • Deep dives and hands-on labs on existing Oracle BI, data warehouse, and analytics products
  • Updates on the latest Oracle products and technologies (e.g. Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL)
  • Novel and interesting use cases on everything – Spatial, Graph, Text, Data Mining, IoT, ETL, Security, Cloud
  • Working with Big Data (e.g., Hadoop, “Internet of Things,” SQL, R, Sentiment Analysis)
  • Oracle Business Intelligence (OBIEE), Oracle Big Data Discovery, Oracle Spatial, and Oracle Advanced Analytics—Better Together

Hope to see you at BIWA'16 in January, 2016!

Charlie

Monday May 04, 2015

Oracle Data Miner 4.1, SQL Developer 4.1 Extension Now Available!

To download, visit:  

http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index-097090.html

New Data Miner Features in SQL Developer 4.1

These new Data Miner 4.1 features are supported for database versions supported by Oracle Data Miner:

JSON Data Support for Oracle Database 12.1.0.2 and above

In response to the growing popularity of JSON data and its use in Big Data configurations, Data Miner now provides an easy to use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands. It also makes all of the existing Data Miner features available for JSON data. The enhancements include:

Data Source Node
o    Automatically identifies columns containing JSON data by identifying those with the IS JSON constraint.
o    Generates JSON schema for any selected column that contains JSON data.
o    Imports a JSON schema for a given column.
o    JSON schema viewer.

Create Table Node
o    Ability to select a column to be typed as JSON.
o    Generates JSON schema in the same manner as the Data Source node.

JSON Data Type
o    Columns can be specifically typed as JSON data.

JSON Query Node (see related JSON node blog posting)
o    Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
o    Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
o    Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
o    Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes. 

General Improvements
o    Improved database session management, resulting in fewer database sessions being generated and a more responsive user interface.
o    Filter Columns Node - Combined primary Editor and associated advanced panel to improve usability.
o    Explore Data Node - Allows multiple row selection to provide group chart display.
o    Classification Build Node - Automatically filters out rows where the Target column contains NULLs or only spaces. Also issues a warning to the user but continues with the model build.
o    Workflow - Enhanced workflows to ensure that Loading, Reloading, Stopping, Saving operations no longer block the UI.
o    Online Help - Revised the Online Help to adhere to topic-based framework.

Selected Bug Fixes (does not include 4.0 patch release fixes)
o    GLM Model Algorithm Settings: Added GLM feature identification sampling option (Oracle Database 12.1 and above).
o    Filter Rows Node: Custom Expression Editor not showing all possible available columns.
o    WebEx Display Issues: Fixed problems affecting the display of the Data Miner UI through WebEx conferencing.


For More Information and Support, please visit the Oracle Data Mining Discussion Forum on the Oracle Technology Network (OTN)

Return to Oracle Data Miner page on OTN

Wednesday Apr 22, 2015

OpenWorld 2015 Call for Proposals Extended to Wed, May 6th, 11:59 p.m.

Submit your Oracle Advanced Analytics stories now: https://www.oracle.com/openworld/call-for-proposals.html

If you’re an Oracle technology expert, conference attendees want to hear it straight from you. So don’t wait—proposals must be submitted by April 29.

Wanted: Outstanding Oracle Experts

The Oracle OpenWorld 2015 Call for Proposals is now open. Attendees at the conference are eager to hear from experts on Oracle business and technology. They’re looking for insights and improvements they can put to use in their own jobs: exciting innovations, strategies to modernize their business, different or easier ways to implement, unique use cases, lessons learned, the best of best practices.

If you’ve got something special to share with other Oracle users and technologists, they want to hear from you, and so do we. Submit your proposal now for this opportunity to present at Oracle OpenWorld, the most important Oracle technology and business conference of the year.

We recommend you take the time to review the General Information, Submission Information, Content Program Policies, and Tips and Guidelines pages before you begin. We look forward to your submissions.


Submit Your Proposal

By submitting a session for consideration, you authorize Oracle to promote, publish, display, and disseminate the content submitted to Oracle, including your name and likeness, for use associated with the Oracle OpenWorld and JavaOne San Francisco 2015 conferences. Press, analysts, bloggers and social media users may be in attendance at OpenWorld or JavaOne sessions.


General Information

  • Conference location: San Francisco, California, USA
  • Dates: Sunday, October 25 to Thursday, October 29, 2015
  • Website: Oracle OpenWorld

Key Dates for 2015

Deliverable  |  Due Date
Call for Proposals opens  |  Wednesday, March 25
Call for Proposals closes  |  Wednesday, April 29, 11:59 p.m. PDT
Notifications for accepted and declined submissions sent  |  Mid-June

Contact us

  • For questions regarding the Call for Proposals, send an e-mail to speaker-services_ww@oracle.com.
  • For technical questions about the submission tool or issues with submitting your proposal, send an e-mail to OpenWorldContent@gpj.com.
  • Oracle employee submitters should contact the appropriate Oracle track leads before submitting. To view a list of track leads, click here.

Saturday Mar 28, 2015

Use Repository APIs to Manage and Schedule Workflows to run

Data Miner 4.1 ships with a set of repository PL/SQL APIs that allow applications to manage Data Miner projects and workflows directly. The workflow APIs enable applications to execute workflows immediately or schedule workflows to execute using specific time intervals or using defined schedules. The workflow run APIs internally use Oracle Scheduler for scheduling functionality. Moreover, repository views are provided for applications to query project and workflow information. Applications can also monitor workflow execution status and query generated results using these views.

With the workflow APIs, applications can seamlessly integrate the workflow running process.  Moreover, all generated results are accessible by the Data Miner, so you can view the results using the Data Miner user interface.

For more information, please read the White Paper Use Repository APIs to Manage and Schedule Workflows to run

Monday Dec 15, 2014

Use Oracle Data Miner to Perform Sentiment Analysis inside Database using Twitter Data Demo

Sentiment analysis has been a hot topic recently. Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. Social media websites are a good source of people’s sentiments. Companies have been using social networking sites to make new product announcements, promote their products, collect product reviews and user feedback, interact with their customers, etc. It is important for companies to sense customer sentiments toward their products, so they can react accordingly and benefit from customers’ opinions.

In this blog, we will show you how to use Data Miner to perform some basic sentiment analysis (based on text analytics) using Twitter data. The demo data was downloaded from the developer API console page of the Twitter website. The data itself originated from the Oracle Twitter page, and it contains about a thousand tweets posted over six months (May to Oct 2014). We will determine the sentiments (highly favored, moderately favored, and less favored) of tweets based on their favorite counts, and assign the sentiment to each tweet. We then build classification models using these tweets along with their assigned sentiments. The goal is to predict how well a new tweet will be received by customers. This may help the marketing department to better craft a tweet before it is posted.

The demo (click here to download demo twitter data and workflow) will use the newly added JSON Query node in Data Miner 4.1 to import the twitter data; please review the “How to import JSON data to Data Miner for Mining” blog entry in the previous post.

Workflow for Sentiment Analysis

The following workflow shows the process we use to prepare the twitter data, determine the sentiments of tweets, and build classification models on the data.

The following describes the nodes used in the above workflow:

  • Data Source (TWITTER_LARGE)
    • Select the demo Twitter data source.  The sample Twitter data is attached to this blog.
  • JSON Query (JSON Query)
    • Select the required JSON attributes used for analysis; we only use the “id”, “text”, and “favorite_count” attributes.  The “text” attribute contains the tweet, and the “favorite_count” attribute indicates how many times the tweet has been favorited.
  • SQL Query (Cleanse Tweets)
    • Remove shortened URLs and punctuation within tweets because this data contains no predictive information.
  • Filter Rows (Filter Rows)
    • Remove retweeted tweets because these are duplicate tweets.
  • Transform (Transform)
    • Bin the “favorite_count” data into three quantiles; each quantile represents a sentiment.  The top quantile represents “highly favored” sentiment, the middle quantile represents “moderately favored” sentiment, and the bottom quantile represents “less favored” sentiment.
  • SQL Query (Recode Sentiment)
    • Assign quantiles as determined sentiments to tweets.
  • Create Table (OUTPUT_4_29)
    • Persist the data to a table for classification model build (optional).
  • Classification (Class Build)
    • Build classification models to predict customer sentiment toward a new tweet (how much will customers like this new tweet?).

Data Source Node (TWITTER_LARGE)

Select the JSON_DATA in the TWITTER_LARGE table.  The JSON_DATA contains about a thousand tweets to be used for sentiment analysis.

JSON Query Node (JSON Query)

Use the new JSON Query node to select the following JSON attributes.  This node projects the JSON data to relational data format, so that it can be consumed within the workflow process.
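To make the projection concrete, here is a minimal Python sketch of what the JSON Query node does conceptually: it keeps the three attributes we selected and flattens each document into a row. The sample documents and the `project_tweets` helper are hypothetical and only for illustration; Data Miner performs this step inside the database.

```python
import json

# Hypothetical sample documents shaped like tweets; only the three attribute
# names ("id", "text", "favorite_count") come from this post.
SAMPLE_DOCS = [
    '{"id": 1, "text": "Hello #oracle", "favorite_count": 12, "lang": "en"}',
    '{"id": 2, "text": "RT @oracle: Hello", "favorite_count": 0}',
]


def project_tweets(json_docs, attributes=("id", "text", "favorite_count")):
    """Turn JSON documents into flat rows holding only the chosen attributes."""
    rows = []
    for doc in json_docs:
        parsed = json.loads(doc)
        # A missing attribute becomes None, much like a NULL in a relational row.
        rows.append({name: parsed.get(name) for name in attributes})
    return rows


rows = project_tweets(SAMPLE_DOCS)
```

Each entry in `rows` now behaves like one relational record, which is the shape the downstream nodes expect.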

SQL Query Node (Cleanse Tweets)

Use the REGEXP_REPLACE function to remove numbers, punctuation, and shortened URLs inside tweets because this data is considered noise and does not provide any predictive information.  Notice we do not treat hash tags inside tweets specially; these tags are treated as regular words.

We specify the number, punctuation, and URL patterns in regular expression syntax and use the database function REGEXP_REPLACE to replace these patterns inside all tweets with empty strings.

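-- Strip digits, punctuation, and URLs from each tweet.
-- The trailing arguments 1 and 0 mean: start at the first character and replace every match.
SELECT
  REGEXP_REPLACE("JSON Query_N$10055"."TWEET", '([[:digit:]*]|[[:punct:]*]|(http[s]?://(.*?)(\s|$)))', '', 1, 0) "TWEETS",
  "JSON Query_N$10055"."FAVORITE_COUNT",
  "JSON Query_N$10055"."ID"
FROM
  "JSON Query_N$10055"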

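For readers who want to experiment with the cleansing pattern outside the database, the following Python sketch approximates the REGEXP_REPLACE call above. It is a stand-in under stated assumptions, not the Data Miner implementation: Python's `re` module has no POSIX classes, so `\d` and `string.punctuation` (ASCII only) take the place of `[[:digit:]]` and `[[:punct:]]`, and the `cleanse_tweet` name and the sample tweet are made up.

```python
import re
import string

# The URL alternative comes first so that "http://..." is removed as a unit,
# together with the whitespace that follows it; otherwise any single digit
# or ASCII punctuation character is removed.
NOISE = re.compile(
    r"https?://\S*(?:\s|$)|[\d" + re.escape(string.punctuation) + r"]"
)


def cleanse_tweet(text):
    """Delete URLs, digits, and punctuation by replacing them with nothing."""
    return NOISE.sub("", text)


cleaned = cleanse_tweet("Big news: 3 demos today! http://t.co/abc123 #oracle")
```

Because matches are replaced with an empty string, the words around a removed digit keep their surrounding spaces, and a hash tag loses only its “#” character, so it is treated as a regular word.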
Filter Rows Node (Filter Rows)

Remove retweeted tweets because these are duplicate tweets.  Usually, retweeted tweets start with an “RT” abbreviation, so we specify the following row filter condition to filter out those tweets.
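The idea behind the filter can be sketched in Python as follows. This is only an illustration: the `is_retweet` helper and the sample tweets are hypothetical, and the check assumes that “RT” appears as the first word of a retweet.

```python
def is_retweet(text):
    """Treat a tweet as a retweet when its first word is the "RT" marker."""
    words = text.split()
    return bool(words) and words[0] == "RT"


tweets = ["RT @oracle: big data demo", "RTFM is good advice", "New release today"]

# Keep only original tweets; a word that merely begins with "RT" is not a match.
originals = [t for t in tweets if not is_retweet(t)]
```

Comparing the whole first word, rather than the first two characters, avoids dropping tweets that simply start with a word such as “RTFM”.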

Transform Node (Transform)

Use the Transform node to bin the “favorite_count” data into three quantiles; each quantile represents a sentiment.  For simplicity, we just bin the count into three quantiles without applying any special treatment first.

SQL Query Node (Recode Sentiment)

Assign quantiles as determined sentiments to tweets; the top quantile represents “highly favored” sentiment, the middle quantile represents “moderately favored” sentiment, and the bottom quantile represents “less favored” sentiment.  These sentiments become target classes for the classification model build.
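The binning and recoding steps can be pictured together with a short Python sketch. It assumes a simple rank-based, equal-frequency split, which may differ from how the Transform node computes its bin boundaries; the `assign_sentiments` helper and the sample counts are hypothetical.

```python
# Labels ordered from the bottom quantile to the top quantile.
LABELS = ["less favored", "moderately favored", "highly favored"]


def assign_sentiments(favorite_counts):
    """Split the counts into three equal-frequency bins, lowest counts first."""
    n = len(favorite_counts)
    # Sort positions by count; equal counts keep their original order.
    order = sorted(range(n), key=lambda i: favorite_counts[i])
    sentiments = [None] * n
    for rank, i in enumerate(order):
        # Ranks 0..n-1 map onto bin 0, 1, or 2 in equal thirds.
        sentiments[i] = LABELS[rank * 3 // n]
    return sentiments


counts = [0, 15, 3, 40, 1, 7]
sentiments = assign_sentiments(counts)
```

One simplification to keep in mind: tweets with identical favorite counts can land in different bins here, because the split is made purely by rank.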

Classification Node (Class Build)

Build Classification models using the sentiment as the target and the tweet id as the case id.

Since the TWEETS column contains the textual tweets, we change the mining type to Text Custom.

Enable the Stemming option for text processing.

Compare Test Results

After the model build completes successfully, open the test viewer to compare model test results.  The SVM model seems to produce the best prediction for the “highly favored” sentiment (57% correct prediction).

Moreover, the SVM model has better lift result than other models, so we will use this model for scoring.

Sentiment Prediction (Scoring)

Let’s score this tweet “this is a boring tweet!” using the SVM model.

As expected, this tweet receives a “less favored” prediction.

How about this tweet: “larry is doing a data mining demo now!”?

Not surprisingly, this tweet receives a “highly favored” prediction.

Last but not least, let’s see the sentiment prediction for the title of this blog.

Not bad: it gets a “highly favored” prediction, so it seems this title will be well received by the audience.

Conclusion

The best SVM model only produces 57% accuracy for the “highly favored” sentiment prediction, but that is reasonably better than a random guess, which would be right roughly one time in three if the three sentiment classes are about equal in size.  With a larger sample of tweet data, the model accuracy could be improved.  The new JSON Query node enables us to perform data mining on JSON data, which is the most popular data format produced by prominent social networking sites.

Monday Dec 08, 2014

How to import JSON data to Data Miner for Mining

JSON is a popular lightweight data format used in Big Data applications. Increasingly, much of the data produced by Big Data systems is in JSON format. For example, web logs generated in the middle tier web servers are likely in JSON format. NoSQL database vendors have chosen JSON as their primary data representation. Moreover, the JSON format is widely used in the RESTful style Web services responses generated by most popular social media websites like Facebook, Twitter, LinkedIn, etc. This JSON data could potentially contain a wealth of information that is valuable for business use. So it is important that we can bring this data over to Data Miner for analysis and mining purposes.

Oracle Database 12.1.0.2 provides the ability to store and query JSON data. To take advantage of the database JSON support, the upcoming Data Miner 4.1 adds a new JSON Query node that allows users to query JSON data in relational format. In addition, the current Data Source node and Create Table node are enhanced to allow users to specify JSON data in the input data source.

In this blog, I will show you how to specify JSON data in the input data source and use the JSON Query node to selectively query the desired attributes and project the result in relational format. Once the data is in relational format, users can treat it as a normal relational data source and start analyzing and mining it immediately. The Data Miner repository installation installs a sample JSON dataset, ODMR_SALES_JSON_DATA, which I will use here. Note that Oracle Big Data SQL also supports queries against vast amounts of big data stored in multiple data sources, including Hadoop. Users can view and analyze data from various data stores together, as if it were all stored in an Oracle database.

Specify JSON Data

The Data Source and Create Table nodes are enhanced to allow users to specify the JSON data type in the input data source.

Data Source Node

For this demo, we will focus on the Data Source node. To specify JSON data, create a new workflow with a Data Source node. In the Define Data Source wizard, select the ODMR_SALES_JSON_DATA table. Notice there is only one column (JSON_DATA) in this table, which contains the JSON data.

Click Next to go to the next step, which shows that JSON_DATA is selected with the JSON(CLOB) data type. The JSON prefix indicates that the stored data is in JSON format; CLOB is the original data type. The JSON_DATA column is defined with the new “IS JSON” constraint, which indicates that only valid JSON documents can be stored there. The UI detects this constraint and automatically selects the column as a JSON type. If no “IS JSON” constraint were defined, the column would be shown with the CLOB data type. To manually designate a column as a JSON type, click the data type itself to bring up an in-place dropdown that lists the original data type (e.g. CLOB) and the corresponding JSON type (e.g. JSON(CLOB)), then select the JSON type. Note: only the following data types can be set to JSON type: VARCHAR2, CLOB, BLOB, RAW, NCLOB, and NVARCHAR2.

Click Finish and run the node now.

Once the node is run successfully, open the editor to examine the generated JSON schema.

Notice the message “System Generated Data Guide is available” at the bottom of the Selected Attributes listbox. When the Data Source node runs, it parses the JSON documents to produce a schema that represents the document structure. Here is what the schema looks like:

PATH                            TYPE
------------------------------  ------
$."CUST_ID"                     NUMBER
$."EDUCATION"                   STRING
$."OCCUPATION"                  STRING
$."HOUSEHOLD_SIZE"              STRING
$."YRS_RESIDENCE"               STRING
$."AFFINITY_CARD"               STRING
$."BULK_PACK_DISKETTES"         STRING
$."FLAT_PANEL_MONITOR"          STRING
$."HOME_THEATER_PACKAGE"        STRING
$."BOOKKEEPING_APPLICATION"     STRING
$."PRINTER_SUPPLIES"            STRING
$."Y_BOX_GAMES"                 STRING
$."OS_DOC_SET_KANJI"            STRING
$."COMMENTS"                    STRING
$."SALES"                       ARRAY
$."SALES"."PROD_ID"             NUMBER
$."SALES"."QUANTITY_SOLD"       NUMBER
$."SALES"."AMOUNT_SOLD"         NUMBER
$."SALES"."CHANNEL_ID"          NUMBER
$."SALES"."PROMO_ID"            NUMBER

The JSON Path expression syntax and associated data type info (OBJECT, ARRAY, NUMBER, STRING, BOOLEAN, NULL) are used to represent JSON document structure. We will refer to this JSON schema as Data Guide throughout the product.
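To make the idea of a Data Guide concrete, here is a minimal Python sketch that derives a path-to-type mapping from a few JSON documents. It is only an illustration of the concept, not how Data Miner generates its Data Guide; the function names and the two sample documents are hypothetical.

```python
import json

def json_type(value):
    """Map a parsed JSON value to the type names used in the Data Guide."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        return "ARRAY"
    return "OBJECT"

def walk(value, path, guide):
    """Record the types seen at each path; array elements share the array's path."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f'{path}."{key}"'
            guide.setdefault(child_path, set()).add(json_type(child))
            walk(child, child_path, guide)
    elif isinstance(value, list):
        for element in value:
            walk(element, path, guide)

def build_data_guide(documents):
    """Parse every document and collect path -> set of observed types."""
    guide = {}
    for text in documents:
        walk(json.loads(text), "$", guide)
    return guide

# Hypothetical documents, loosely shaped like the sample dataset (values are made up).
docs = [
    '{"CUST_ID": 1, "EDUCATION": "Masters", "SALES": [{"PROD_ID": 10, "QUANTITY_SOLD": 2}]}',
    '{"CUST_ID": 2, "EDUCATION": "Bach.", "SALES": [{"PROD_ID": 11, "QUANTITY_SOLD": 1}]}',
]
guide = build_data_guide(docs)
for path, types in guide.items():
    print(path, sorted(types))
```

Keeping a set of types per path is a design choice of this sketch: it lets the same path carry different types in different documents without losing information.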

Before we look at the Data Guide in the UI, let’s look at the settings that can affect how it is generated. Click the “JSON Settings…” button to open the JSON Parsing Settings dialog.

The settings are described below:

· Generate Data Guide if necessary

o Generate a Data Guide if it is not already generated in parent node.

· Sampling

o Sample JSON documents for Data Guide generation.

· Max. number of documents

o Specify maximum number of JSON documents to be parsed for Data Guide generation.

· Limit Document Values to Process

o Sample JSON document values for Data Guide generation.

· Max. number per document

o Specify maximum number of JSON document scalar values (e.g. NUMBER, STRING, BOOLEAN, NULL) per document to be parsed for Data Guide generation.

The sampling option is enabled by default to prevent long-running parsing of JSON documents; parsing could take a while for a large number of documents. Alternatively, users may supply a Data Guide (Import From File) or reuse an existing Data Guide (Import From Workflow) if a compatible Data Guide is available.
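The effect of the two limits can be pictured with a small Python sketch: one cap on how many documents are parsed and one cap on how many scalar values are read per document. This is an assumption-laden illustration of what the settings bound, not Data Miner's actual sampling logic; the function and parameter names are hypothetical.

```python
import json

def count_scalars(value, limit):
    """Count scalar values depth-first, stopping once `limit` is reached."""
    if limit <= 0:
        return 0
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return 1  # a scalar: NUMBER, STRING, BOOLEAN or NULL
    seen = 0
    for child in children:
        seen += count_scalars(child, limit - seen)
        if seen >= limit:
            break
    return seen

def values_processed(documents, max_documents, max_values_per_document):
    """Total scalar values read when both caps are applied."""
    total = 0
    for text in documents[:max_documents]:
        total += count_scalars(json.loads(text), max_values_per_document)
    return total

# Hypothetical documents: 4, 2 and 1 scalar values respectively.
docs = ['{"a": 1, "b": [2, 3, 4]}', '{"a": 5, "b": [6]}', '{"a": 7, "b": []}']
print(values_processed(docs, max_documents=2, max_values_per_document=3))
```

With both caps the work is bounded by the product of the two limits, regardless of how many documents the table holds or how large each document is.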

Now let’s look at the Data Guide. Go back to the Edit Data Source Node dialog, select the JSON_DATA column and click the icon above to open the Edit Data Guide dialog. The dialog shows the JSON structure in a hierarchical tree view with data type information. The “Number of Values Processed” field shows the total number of JSON scalar values that were parsed to produce the Data Guide.

Users can control whether to enable Data Guide generation or import a compatible Data Guide via the menu under the icon.

The menu options are described below:

· Default

o Use the “Generate Data Guide if necessary” setting found in the JSON Parsing Settings dialog (see above).

· On

o Always generate a Data Guide.

· Off

o Do not generate a Data Guide.

· Import From Workflow

o Import a compatible Data Guide from a workflow node (e.g. Data Source, Create Table). The option will be set to Off after the import (disable Data Guide generation).

· Import From File

o Import a compatible Data Guide from a file. The option will be set to Off after the import (disable Data Guide generation).

Users can also export the current Data Guide to a file via the icon.

Select JSON Data

In Data Miner 4.1, a new JSON Query node is added to allow users to selectively bring over the desired JSON attributes in relational format.

JSON Query Node

The JSON Query node is added to the Transforms group of the Workflow.

Let’s create a JSON Query node and connect the Data Source node to it.

Double click the JSON Query node to open the editor. The editor consists of four tabs, which are described as follows:

· JSON

The Column dropdown lists all available columns in the data source where JSON structure (Data Guide) is found. It consists of the following two sub tabs:

o Structure

o Show the JSON structure of the selected column in a hierarchical tree view.

o Data

o Show a sample of the JSON documents found in the selected column. By default it displays the first 2,000 characters (including spaces) of the documents. Users can change the sample size (max. 50,000 chars) and run the query to see more of the documents.

· Additional Output

o Allow users to select any non-JSON columns in the data source as additional output columns.

· Aggregate

o Allow users to define aggregations of JSON attributes.

· Preview

o Output Columns

o Show columns in the generated relational output.

o Output Data

o Show data in the generated relational output.

JSON Tab

Let’s select some JSON attributes to bring over. Skip the SALES attributes because we want to define aggregations for these attributes (QUANTITY_SOLD and AMOUNT_SOLD).

To peek at the JSON documents, go to the Data tab. You can change the Sample Size to look at more JSON data. Also, you can search for specific data within the displayed documents by using the search control.

Additional Output Tab

If you have any non-JSON columns in the data source that you want to carry over for output, you can select those columns here.

Aggregate Tab

Let’s define aggregations (use SUM function) for QUANTITY_SOLD and AMOUNT_SOLD attributes (within the SALES array) for each customer group (group by CUST_ID).

Click the icon in the top toolbar to open the Edit Group By dialog, where you can select CUST_ID as the Group-By attribute. Notice that the Group-By attribute can consist of multiple attributes.

Click OK to return to the Aggregate tab, where you can see the selected CUST_ID Group-By attribute is now added to the Group By Attributes table at the top.

Click the icon in the bottom toolbar to open the Add Aggregations dialog, where you can define the aggregations for both QUANTITY_SOLD and AMOUNT_SOLD attributes using the SUM function.

Next, click the icon in the toolbar to open the Edit Sub Group By dialog, where you can specify a Sub-Group By attribute (PROD_ID) to calculate quantity sold and amount sold per product per customer.

Specifying a Sub-Group By column creates a nested table; the nested table contains columns with data type DM_NESTED_NUMERICALS.

Click OK to return to the Aggregate tab, where you can see the defined aggregations are now added to the Aggregation table at the bottom.

Preview Tab

Let’s go to the Preview tab to look at the generated relational output. The Output Columns tab shows all output columns and their corresponding source JSON attributes. The output columns can be renamed by using the in-place edit control.

The Output Data tab shows the actual data in the generated relational output.

Click OK to close the editor when you are done. The generated relational output is in single-record case format; each row represents a case. If we had not defined the aggregations for the JSON array attributes, the relational output would have been in multiple-record case format. The multiple-record case format is not suitable for building mining models, except for the Association model (which accepts transactional data format with a transaction id and an item id).
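The difference between the two output formats can be sketched in a few lines of Python. The sketch below mimics what the Aggregate tab settings describe (SUM of QUANTITY_SOLD and AMOUNT_SOLD grouped by CUST_ID, with a Sub-Group By on PROD_ID); it is not the SQL that Data Miner generates. The sample documents, output column names, and the plain dictionary standing in for the nested DM_NESTED_NUMERICALS column are all assumptions made for illustration.

```python
import json

# Hypothetical documents shaped like ODMR_SALES_JSON_DATA (values are made up).
docs = [
    '{"CUST_ID": 101, "SALES": ['
    '{"PROD_ID": 13, "QUANTITY_SOLD": 1, "AMOUNT_SOLD": 20.0},'
    '{"PROD_ID": 13, "QUANTITY_SOLD": 2, "AMOUNT_SOLD": 40.0},'
    '{"PROD_ID": 14, "QUANTITY_SOLD": 1, "AMOUNT_SOLD": 5.0}]}',
    '{"CUST_ID": 102, "SALES": ['
    '{"PROD_ID": 14, "QUANTITY_SOLD": 3, "AMOUNT_SOLD": 15.0}]}',
]

def single_record_cases(documents):
    """One row per CUST_ID: SUM over the SALES array, plus per-PROD_ID sub-totals."""
    rows = {}
    for text in documents:
        doc = json.loads(text)
        row = rows.setdefault(doc["CUST_ID"], {
            "QUANTITY_SOLD_SUM": 0,
            "AMOUNT_SOLD_SUM": 0.0,
            "QUANTITY_SOLD_BY_PROD": {},  # stands in for the nested column
        })
        for sale in doc.get("SALES", []):
            row["QUANTITY_SOLD_SUM"] += sale["QUANTITY_SOLD"]
            row["AMOUNT_SOLD_SUM"] += sale["AMOUNT_SOLD"]
            by_prod = row["QUANTITY_SOLD_BY_PROD"]
            by_prod[sale["PROD_ID"]] = by_prod.get(sale["PROD_ID"], 0) + sale["QUANTITY_SOLD"]
    return rows

def multiple_record_cases(documents):
    """Without aggregation: one row per element of the SALES array."""
    return [
        (doc["CUST_ID"], sale["PROD_ID"], sale["QUANTITY_SOLD"], sale["AMOUNT_SOLD"])
        for doc in map(json.loads, documents)
        for sale in doc.get("SALES", [])
    ]

cases = single_record_cases(docs)
print(cases[101])
print(len(multiple_record_cases(docs)), "rows without aggregation")
```

In the aggregated form each customer is exactly one case, which is the shape most mining algorithms expect; the unaggregated form repeats the customer once per sale.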

Use Case

Here is an example of how the JSON Query node is used to project the JSON data source to relational format, so that the data can be consumed by the Explore Data node for data analysis and the Class Build node for building models.

Conclusion

This blog shows how JSON data can be brought over to Data Miner via the new JSON Query node. Once the data is projected to relational format, it can easily be consumed by Data Miner for graphing, data analysis, text processing, transformation, and modeling.

Thursday Nov 20, 2014

ORACLE BI, DW, ANALYTICS, BIG DATA AND SPATIAL USER COMMUNITY - BIWA Summit'15 www.biwasummit.org

Please share with your Oracle BI, DW, Analytics, Big Data and Spatial user community.   Thanks.  CB

BIWA Summit’15 Jan 27-29, 2015 Early Bird Registration Ends Friday. 

Registration is now LIVE. Register by November 21st (tomorrow) to receive the early bird pricing of $249 and save $50.

Please direct your colleagues to REGISTER NOW and participate to take advantage of the Early Bird registration ($249.00 USD).  EARLY BIRD SPECIAL ENDS TOMORROW (Friday, Nov. 21).  Here’s some information about the event below and some pics and talks from last year to give some feel for the opportunity.   

BIWA Summits have been organized and managed by the Oracle BI, DW and Analytics SIG user community of IOUG (Independent Oracle User Group) and attract the top Oracle BI, DW, Advanced Analytics and Big Data experts. The 2.5-day BIWA Summit'15 event joins forces with the Oracle Spatial SIG and involves keynotes by industry experts, educational sessions, hands-on labs and networking events. We have a great lineup so far with Tom Kyte, Senior Technical Architect in Oracle's Server Technology; Doug Cutting (Chief Architect, Cloudera); Oracle BI Senior Management; Neil Mendelson, VP of Product Management Big Data and Advanced Analytics; Matt Bradley, SVP, Oracle Product Development, EPM Applications; other featured speakers; and many customers/tech experts (see web site and search % Sessions). Our BIWA Summit offers a broad, multi-track, user-driven conference that has built up a growing reputation over the years. We emphasize technical content and networking with like-minded customers, users, developers and product managers (Database, Big Data Appliance, Oracle Advanced Analytics, Spatial, OBIEE, Endeca, Big Data Discovery, In-Memory, SQL Patterns, etc.) who all share an interest in “novel and interesting use cases” of Oracle BI, DW, Advanced Analytics and Spatial technologies, applications and solutions. We’re off to a great start this year with a great agenda and hope to pack the HQ CC this Jan 27-29, 2015 with 300+ attendees.

Please forward and share with your Oracle BI, DW, Analytics, Big Data and Spatial colleagues.   

Thank you!  Hope to see you at BIWA Summit'15

Charlie

Wednesday Oct 08, 2014

2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014

2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014.   We had a number of customer, partner and Oracle talks that focused on the Oracle Advanced Analytics Database Option.    See below with links to presentations.  Check back later to OOW Sessions Content Catalog as not all presentations have been uploaded yet.  :-(

Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631]

Moving data mining algorithms to run as native data mining SQL functions eliminates data movement, automates knowledge discovery, and accelerates the transformation of large-scale data to actionable insights from days/weeks to minutes/hours. In this session, Fiserv, a leading global provider of electronic commerce systems for the financial services industry, shares best practices for turning in-database predictive models into actionable policies and illustrates the use of Oracle Data Miner for fraud prevention in online payments. Attendees will learn how businesses that implement predictive analytics in their production processes significantly improve profitability and maximize their ROI.

Developing Relevant Dining Visits with Oracle Advanced Analytics at Olive Garden [CON2898]

Olive Garden, traditionally managing its 830 restaurants nationally, transitioned to a localized approach with the help of predictive analytics. Using k-means clustering and logistic classification algorithms, it divided its stores into five behavioral segments. The analysis leveraged Oracle SQL Developer 4.0 and Oracle R Enterprise 1.3 to evaluate 115 million transactions in just 5 percent the time required by the company’s BI tool. While saving both time and money by making it possible to develop the solution internally, this analysis has informed Olive Garden’s latest remodel campaign and continues to uncover millions in profits by optimizing pricing and menu assortment. This session illustrates how Oracle Advanced Analytics solutions directly affect the bottom line.

A Perfect Storm: Oracle Big Data Science for Enterprise R and SAS Users [CON8331]

With the advent of R and a rich ecosystem of users and developers, a myriad of bloggers, and thousands of packages with functionality ranging from social network analysis and spatial data analysis to empirical finance and phylogenetics, use of R is on a steep uptrend. With new R tools from Oracle, including Oracle R Enterprise, Oracle R Distribution, and Oracle R Advanced Analytics for Hadoop, users can scale and integrate R for their enterprise big data needs. Come to this session to learn about Oracle’s R technologies and what data scientists from smart companies around the world are doing with R.

Extending the Power of In-Database Analytics with Oracle Big Data Appliance [CON2452]

The need for speed could not be greater—not speed of processing but time to market. The problem is driven by the long journey data takes before evolving into insight. Insight, however, is always relative to assumption. In fact, analytics is often seen as a battle between assumption and data. Assumptions can be classified into three types: related to distributions, ratios, and relations. In this session, you will see how the most-valuable business insights can come in the matter of hours, not months, when assumptions are challenged with data. This is made possible by the integration of Oracle Big Data Appliance, enabling transparent access to in-database analytics from the data warehouse and avoiding the traditional long journey of data to insight.

Market Basket Analysis at Dunkin’ Brands [CON6545]

With almost 120 years of franchising experience, Dunkin’ Brands owns two of the world’s most recognized, beloved franchises: Dunkin’ Donuts and Baskin-Robbins. This session describes a market basket analysis solution built from scratch on the Oracle Advanced Analytics platform at Dunkin’ Brands. This solution enables Dunkin’ to look at product affinity and a host of associated sales metrics with a view to improving promotional effectiveness and cross-sell/up-sell to increase customer loyalty. The presentation discusses the business value achieved and technical challenges faced in scaling the solution to Dunkin’ Brands’ transaction volumes, including engineered systems (Oracle Exadata) hardware and parallel processing at the core of the implementation.

Predictive Analytics with Oracle Data Mining [CON8596]

This session presents three case studies related to predictive analytics with the Oracle Data Mining feature of Oracle Advanced Analytics. Service contracts cancellation avoidance with Oracle Data Mining is about predicting the contracts at risk of cancellation at least nine months in advance. Predicting hardware opportunities that have a high likelihood of being won means identifying such opportunities at least four months in advance to provide visibility into suppliers of required materials. Finally, predicting cloud customer churn involves identifying the customers that are not as likely to renew subscriptions as others.

SQL Is the Best Development Language for Big Data [CON7439]

SQL has a long and storied history. From the early 1980s till today, data processing has been dominated by this language. It has changed and evolved greatly over time, gaining features such as analytic windowing functions, model clauses, and row-pattern matching. This session explores what's new in SQL and Oracle Database for exploiting big data. You'll see how to use SQL to efficiently and effectively process data that is not stored directly in Oracle Database.

Advanced Predictive Analytics for Database Developers on Oracle [CON7977]

Traditional database applications use SQL queries to filter, aggregate, and summarize data. This is called descriptive analytics. The next level is predictive analytics, where hidden patterns are discovered to answer questions that give unique insights that cannot be derived with descriptive analytics. Businesses are increasingly using machine learning techniques to perform predictive analytics, which helps them better understand past data, predict future trends, and enable better decision-making. This session discusses how to use machine learning algorithms such as regression, classification, and clustering to solve a few selected business use cases.

What Are They Thinking? With Oracle Application Express and Oracle Data Miner [UGF2861]

Have you ever wanted to add some data science to your Oracle Application Express applications? This session shows you how you can combine predictive analytics from Oracle Data Miner into your Oracle Application Express application to monitor sentiment analysis. Using Oracle Data Miner features, you can build data mining models of your data and apply them to your new data. The presentation uses Twitter feeds from conference events to demonstrate how this data can be fed into your Oracle Application Express application and how you can monitor sentiment with the native SQL and PL/SQL functions of Oracle Data Miner. Oracle Application Express comes with several graphical techniques, and the presentation uses them to create a sentiment dashboard.

Transforming Customer Experience with Big Data and Predictive Analytics [CON8148]

Delivering a high-quality customer experience is essential for long-term profitability and customer retention in the communications industry. Although service providers own a wealth of customer data within their systems, the sheer volume and complexity of the data structures inhibit their ability to extract the full value of the information. To change this situation, service providers are increasingly turning to a new generation of business intelligence tools. This session begins by discussing the key market challenges for business analytics and continues by exploring Oracle’s approach to meeting these challenges, including the use of predictive analytics, big data, and social network analytics.

There are a few others where Oracle Advanced Analytics is included e.g. Retail GBU, Big Data Strategy, etc. but they are typically more broadly focused.  If you search the Content Catalog for “Advanced Analytics” etc. you can find other related presentations that involve OAA.

Hope this helps.  Enjoy!

cb

Sunday Aug 10, 2014

Take a FREE Test Drive of Oracle Data Miner on Amazon Cloud - Offered by Vlamis Software, Oracle Partner

Thanks to a wonderful and extremely convenient and easy to use Amazon Cloud hosting by Vlamis Software, an Oracle Partner, you can now take a FREE Test Drive of Oracle Data Miner in about 10 minutes!  There are 3 simple steps:

Step 1—Fill out request

  •  Select the Oracle Advanced Analytics Test Drive


Step 2—Connect and Launch

  • Launch the Amazon Cloud instance and wait for the assigned IP address.  Vlamis has provided a nice YouTube instructional video that you should watch for instructions.

  • Connect with Remote Desktop


Step 3—Start Test Drive!

The Amazon Cloud that Vlamis has set up includes everything you'll need to try out Oracle Data Miner:

  • Oracle Database EE  11g Release 2
  • Oracle Advanced Analytics Option
  • SQL Developer 4.0/Oracle Data Miner GUI
  • Demo data for learning - this makes it fast and easy to get started.  The demo data covers multiple scenarios for simple graphing, classification, regression, market basket analysis, anomaly detection, text mining and mining star schema 360 degree customer views
  • Follow the Oracle Data Miner Tutorials that are provided.  These Tutorials are also available on the Oracle Technology Network


  • Try it out! 

Many thanks to Oracle Partner, Vlamis Software for this terrific Oracle Data Miner Test Drive on the Amazon Cloud. 

By the way, if interested, Vlamis is an authorized Instructor for the Oracle University 2 Day Instructor Led Course on Oracle Data Mining and provides data mining consulting and implementation assistance services.

Wednesday Aug 06, 2014

New Book: Predictive Analytics Using Oracle Data Miner


Great New Book Now Available:  Predictive Analytics Using Oracle Data Miner, by Brendan Tierney, Oracle ACE Director

If you have an Oracle Database and want to leverage that data to discover new insights, make predictions and generate actionable insights, this book is a must read for you!  In Predictive Analytics Using Oracle Data Miner: Develop & Use Oracle Data Mining Models in Oracle Data Miner, SQL & PL/SQL, Brendan Tierney, Oracle ACE Director and data mining expert, guides the user through the basic concepts of data mining and offers step by step instructions for solving data-driven problems using SQL Developer’s Oracle Data Mining extension.  Brendan takes it full circle by showing the reader how to deploy advanced analytical methodologies and predictive models immediately into enterprise-wide production environments using the in-database SQL and PL/SQL functionality.  

Definitely a must read for any Oracle data professional!

See Predictive Analytics Using Oracle Data Miner, by Brendan Tierney on Amazon.com  



Sunday May 18, 2014

Oracle Data Miner and Oracle R Enterprise Integration - Watch Demo


Oracle Advanced Analytics (Database EE) Option turns the database into an enterprise-wide analytical platform that can quickly deliver enterprise-wide predictive analytics and actionable insights.  Oracle Advanced Analytics comprises the Oracle Data Mining SQL data mining functions; Oracle Data Miner, an extension to SQL Developer that exposes the data mining SQL functions for data analysts; and Oracle R Enterprise, which integrates the R statistical programming language with SQL.  The 15 powerful in-database SQL data mining functions, the SQL Developer/Oracle Data Miner workflow GUI and the ability to integrate open source R within an analytical methodology make the Oracle Database + Oracle Advanced Analytics Option the ideal platform for building and deploying enterprise-wide predictive analytics applications/solutions.  

In Oracle Data Miner 4.0 we added a new SQL Query node to allow users to insert arbitrary SQL scripts within an ODMr analytical workflow. Additionally, the SQL Query node allows users to leverage registered R scripts to extend Oracle Data Miner's analytical capabilities.  For applications that are mostly OAA/Oracle Data Mining SQL data mining functions based but require additional analytical techniques found in the R community, this is an ideal method for integrating the power of in-database SQL analytical and data mining functions with the flexibility of open source R.  For applications that are built entirely using the R statistical programming language, it may be more practical to stay within the R console or RStudio environments, but for SQL-centric in-database predictive methodologies, this integration is just what might satisfy your needs.

Watch this Oracle Data Miner and Oracle R Enterprise Integration YouTube to see the demo. 

There is an excellent related Oracle Data Miner:  Integrate Oracle R Enterprise Algorithms into workflow using the SQL Query node (pdf, companion files) white paper on this topic that includes examples on the Oracle Technology Network in the Oracle Data Mining pages.  

Tuesday May 06, 2014

Oracle Data Miner 4.0/SQLDEV 4.0 New Features - Watch Demo!

Oracle Data Miner 4.0 New Features 

Oracle Data Miner/SQLDEV 4.0 (for Oracle Database 11g and 12c)

  • New Graph node (box, scatter, bar, histograms)
  • SQL Query node + integration of R scripts
  • Automatic SQL script generation for deployment

Oracle Advanced Analytics 12c new SQL data mining algorithms and enhancements exposed in Oracle Data Miner 4.0

  • Expectation Maximization Clustering algorithm
  • PCA & Singular Value Decomposition algorithms
  • Decision Trees can also now mine unstructured data
  • Improved/automated Text Mining, Prediction Details and other algorithm improvements
  • SQL Predictive Queries—automatic build, apply within simple yet powerful SQL query


Sunday Apr 27, 2014

Real Time Association Rules Recommendation Engine

This blog shows how you can write a SQL query for Association Rules recommendation; such a query can be used to recommend products (cross-sell) to a customer based on the products already placed in the current shopping cart.  Before we can perform the recommendation, we need to build an association rules model that is based on previous customer sales transactions. For the demo, I will use the SALES and PRODUCTS tables found in the sample SH schema as input data and build the association model using the free Oracle Data Miner GUI tool.

Association Rules Model Workflow

The SALES table contains time-based (TIME_ID) sales transactions that record each customer's (CUST_ID) product purchases (PROD_ID). The actual product names can be found in the PRODUCTS table, so we join these two tables to get the sales transactions with real product names (instead of looking up the product names using the PROD_ID later).

Enter the following Transaction ids (CUST_ID, TIME_ID) and item id (PROD_NAME) in the Association Rule Build node editor.

Enter a Maximum Rule length of 2 and the Minimum Confidence and Support as follows. Lower Confidence and Support percentages yield more rules; higher percentages yield fewer rules. We want the generated rules to have one Antecedent and one Consequent, so we set the Maximum Rule length to 2.
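To see why the thresholds control the number of rules, here is a small Python sketch that computes support and confidence for one-antecedent, one-consequent rules over a handful of transactions. It uses the standard definitions (support is the fraction of transactions containing both items; confidence is that count divided by the number of transactions containing the antecedent) and is not the in-database algorithm. The transactions and item names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def one_to_one_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules of length 2."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for items in transactions:
        unique = set(items)
        item_count.update(unique)
        # Ordered pairs, so both A -> B and B -> A are considered.
        pair_count.update(permutations(sorted(unique), 2))
    rules = []
    for (antecedent, consequent), both in pair_count.items():
        support = both / n
        confidence = both / item_count[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules

# Hypothetical shopping baskets.
transactions = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]

strict = one_to_one_rules(transactions, min_support=0.5, min_confidence=0.6)
loose = one_to_one_rules(transactions, min_support=0.25, min_confidence=0.3)
print(len(strict), "rules with strict thresholds")
print(len(loose), "rules with loose thresholds")
```

Every rule that passes the strict thresholds also passes the loose ones, so lowering either percentage can only add rules.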

SQL Query for Recommendation

The following SQL query returns the top 3 product recommendations based on the products placed in the customer’s current shopping cart.

SELECT rownum AS rank,
  consequent  AS recommendation
FROM
(
  -- Section 1: all rules discovered by the AR_RECOMMENDATION model
  WITH rules AS (
    SELECT AR.rule_id AS "ID",
      ant_pred.attribute_subname antecedent,
      cons_pred.attribute_subname consequent,
      AR.rule_support support,
      AR.rule_confidence confidence
    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,
      TABLE(AR.antecedent) ant_pred,
      TABLE(AR.consequent) cons_pred
  ),
  -- Section 2: products in the customer's current shopping cart
  cust_data AS (
    SELECT 'Comic Book Heroes' AS prod_name FROM DUAL
    UNION
    SELECT 'Martial Arts Champions' AS prod_name FROM DUAL
  )
  -- Section 3: best confidence and support per recommended product
  SELECT rules.consequent,
    MAX(rules.confidence) max_confidence,
    MAX(rules.support) max_support
  FROM rules, cust_data
  WHERE cust_data.prod_name = rules.antecedent
  AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)
  GROUP BY rules.consequent
  ORDER BY max_confidence DESC, max_support DESC
)
WHERE rownum <=3;


The above SQL query consists of 3 main sections: association rules, current customer data, and product recommendation.

Association Rules

The first section returns the association rules (antecedent, consequent) and the associated confidence and support values discovered by the model (AR_RECOMMENDATION) that was built in the above workflow. You may find the DBMS_DATA_MINING.GET_ASSOCIATION_RULES function reference here.

  WITH rules AS (
    SELECT AR.rule_id AS "ID",
      ant_pred.attribute_subname antecedent,
      cons_pred.attribute_subname consequent,
      AR.rule_support support,
      AR.rule_confidence confidence
    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,
      TABLE(AR.antecedent) ant_pred,
      TABLE(AR.consequent) cons_pred

Current Customer Data

The middle section defines the current customer product selection on the fly (real time). For example, we assume this customer placed the 'Comic Book Heroes' and 'Martial Arts Champions' products in the current shopping cart.

  cust_data AS (
    SELECT 'Comic Book Heroes' AS prod_name FROM DUAL
    UNION
    SELECT 'Martial Arts Champions' AS prod_name FROM DUAL
  )

Product Recommendation

Last but not least is the query that returns the recommended products based on the discovered rules and the current customer product selection. The rules may suggest the same product (consequent) for different customer products (prod_name), so we aggregate the consequents using the MAX function on the confidence and support values. In case of duplicate recommendations, we just use the max confidence and support values for comparison. Moreover, we don’t want to recommend products that are already in the customer’s shopping cart, so we add the “NOT IN (SELECT prod_name FROM cust_data)” condition. Finally, the query returns the recommendations ordered by highest confidence and support first.

  SELECT rules.consequent,
    MAX(rules.confidence) max_confidence,
    MAX(rules.support) max_support
  FROM rules, cust_data
  WHERE cust_data.prod_name = rules.antecedent
  AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)
  GROUP BY rules.consequent
  ORDER BY max_confidence DESC, max_support DESC

The recommendation query returns the following recommendations for the 'Comic Book Heroes' and 'Martial Arts Champions' products.

RANK   RECOMMENDATION

---------- --------------------------------

         1   Xtend Memory

         2   Endurance Racing

         3   Adventures with Numbers
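For readers who want to trace the recommendation logic outside the database, the same three steps (match cart items to antecedents, drop consequents already in the cart, rank by maximum confidence and then maximum support) can be sketched in Python. The rule list below is hypothetical, with made-up support and confidence values and placeholder item names; it is not output from the AR_RECOMMENDATION model.

```python
def recommend(rules, cart, top_n=3):
    """rules: iterable of (antecedent, consequent, support, confidence) tuples."""
    best = {}
    for antecedent, consequent, support, confidence in rules:
        # Keep rules triggered by a cart item whose consequent is not already in the cart.
        if antecedent not in cart or consequent in cart:
            continue
        prev_conf, prev_sup = best.get(consequent, (0.0, 0.0))
        # MAX is taken per column, mirroring MAX(confidence) and MAX(support) with GROUP BY.
        best[consequent] = (max(prev_conf, confidence), max(prev_sup, support))
    # Sort by (max confidence, max support), highest first.
    ranked = sorted(best, key=lambda item: best[item], reverse=True)
    return ranked[:top_n]

# Hypothetical rules: (antecedent, consequent, support, confidence).
rules = [
    ("A", "X", 0.10, 0.60),
    ("B", "X", 0.05, 0.80),   # same consequent suggested by a second cart item
    ("A", "Y", 0.20, 0.70),
    ("B", "A", 0.30, 0.90),   # consequent already in the cart: skipped
    ("C", "Z", 0.40, 0.95),   # antecedent not in the cart: skipped
    ("A", "W", 0.02, 0.10),
    ("B", "V", 0.02, 0.05),
]
print(recommend(rules, cart={"A", "B"}))
```

Product X is suggested by both cart items, so it is ranked once using its best confidence (0.80) and best support (0.10).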

Alternative SQL Query for Recommendation

The first recommendation query may not be scalable; it returns all possible rules to be processed by the recommendation sub query. A more scalable approach is to push as much processing as possible into the GET_ASSOCIATION_RULES function, so that it returns a minimal set of rules for further processing. Here we pass topn=10, min_confidence=0.1, min_support=0.01, sort_order='RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC', and the antecedent items to the function, and let it find the top 10 rules that satisfy these criteria. Once we obtain the refined rule set, we filter out recommendations that are already in the customer’s shopping cart and aggregate the confidence and support values using the MAX() function. Finally, we query the top 3 recommendations ordered by highest confidence and support first.

SELECT rownum AS rank,

  consequent  AS recommendation

FROM

  (SELECT cons_pred.attribute_subname consequent,

    MAX(AR.rule_support) max_support,

    MAX(AR.rule_confidence) max_confidence

  FROM TABLE (DBMS_DATA_MINING.GET_ASSOCIATION_RULES ( 'AR_RECOMMENDATION', 10, NULL, 0.1, 0.01, 2, 1, 

                 ORA_MINING_VARCHAR2_NT ( 'RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC'), 

                 DM_ITEMS(DM_ITEM('PROD_NAME', 'Comic Book Heroes', NULL, NULL), 

                          DM_ITEM('PROD_NAME', 'Martial Arts Champions', NULL, NULL)), NULL, 1)) AR, TABLE(AR.consequent) cons_pred

  WHERE cons_pred.attribute_subname NOT IN ('Comic Book Heroes', 'Martial Arts Champions')

  GROUP BY cons_pred.attribute_subname

  ORDER BY max_confidence DESC, max_support DESC

  )

WHERE rownum <=3;
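
The filter, aggregate, and rank steps described above can also be sketched outside the database. The following Python is only an illustration of that logic for single-antecedent rules, not the way Oracle Data Mining evaluates the query; the `recommend` function is a hypothetical helper, and the support and confidence numbers are invented for the example.

```python
from collections import defaultdict


def recommend(rules, cart, top_n=3):
    """Rank rule consequents for the items in a shopping cart.

    rules: iterable of (antecedent, consequent, support, confidence) tuples.
    """
    cart = set(cart)
    # consequent -> (max_confidence, max_support), like MAX() with GROUP BY
    best = defaultdict(lambda: (0.0, 0.0))
    for antecedent, consequent, support, confidence in rules:
        # keep rules triggered by a cart item; skip products already in the cart
        if antecedent not in cart or consequent in cart:
            continue
        max_conf, max_sup = best[consequent]
        best[consequent] = (max(max_conf, confidence), max(max_sup, support))
    # ORDER BY max_confidence DESC, max_support DESC
    ranked = sorted(best, key=lambda c: best[c], reverse=True)
    return ranked[:top_n]


# Invented rule metrics, for illustration only
rules = [
    ("Comic Book Heroes", "Xtend Memory", 0.02, 0.60),
    ("Martial Arts Champions", "Xtend Memory", 0.03, 0.55),
    ("Comic Book Heroes", "Endurance Racing", 0.04, 0.50),
    ("Martial Arts Champions", "Comic Book Heroes", 0.05, 0.90),
    ("Endurance Racing", "Adventures with Numbers", 0.05, 0.80),
    ("Martial Arts Champions", "Adventures with Numbers", 0.01, 0.30),
]
cart = ["Comic Book Heroes", "Martial Arts Champions"]
top3 = recommend(rules, cart)
```

The fourth rule is dropped because its consequent is already in the cart, and the fifth because its antecedent is not in the cart.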


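Note: another option is to order the rules by lift. Lift measures how much more often the antecedent and consequent occur together than would be expected if they were independent, so a higher lift indicates a stronger association behind the recommendation.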
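
For readers unfamiliar with lift, here is a minimal sketch of how support, confidence, and lift relate for a single rule, using the standard definition lift = confidence / support(consequent). The `rule_metrics` helper and the toy baskets are assumptions made for illustration; they are not taken from the SH sample schema.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of item names.
    Assumes both items occur in at least one transaction.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_ac = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_ac / n               # share of baskets containing both items
    confidence = n_ac / n_a          # share of antecedent baskets that also hold the consequent
    lift = confidence / (n_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Toy baskets, invented for illustration
baskets = [
    {"comics", "memory"},
    {"comics", "memory"},
    {"comics"},
    {"memory"},
    {"racing"},
]
support, confidence, lift = rule_metrics(baskets, "comics", "memory")
```

A lift above 1 means the two products are bought together more often than their individual popularity would suggest.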

SQL Query for Recommendation using Customer Previous Sales Transactions
I am going to extend the recommendation SQL query above to include the customer’s previous sales transactions, so that the recommendation is based on both the previously purchased products and the products in the current shopping cart. Moreover, we don’t want to recommend any products that have already been purchased or placed in the current shopping cart. For this example, we use the 12 months leading up to the customer’s last purchase as the sales history for the recommendation.

To include the customer’s sales history (assume cust_id = 3), a hist_cust_data subquery is added to obtain the previously purchased products. A tot_cust_data subquery then combines the products in the current shopping cart with the previously purchased products. The following query returns the top 3 recommendations based on the products the customer purchased in the last 12 months and the products in the current shopping cart. Unlike the earlier queries, it aggregates with SUM() and ranks by lift first, then by confidence and support.

SELECT rownum AS rank, consequent AS recommendation

FROM

(

  WITH rules AS (

    SELECT AR.rule_id AS "ID",

      ant_pred.attribute_subname antecedent,

      cons_pred.attribute_subname consequent,

      AR.rule_support support,

      AR.rule_confidence confidence,

      AR.rule_lift lift

    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,

         TABLE(AR.antecedent) ant_pred,

         TABLE(AR.consequent) cons_pred

  ),

  cur_cust_data AS (

    SELECT 'Comic Book Heroes' AS PROD_NAME FROM DUAL

    UNION

    SELECT 'Martial Arts Champions' AS PROD_NAME FROM DUAL

  ),

  hist_cust_data AS(

    SELECT

      DISTINCT PROD_NAME

    FROM sh.sales s, sh.products p

    WHERE cust_id = 3

      AND s.prod_id = p.prod_id

      -- customer historical purchase for last 12 months

      AND time_id  >= add_months((SELECT MAX(time_id) FROM sh.sales WHERE cust_id = 3), -12) 

  ),

  tot_cust_data AS (

    SELECT PROD_NAME FROM cur_cust_data

    UNION

    SELECT PROD_NAME FROM hist_cust_data

  )

  SELECT rules.consequent,

    SUM(rules.lift) lift_sum,

    SUM(rules.confidence) confidence_sum,

    SUM(rules.support) support_sum

  FROM rules, tot_cust_data

  WHERE tot_cust_data.prod_name = rules.antecedent

    -- don't recommend products that the customer already owns or is about to purchase

    AND rules.consequent NOT IN (SELECT prod_name FROM tot_cust_data) 

  GROUP BY rules.consequent

  ORDER BY lift_sum DESC, confidence_sum DESC, support_sum DESC

)

WHERE rownum <= 3;
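
As a rough illustration of what the query above computes, the sketch below combines the cart with purchases from the last 12 months and sums the rule metrics per consequent. It is a simplified model, not Oracle's execution: `months_before` only approximates ADD_MONTHS (it clamps the day but does not apply Oracle's end-of-month rule), and the product names "Mouse Pad" and "Old Product", the dates, and the rule metrics are all invented.

```python
import calendar
from collections import defaultdict
from datetime import date


def months_before(d, months):
    """Date `months` months before d, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def recommend_with_history(rules, cart, purchases, top_n=3, window_months=12):
    """rules: (antecedent, consequent, support, confidence, lift) tuples.
    purchases: (product, purchase_date) tuples for one customer.
    """
    owned = set(cart)  # plays the role of tot_cust_data
    if purchases:
        cutoff = months_before(max(day for _, day in purchases), window_months)
        owned |= {product for product, day in purchases if day >= cutoff}
    # consequent -> [lift_sum, confidence_sum, support_sum]
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for antecedent, consequent, support, confidence, lift in rules:
        if antecedent in owned and consequent not in owned:
            sums = totals[consequent]
            sums[0] += lift
            sums[1] += confidence
            sums[2] += support
    # ORDER BY lift_sum DESC, confidence_sum DESC, support_sum DESC
    ranked = sorted(totals, key=lambda c: tuple(totals[c]), reverse=True)
    return ranked[:top_n]


# Invented data, for illustration only
rules = [
    ("Comic Book Heroes", "Xtend Memory", 0.02, 0.60, 1.5),
    ("Mouse Pad", "Xtend Memory", 0.03, 0.40, 1.2),
    ("Martial Arts Champions", "Endurance Racing", 0.04, 0.50, 2.0),
    ("Mouse Pad", "Comic Book Heroes", 0.05, 0.70, 3.0),
    ("Old Product", "Endurance Racing", 0.01, 0.90, 5.0),
]
cart = ["Comic Book Heroes", "Martial Arts Champions"]
purchases = [("Mouse Pad", date(2001, 11, 2)), ("Old Product", date(1999, 5, 20))]
ranked = recommend_with_history(rules, cart, purchases)
```

"Old Product" falls outside the 12-month window, so its rule does not contribute, while "Mouse Pad" adds to the lift total of "Xtend Memory".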

Conclusion

This blog shows a few examples of how to write a recommendation SQL query in different flavors (with or without historical sales transactions). You may also consider assigning a profit to each product, so that the query returns the most profitable product recommendations.

Tuesday Mar 18, 2014

Deploy Data Miner Apply Node SQL as RESTful Web Service for Real-Time Scoring

The free Oracle Data Miner GUI is an extension to Oracle SQL Developer that enables data analysts to work directly with data inside the database, explore the data graphically, build and evaluate multiple data mining models, apply Oracle Data Mining models to new data, and deploy Oracle Data Mining's predictions and insights throughout the enterprise. The product enables a complete workflow deployment to a production system via generated PL/SQL scripts (see Generate a PL/SQL script for workflow deployment). This time I want to focus on the model scoring side, especially single-record real-time scoring. Wouldn’t it be nice if the scoring function could be accessed by different systems on different platforms? How about deploying the scoring function as a Web Service? That way, any system that can send an HTTP request can invoke the scoring Web Service and consume the returned result as it sees fit. For example, a mobile app could collect customer data and then invoke the scoring Web Service to determine how likely the customer is to buy life insurance. This blog shows a complete demo, from building predictive models to deploying a scoring function as a Web Service. However, the demo does not address authentication or other security considerations related to Web Services, which are outside the scope of this blog.

Web Services Requirement

This demo uses the Web Services feature provided by Oracle APEX 4.2 and Oracle REST Data Services 2.0.6 (formerly Oracle APEX Listener). Here are the installation instructions for both products:

For 11g Database

Go to the Oracle Application Express Installation Guide and follow the instructions below:

1.5.1 Scenario 1: Downloading from OTN and Configuring the Oracle Application Express Listener

· Step 1: Install the Oracle Database and Complete Pre-installation Tasks

· Step 2: Download and Install Oracle Application Express

· Step 3: Change the Password for the ADMIN Account

· Step 4: Configure RESTful Services

· Step 5: Restart Processes

· Step 6: Configure APEX_PUBLIC_USER Account

· Step 7: Download and Install Oracle Application Express Listener

· Step 8: Enable Network Services in Oracle Database 11g

· Step 9: Security Considerations

· Step 10: About Developing Oracle Application Express in Other Languages

· Step 11: About Managing JOB_QUEUE_PROCESSES

· Step 12: Create a Workspace and Add Oracle Application Express Users


For 12c Database

Go to the Oracle Application Express Installation Guide (Release 4.2 for Oracle Database 12c) and follow the instructions below:

4.4 Installing from the Database and Configuring the Oracle Application Express Listener

· Install the Oracle Database and Complete Preinstallation Tasks

· Download and Install Oracle Application Express Listener

· Configure RESTful Services

· Enable Network Services in Oracle Database 12c

· Security Considerations

· About Running Oracle Application Express in Other Languages

· About Managing JOB_QUEUE_PROCESSES

· Create a Workspace and Add Oracle Application Express Users


Note: APEX is pre-installed with Oracle Database 12c, but you need to configure it before you can use it.

For this demo, create a Workspace called DATAMINER that is based on an existing user account that has already been granted access to the Data Miner (this blog assumes DMUSER is the Data Miner user account). Please refer to the Oracle By Example Tutorials to review how to create a Data Miner user account and install the Data Miner Repository. In addition, you need to create an APEX user account (for simplicity I use DMUSER).

Build Models to Predict BUY_INSURANCE

This demo uses the demo data set, INSUR_CUST_LTV_SAMPLE, that comes with the Data Miner installation. Now, let’s use the Classification Build node to build some models using the CUSTOMER_ID as the case id and BUY_INSURANCE as the target.

Evaluate the Models

A nice thing about the Build node is that, by default, it builds a set of models with different algorithms within the same mining function, so we can select the best model to use. Let’s look at the models in the Test Viewer; here we can compare the models by their Predictive Confidence, Overall Accuracy, and Average Accuracy values. Basically, the model with the highest values across these three metrics is the one to use. As you can see, the winner here is the CLAS_DT_3_6 decision tree model.

Next, let’s see which input data columns the decision tree model uses as predictors. You can find that information in the Model Viewer below. Surprisingly, the model uses only a few columns for the prediction. These columns are the input data required by the scoring function; the rest of the input columns can be ignored.


Score the Model

Let’s complete the workflow with an Apply node, from which we will generate the scoring SQL statement to be used for the Web Service. Here we reuse the INSUR_CUST_LTV_SAMPLE data as input to the Apply node and select only the required columns found in the previous step. Also, in the Class Build node we deselect the other models as output in the Property Inspector (Models tab), so that only the decision tree model is used by the Apply node. The generated scoring SQL statement will use only the decision tree model to score against the limited set of input columns.

Generate SQL Statement for Scoring

After the workflow is run successfully, we can generate the scoring SQL statement via the “Save SQL” context menu off the Apply node as shown below.

Here is the generated SQL statement:

/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_12c" on Mar 16, 2014 */
ALTER SESSION set "_optimizer_reuse_cost_annotations"=false;
ALTER SESSION set NLS_NUMERIC_CHARACTERS=".,";
--ALTER SESSION FOR OPTIMIZER
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */ "INSUR_CUST_LTV_SAMPLE"."BANK_FUNDS",
"INSUR_CUST_LTV_SAMPLE"."CHECKING_AMOUNT",
"INSUR_CUST_LTV_SAMPLE"."CREDIT_BALANCE",
"INSUR_CUST_LTV_SAMPLE"."N_TRANS_ATM",
"INSUR_CUST_LTV_SAMPLE"."T_AMOUNT_AUTOM_PAYMENTS"
from "DMUSER"."INSUR_CUST_LTV_SAMPLE" )
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";

We need to modify the first SELECT statement to change the data source from a database table to a record that is constructed on the fly, which is crucial for real-time scoring. Bind variables (e.g. :funds) take the place of the table columns; they are replaced with actual data (passed in by the Web Service request) when the SQL statement is executed. The ALTER SESSION statements are also removed, leaving a single query.

/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_12c" on Mar 16, 2014 */
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */
:funds "BANK_FUNDS",
:checking "CHECKING_AMOUNT",
:credit "CREDIT_BALANCE",
:atm "N_TRANS_ATM",
:payments "T_AMOUNT_AUTOM_PAYMENTS"
from DUAL
)
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";
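
To see how named bind variables turn request values into a one-row record, here is a small sketch that runs the same kind of SELECT against an in-memory SQLite database. SQLite is only a stand-in so the example runs without Oracle: it needs no FROM DUAL, and the Oracle-specific PREDICTION functions are left out. The `build_record` helper and the sample values are assumptions for illustration.

```python
import sqlite3

# Same column aliases as the "N$10013" subquery, with named bind variables
ROW_FROM_BINDS = """
SELECT :funds    AS BANK_FUNDS,
       :checking AS CHECKING_AMOUNT,
       :credit   AS CREDIT_BALANCE,
       :atm      AS N_TRANS_ATM,
       :payments AS T_AMOUNT_AUTOM_PAYMENTS
"""


def build_record(values):
    """Bind request values to the named placeholders and return the single row as a dict."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row
        row = conn.execute(ROW_FROM_BINDS, values).fetchone()
        return dict(row)
    finally:
        conn.close()


# Sample input values, invented for illustration
record = build_record(
    {"funds": 500, "checking": 120.5, "credit": 0, "atm": 3, "payments": 250}
)
```

Because the values are bound rather than concatenated into the SQL text, the statement itself never changes between requests.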

Create Scoring Web Service

Assuming Oracle APEX and Oracle REST Data Services have been properly installed and configured, we can proceed to create a RESTful Web Service for real-time scoring. The following steps describe how to create the Web Service in APEX:

1. APEX Login

You can bring up the APEX login screen by pointing your browser to http://<host>:<port>/ords. Enter your Workspace name and account info to login. The Workspace should be based on the Data Miner DMUSER account for this demo to work.

2. Select SQL Workshop

Select the SQL Workshop icon to proceed.

3. Select RESTful Services

Select the RESTful Services to create the Web Service.

Click the “Create” button to continue.

4. Define RESTful Services

Enter the following information to define the scoring Web Service in the RESTful Services Module form:

Name: buyinsurance

URI Prefix: score/

Status: Published

URI Template: buyinsurance?funds={funds}&checking={checking}&credit={credit}&atm={atm}&payments={payments}

Method: GET

Source Type: Query

Format: CSV

Source:

/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_11204" on Mar 16, 2014 */
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */
:funds "BANK_FUNDS",
:checking "CHECKING_AMOUNT",
:credit "CREDIT_BALANCE",
:atm "N_TRANS_ATM",
:payments "T_AMOUNT_AUTOM_PAYMENTS"
from DUAL
)
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";

Note: JSON output format is also supported.

Lastly, create the following parameters that are used to pass the data from the Web Service request (URI) to the bind variables used in the scoring SQL statement.

The final RESTful Services Module definition should look like the following. Make sure the “Requires Secure Access” is set to “No” (HTTPS secure request is not addressed in this demo).

Test the Scoring Web Service

Let’s create a simple web page using your favorite HTML editor (I use JDeveloper to create this web page). The page includes a form that is used to collect customer data, and then fires off the Web Service request upon submission to get a prediction and associated probability.

Here is the HTML source of the above Form:

<!DOCTYPE html>

<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

<title>score</title>

</head>

<body>

<h2>

Determine if Customer will Buy Insurance

</h2>

<form action="http://localhost:8080/ords/dataminer/score/buyinsurance" method="get">

<table>

<tr>

<td>Bank Funds:</td>

<td><input type="text" name="funds"/></td>

</tr>

<tr>

<td>Checking Amount:</td>

<td><input type="text" name="checking"/></td>

</tr>

<tr>

<td>Credit Balance:</td>

<td><input type="text" name="credit"/></td>

</tr>

<tr>

<td>Number ATM Transactions:</td>

<td><input type="text" name="atm"/></td>

</tr>

<tr>

<td>Amount Auto Payments:</td>

<td><input type="text" name="payments"/></td>

</tr>

<tr>

<td colspan="2" align="right">

<input type="submit" value="Score"/>

</td>

</tr>

</table>

</form>

</body>
</html>

When the Score button is pressed, the form sends a GET HTTP request to the web server with the collected form data as name-value parameters encoded in the URL.

http://localhost:8080/ords/dataminer/score/buyinsurance?funds={funds}&checking={checking}&credit={credit}&atm={atm}&payments={payments}

Notice the {funds}, {checking}, {credit}, {atm}, {payments} will be replaced with actual data from the form. This URI matches the URI Template specified in the RESTful Services Module form above.
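
Any HTTP client can build the same request as the form. The sketch below shows one way to assemble the URL and read a CSV reply in Python. The base URL is the one used in this demo; the `build_score_url` and `parse_score_csv` helpers, the sample values, and the sample CSV body (its header layout and its "No" prediction) are assumptions for illustration, not captured output from the service.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/ords/dataminer/score/buyinsurance"


def build_score_url(funds, checking, credit, atm, payments, base_url=BASE_URL):
    """Fill the URI template parameters and return the full request URL."""
    query = urlencode(
        {
            "funds": funds,
            "checking": checking,
            "credit": credit,
            "atm": atm,
            "payments": payments,
        }
    )
    return f"{base_url}?{query}"


def parse_score_csv(body):
    """Parse a CSV reply (assumed: header row plus one data row) into a dict."""
    rows = list(csv.DictReader(io.StringIO(body)))
    return rows[0] if rows else None


url = build_score_url(500, 120.5, 0, 3, 250)

# Hypothetical reply body; the real layout depends on the service configuration
sample_body = "CLAS_DT_3_6_PRED,CLAS_DT_3_6_PROB,CLAS_DT_3_6_PCST\nNo,0.75,0.25\n"
score = parse_score_csv(sample_body)
```

Sending the request itself (for example with `urllib.request.urlopen(url)`) is left out so the sketch runs without a server.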

Let’s test the scoring Web Service by entering some values in the form and hitting the Score button to see the prediction.

The prediction along with its probability and cost is returned as shown below. Unfortunately, this customer is less likely to buy insurance.

Let’s change some values and see if we have any luck.

Bingo! This customer is more likely to buy insurance.

Conclusion

This blog shows how to deploy Data Miner generated scoring SQL as a Web Service, which can be consumed by different systems on different platforms from anywhere. In theory, any SQL statement generated from a Data Miner node could be exposed as a Web Service. For example, we could have a Web Service that returns Model Details information, which a BI tool could then consume for application integration.

Wednesday Feb 26, 2014

How to generate training and test dataset using SQL Query node in Data Miner

Overview

In Data Miner, the Classification and Regression Build nodes include a process that internally splits the input dataset into training and test datasets, which are then used by the model build and test processes within the nodes. This internal data split relieves the user from performing an external data split and then tying the split datasets into separate build and test processes, as required in other competing products. However, there are times when a user may want to perform an external data split. For example, a user may want to generate a single pair of training and test datasets and reuse them in multiple workflows. The training and test datasets can be generated easily via the SQL Query node.

Stratified Split

The stratified split is used internally by the Classification Build node, because this technique can preserve the categorical target distribution in the resulting training and test dataset, which is important for the classification model build. The following shows the SQL statements that are essentially used by the Classification Build node to produce the training and test dataset internally:

SQL statement for Training dataset

SELECT

v1.*

FROM

(

-- randomly divide members of the population into subgroups based on target classes

SELECT a.*,

row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"

FROM {input data} a

) v1,

(

-- get the count of subgroups based on target classes

SELECT {target column},

COUNT(*) "_partition_target_cnt"

FROM {input data} GROUP BY {target column}

) v2

WHERE v1.{target column} = v2.{target column}

-- random sample subgroups based on target classes in respect to the sample size

AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * {percent of training dataset} / 100)


SQL statement for Test dataset

SELECT

v1.*

FROM

(

-- randomly divide members of the population into subgroups based on target classes

SELECT a.*,

row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"

FROM {input data} a

) v1,

(

-- get the count of subgroups based on target classes

SELECT {target column},

COUNT(*) "_partition_target_cnt"

FROM {input data} GROUP BY {target column}

) v2

WHERE v1.{target column} = v2.{target column}

-- random sample subgroups based on target classes in respect to the sample size

AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * {percent of training dataset} / 100)

The following describes the placeholders used in the SQL statements:

{target column} - target column. It must be categorical type.

{case id column} - case id column. It must contain unique numbers that identify the rows.

{input data} - input data set.

{percent of training dataset} - percent of training dataset. For example, if you want to split 60% of input dataset into training dataset, use the value 60. The test dataset will contain 100%-60% = 40% of the input dataset. The training and test dataset are mutually exclusive.
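
To make the idea of a stratified split concrete, here is a simplified Python sketch. It is not a port of the SQL above: instead of re-hashing the row number, it orders each target class by a hash of the case id and sends the first N percent of that class to training. `zlib.crc32` stands in for ORA_HASH, and the `stratified_split` helper and the toy data are assumptions for illustration.

```python
import zlib
from collections import defaultdict


def stable_hash(value):
    """Deterministic hash used as a stand-in for ORA_HASH (an assumption)."""
    return zlib.crc32(str(value).encode("utf-8"))


def stratified_split(rows, case_id, target, train_percent):
    """Split rows so each target class contributes about train_percent to training."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, test = [], []
    for members in by_class.values():
        # pseudo-random but repeatable order within the class
        members.sort(key=lambda r: stable_hash(r[case_id]))
        cut = len(members) * train_percent // 100
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Toy data: 25 "Yes" and 75 "No" customers, invented for illustration
rows = [
    {"CUSTOMER_ID": i, "BUY_INSURANCE": "Yes" if i % 4 == 0 else "No"}
    for i in range(1, 101)
]
train, test = stratified_split(rows, "CUSTOMER_ID", "BUY_INSURANCE", 60)
```

With a 60 percent setting, training receives 15 of the 25 "Yes" rows and 45 of the 75 "No" rows, so the class ratio is the same in both datasets.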

Random Split

The random split is used internally by the Regression Build node because the target is usually numerical type. The following shows the SQL statements that are essentially used by the Regression Build node to produce the training and test dataset:

SQL statement for Training dataset

SELECT

v1.*

FROM

{input data} v1

WHERE ORA_HASH({case id column}, 99, 0) <= {percent of training dataset}

SQL statement for Test dataset

SELECT

    v1.*

FROM

{input data} v1

WHERE ORA_HASH({case id column}, 99, 0) > {percent of training dataset}

The following describes the placeholders used in the SQL statements:

{case id column} - case id column. It must contain unique numbers that identify the rows.

{input data} - input data set.

{percent of training dataset} - percent of training dataset. For example, if you want to split 60% of input dataset into training dataset, use the value 60. The test dataset will contain 100%-60% = 40% of the input dataset. The training and test dataset are mutually exclusive.
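
The random split boils down to assigning each case id to one of 100 hash buckets and comparing the bucket number with the training percentage. A minimal sketch of that idea follows, with `zlib.crc32` standing in for ORA_HASH (an assumption; the two functions produce different values). One detail worth checking in your own data: buckets are numbered 0 to 99, so a `<= 60` condition covers 61 buckets, and the training share may come out slightly above 60 percent.

```python
import zlib


def bucket(case_id, buckets=100):
    """Map a case id to a bucket in 0..buckets-1 (stand-in for ORA_HASH(id, 99, 0))."""
    return zlib.crc32(str(case_id).encode("utf-8")) % buckets


def random_split(rows, case_id, train_percent):
    """Rows whose bucket is <= train_percent go to training, the rest to test."""
    train = [r for r in rows if bucket(r[case_id]) <= train_percent]
    test = [r for r in rows if bucket(r[case_id]) > train_percent]
    return train, test


# Toy case ids, invented for illustration
rows = [{"CUSTOMER_ID": i} for i in range(1, 1001)]
train, test = random_split(rows, "CUSTOMER_ID", 60)

# Number of buckets that satisfy the training condition
training_buckets = sum(1 for b in range(100) if b <= 60)
```

Because the bucket depends only on the case id, rerunning the split always puts the same rows in the same dataset.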

Use SQL Query node to create training and test dataset

Assume you want to create the training and test dataset out of the demo INSUR_CUST_LTV_SAMPLE dataset using the stratified split technique; you can create the following workflow to utilize the SQL Query nodes to execute the above split SQL statements to generate the dataset, and then use the Create Table nodes to persist the resulting dataset.

Assume the case id is CUSTOMER_ID, target is BUY_INSURANCE, and the training dataset is 60% of the input dataset. You can enter the following SQL statement to create the training dataset in the “SQL Query Stratified Training” SQL Query node:

SELECT

v1.*

FROM

(

-- randomly divide members of the population into subgroups based on target classes

SELECT a.*,

row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"

FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a

) v1,

(

-- get the count of subgroups based on target classes

SELECT "BUY_INSURANCE",

COUNT(*) "_partition_target_cnt"

FROM "INSUR_CUST_LTV_SAMPLE_N$10009" GROUP BY "BUY_INSURANCE"

) v2

WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"

-- random sample subgroups based on target classes in respect to the sample size

AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * 60 / 100)



Likewise, you can enter the following SQL statement to create the test dataset in the “SQL Query Stratified Test” SQL Query node:

SELECT

v1.*

FROM

(

-- randomly divide members of the population into subgroups based on target classes

SELECT a.*,

row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"

FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a

) v1,

(

-- get the count of subgroups based on target classes

SELECT "BUY_INSURANCE",

COUNT(*) "_partition_target_cnt"

FROM "INSUR_CUST_LTV_SAMPLE_N$10009" GROUP BY "BUY_INSURANCE"

) v2

WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"

-- random sample subgroups based on target classes in respect to the sample size

AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * 60 / 100)

Now run the workflow to create the training and test dataset. You can find the table names of the persisted dataset in the associated Create Table nodes.


Conclusion

This blog shows how easy it is to create the training and test datasets using the stratified split SQL statements via the SQL Query nodes. Similarly, you can generate the training and test datasets using the random split technique by replacing the SQL statements in the SQL Query nodes of the above workflow with the random split SQL statements. If a large dataset (tens of millions of rows) is used in multiple model build nodes, it may be a good idea to split the data ahead of time to reduce the overall processing time (avoiding multiple internal data splits inside the model build nodes).

Friday Feb 14, 2014

dunnhumby Accelerates Complex Segmentation Queries from Weeks to Minutes—Gains Competitive Advantage

See original story on http://www.oracle.com/us/corporate/customers/customersearch/dunnhumby-1-exadata-ss-2137635.html

dunnhumby is the world’s leading customer-science company. It analyzes customer data and applies insights from more than 400 million customers across the globe to create better customer experiences and build loyalty. With its unique analytical capabilities, dunnhumby helps retailers better serve customers, create a competitive advantage, and enjoy sustained growth.


A word from dunnhumby Ltd.

“Oracle Exadata Database Machine is helping us to transform our business and improve our competitive edge. We can now complete queries that took weeks in just minutes, driving new product offerings, more competitive bids, and more accurate analyses based on 100% of data instead of just a sampling.” – Chris Wones, Director of Data Solutions, dunnhumby USA

Challenges

  • Expand breadth of services to maintain a competitive advantage in the customer-science industry
  • Provide clients, including major retail organizations in the United Kingdom and North America, with expanded historical and real-time insight into customer behavior, buying tendencies, and response to promotional campaigns and product offerings
  • Ensure competitive pricing for the company’s customer-analysis services while delivering added value to a growing client base
  • Analyze growing volumes of data rapidly and comprehensively
  • Ensure the security of sensitive information, including protected personal information to reduce risk and support compliance
  • Protect against data loss and reduce the backup and recovery window, as data is crucial to the competitive advantage and success of the business
  • Optimize IT investment and performance across the technology-intensive business
  • Reduce licensing and maintenance costs of previous analytical and data warehouse software

Solutions

  • Deployed Oracle Exadata Database Machine and accelerated queries that previously took two-to-three weeks to just minutes, enabling the company to bid on more complex, custom analyses and gain a competitive advantage
  • Achieved 4x to 30x more data compression using Hybrid Columnar Compression and Oracle Advanced Compression across sets—reducing storage requirements, increasing analysis and backup performance, and optimizing IT investment
  • Consolidated data marts securely with data warehouse schemas in Oracle Exadata, enabling much faster presummarizations of large volumes of data
  • Accelerated analytic capabilities to near real time using Oracle Advanced Analytics and third-party tools, enabling analysis of unstructured big data from emerging sources, like smart phones
  • Accelerated segmentation and customer-loyalty analysis from one week to just four hours—enabling the company to deliver more timely information as well as finer-grained analysis
  • Improved analysts’ productivity and focus as they can now run queries and complete analysis without having to wait hours or days for a query to process
  • Generated more accurate business insights and marketing recommendations with the ability to analyze 100% of data—including years of historical data—instead of just a small sample
  • Improved accuracy of marketing recommendations by analyzing larger sample sizes and predicting the market’s reception to new product ideas and strategies
  • Improved secure processing and management of 60 TB of data, growing at a rate of 500 million customer records a week, including information from clients’ customer loyalty programs 
  • Ensured data security and compliance with requirements for safeguarding protected personal information and reduced risk with Oracle Advanced Security, Oracle Directory Services Plus, and Oracle Enterprise Single Sign-On Suite Plus
  • Gained high-performance identity virtualization, storage, and synchronization services that meet the needs of the company’s high-volume environment
  • Ensured performance scalability even with concurrent queries with Oracle Exadata, which demonstrated higher throughput than competing solutions under such conditions
  • Deployed integrated backup and recovery using Oracle’s Sun ZFS Backup Appliance—to support high performance and continuous availability and act as a staging area for both inbound and outbound extract, transform, and load processes

Why Oracle

dunnhumby considered Teradata, IBM Netezza, and other solutions, and selected Oracle Exadata for its ability to sustain high performance and throughput even during concurrent queries. “We needed to see how the system performed when scaled to multiple concurrent queries, and Oracle Exadata’s throughput was much higher than competitive offerings,” said Chris Wones, director of data solutions, dunnhumby, USA.

Implementation Process

dunnhumby began its Oracle Exadata implementation in September 2012 and went live in April 2013. It has installed four Oracle Exadata machine units in the United States and four in the United Kingdom. The company is using three of the four machines in each country as production environments and one machine in each country for development and testing. dunnhumby runs an active-active environment across its Oracle Exadata clusters to ensure high availability.

Monday Feb 03, 2014

How to generate Scatterplot Matrices using R script in Data Miner

Data Miner provides the Explorer node, which produces descriptive statistics and histogram graphs that allow an analyst to analyze input data columns individually. Often, an analyst is also interested in the relationships among the data columns, in order to choose the columns that are closely correlated with the target column for model building. To examine relationships among data columns, the analyst can create scatter plots using the Graph node.

For example, an analyst may want to build a regression model that predicts the customer LTV (lifetime value) using the INSUR_CUST_LTV_SAMPLE demo data. Before building the model, the analyst can create the following workflow with the Graph node to examine the relationships between the data columns of interest and the LTV target column.

In the Graph node editor, create a scatter plot with a data column of interest (X Axis) against the LTV target column (Y Axis). For the demo, let’s create three scatter plots using these data columns: HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT.

Here are the scatter plots generated by the Graph node. As you can see, HOUSE_OWNERSHIP and N_MORTGAGES are quite positively correlated with the LTV target column. However, MORTGAGE_AMOUNT seems less correlated with it.

The problem with the above approach is that creating scatter plots one by one is laborious, and you cannot examine the relationships among those data columns themselves. To solve the problem, we can create a scatterplot matrix graph like the following:

This is a 4 x 4 scatterplot matrix of the data columns LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT. In the top row, you can examine the relationships of HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT against the LTV target column. In the second row, you can examine the relationships of LTV, N_MORTGAGES, and MORTGAGE_AMOUNT against the HOUSE_OWNERSHIP column. In the third and fourth rows, you can examine the relationships of the other columns against N_MORTGAGES and MORTGAGE_AMOUNT, respectively.

To generate this scatterplot matrix, we need to invoke the readily available R script RQG$pairs (via the SQL Query node) in Oracle R Enterprise. Please refer to http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html?ssSourceSiteId=ocomen for Oracle R Enterprise installation.
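
If the row and column reading of the matrix is hard to picture, the following sketch lists which pair of columns each panel shows. It assumes the layout convention of R's pairs() function, where the panel in row i and column j plots column j on the X axis against column i on the Y axis, with the column names on the diagonal; the `matrix_panels` helper is hypothetical and draws nothing.

```python
def matrix_panels(columns):
    """Return a k x k grid of labels describing each scatterplot matrix panel."""
    grid = []
    for i, row_name in enumerate(columns):
        row = []
        for j, col_name in enumerate(columns):
            if i == j:
                row.append(row_name)  # diagonal cell shows the column name
            else:
                row.append(f"{col_name} (x) vs {row_name} (y)")
        grid.append(row)
    return grid


columns = ["LTV", "HOUSE_OWNERSHIP", "N_MORTGAGES", "MORTGAGE_AMOUNT"]
panels = matrix_panels(columns)

# Off-diagonal panels: k * (k - 1) scatter plots from a single call
plot_count = sum(1 for i in range(len(columns)) for j in range(len(columns)) if i != j)
```

For four columns this gives 12 scatter plots, compared with the three that were created one by one in the Graph node.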

Let’s create the following workflow with the SQL Query node to invoke the R script. Note: a Sample node may be needed to sample down the data size (e.g. 1000 rows) for large data set before it is used for charting.

Enter the following SQL statement in the SQL Query editor. rqTableEval is an Oracle R Enterprise SQL function that allows the user to invoke an R script from SQL. The first SELECT statement within the function specifies the input data (LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT). The second SELECT statement specifies the optional parameter to the R script, where we define the graph title “Scatterplot Matrices”. The output of the function is an XML document with the graph data embedded in it.

SELECT VALUE FROM TABLE
(
    rqTableEval(
        cursor(select "INSUR_CUST_LTV_SAMPLE_N$10001"."LTV",
                      "INSUR_CUST_LTV_SAMPLE_N$10001"."HOUSE_OWNERSHIP",
                      "INSUR_CUST_LTV_SAMPLE_N$10001"."N_MORTGAGES",
                      "INSUR_CUST_LTV_SAMPLE_N$10001"."MORTGAGE_AMOUNT"
               from "INSUR_CUST_LTV_SAMPLE_N$10001"),              -- Input Cursor
        cursor(select 'Scatterplot Matrices' as MAIN from DUAL),   -- Param Cursor
        'XML',                                                     -- Output Definition
        'RQG$pairs'                                                -- R Script
    )
)

You can see what default R scripts are available in the R Scripts tab. This tab is visible only when the Oracle R Enterprise installation is detected.

Click the button in the toolbar to invoke the R script to produce the Scatterplot matrix below.

You can copy the Scatterplot matrix image to the clipboard or save it to an image file (PNG) for reporting purposes. To do so, right-click on the graph to bring up the pop-up menu below.

The Scatterplot matrix is also available in the Data Viewer of the SQL Query node. To open the Data Viewer, select the “View Data” item in the pop-up menu of the node.

The returned XML data is displayed in the Data Viewer as shown below. To view the Scatterplot matrix embedded in the data, click on the XML data to bring up the icon at the far right of the cell, and then click on the icon to bring up the viewer.

Tuesday Jan 14, 2014

How to export data from the Explore Node using Data Miner and SQL Developer

Blog posting by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Development

The Explorer node generates descriptive statistical data and histogram data for all input table columns. This statistical and histogram data may help users analyze the input data to determine whether any action (e.g. transformation) is needed before using it for data mining purposes. An analyst may want to export this data to a file for offline analysis (e.g. in Excel) or for reporting purposes. The Explorer node writes this data to a database table specified in the Output tab of the Property Inspector. In this case, the data is written to a table named “OUTPUT_1_2”.


To export the table to a file, we can use the SQL Developer Export wizard. Go to the Connections tab in the Navigator window, search for the table “OUTPUT_1_2” within the proper connection, then bring up the pop-up menu for the table. Click on the Export menu item to launch the Export wizard.


In the wizard, uncheck the “Export DDL” option and select the “Export Data” option, since we are only interested in the data itself. In the Format option, select “excel” for this example (about a dozen output formats are supported) and specify the output file name. When the wizard finishes, an Excel file is generated.


Let’s open the file to examine what is in it. As expected, it contains all the statistical data for all input columns. The histogram data is listed in the last column (HISTOGRAMS), and it has the ODMRSYS.ODMR_HISTOGRAMS structure.


For example, let’s take a closer look at the histogram data for the BUY_INSURANCE column:

ODMRSYS.ODMR_HISTOGRAMS(ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''No'',NULL,NULL,73.1),ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"',''Yes'',NULL,NULL,26.9))

This column contains an ODMRSYS.ODMR_HISTOGRAMS object, which is an array of ODMRSYS.ODMR_HISTOGRAM_POINT structures. We can describe the structure to see what is in it.


The ODMRSYS.ODMR_HISTOGRAM_POINT structure contains five attributes, which represent the histogram data. ATTRIBUTE_NAME contains the attribute name (e.g. BUY_INSURANCE), ATTRIBUTE_VALUE contains the attribute values (e.g. No, Yes), GROUPING_ATTRIBUTE_NAME and GROUPING_ATTRIBUTE_VALUE are not used here (these fields are used only when the Group By option is specified), and ATTRIBUTE_PERCENT contains the percentages (e.g. 73.1, 26.9) for the respective attribute values.
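To make the five-attribute structure concrete, here is a minimal Python sketch of how the exported cell text shown above for BUY_INSURANCE could be pulled apart offline. It is an illustration only: the function names are made up, and the naive comma split assumes that attribute values contain no commas or parentheses, which may not hold for your data.

```python
import re

# Matches the argument list of each ODMR_HISTOGRAM_POINT(...) in the cell text.
POINT_RE = re.compile(r"ODMR_HISTOGRAM_POINT\(([^()]*)\)")


def _clean(field):
    """Turn NULL into None and strip the surrounding quote characters."""
    if field.upper() == "NULL":
        return None
    return field.strip("'\"")


def parse_histograms(cell):
    """Return one dict per histogram point found in the exported cell text."""
    points = []
    for match in POINT_RE.finditer(cell):
        # Naive split: assumes no commas inside the attribute values.
        fields = [f.strip() for f in match.group(1).split(",")]
        if len(fields) != 5:
            continue  # skip anything that does not look like a 5-attribute point
        name, value, group_name, group_value, percent = fields
        points.append({
            "ATTRIBUTE_NAME": _clean(name),
            "ATTRIBUTE_VALUE": _clean(value),
            "GROUPING_ATTRIBUTE_NAME": _clean(group_name),
            "GROUPING_ATTRIBUTE_VALUE": _clean(group_value),
            "ATTRIBUTE_PERCENT": float(percent),
        })
    return points


# The BUY_INSURANCE cell text from the exported file.
sample_cell = (
    "ODMRSYS.ODMR_HISTOGRAMS("
    "ODMRSYS.ODMR_HISTOGRAM_POINT('\"BUY_INSURANCE\"',''No'',NULL,NULL,73.1),"
    "ODMRSYS.ODMR_HISTOGRAM_POINT('\"BUY_INSURANCE\"',''Yes'',NULL,NULL,26.9))"
)

for point in parse_histograms(sample_cell):
    print(point["ATTRIBUTE_NAME"], point["ATTRIBUTE_VALUE"], point["ATTRIBUTE_PERCENT"])
```

String parsing like this is fragile, which is one reason to prefer reshaping the data inside the database before exporting it.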


As you can see, the complex ODMRSYS.ODMR_HISTOGRAMS output format may be difficult to read, and it may require some processing before the data can be used. Alternatively, we can “unnest” the histogram data into a transactional data format before exporting it. This way we don’t have to deal with the complex array structure, so the data is easier to consume. To do that, we can write a simple SQL query to “unnest” the data and use the new SQL Query node (Extract histogram data) to run this query (see below). We then use a Create Table node (Explorer output table) to persist the “unnested” histogram data along with the statistical data.

1. Create a SQL Query node

Create a SQL Query node and connect the “Explore Data” node to it. You may rename the SQL Query node to “Extract histogram data” to make it clear it is used to “unnest” the histogram data.

2. Specify a SQL query to “unnest” histogram data

Double-click the “Extract histogram data” node to bring up the editor, then enter the following SELECT statement in the editor:

SELECT
    "Explore Data_N$10002"."ATTR",
    "Explore Data_N$10002"."AVG",
    "Explore Data_N$10002"."DATA_TYPE",
    "Explore Data_N$10002"."DISTINCT_CNT",
    "Explore Data_N$10002"."DISTINCT_PERCENT",
    "Explore Data_N$10002"."MAX",
    "Explore Data_N$10002"."MEDIAN_VAL",
    "Explore Data_N$10002"."MIN",
    "Explore Data_N$10002"."MODE_VALUE",
    "Explore Data_N$10002"."NULL_PERCENT",
    "Explore Data_N$10002"."STD",
    "Explore Data_N$10002"."VAR",
    h.ATTRIBUTE_VALUE,
    h.ATTRIBUTE_PERCENT
FROM
    "Explore Data_N$10002", TABLE("Explore Data_N$10002"."HISTOGRAMS") h

Click OK to close the editor. This query extracts the ATTRIBUTE_VALUE and ATTRIBUTE_PERCENT fields from the ODMRSYS.ODMR_HISTOGRAMS nested object.

Note: you may select only the columns that contain the statistics you are interested in. "Explore Data_N$10002" is a generated unique name that references the Explorer node; you may have a slightly different name ending with some other unique number.

The query produces the following output.  The last two columns are the histogram data in transactional format.
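The effect of the unnesting query can be sketched in plain Python: each statistics row is repeated once per histogram point, with the point's value and percentage added as two extra columns. This is only an illustration of the reshaping, not what the database executes; the function name and the sample row (including its DATA_TYPE and DISTINCT_CNT values) are assumptions made for the example.

```python
def unnest_histograms(rows):
    """Yield one flat dict per (statistics row, histogram point) pair.

    Like the join against TABLE(...) in the query, a row whose histogram
    list is empty produces no output rows.
    """
    for row in rows:
        # Keep every statistics column, drop the nested collection itself.
        stats = {key: value for key, value in row.items() if key != "HISTOGRAMS"}
        for point in row["HISTOGRAMS"]:
            flat = dict(stats)
            flat["ATTRIBUTE_VALUE"] = point["ATTRIBUTE_VALUE"]
            flat["ATTRIBUTE_PERCENT"] = point["ATTRIBUTE_PERCENT"]
            yield flat


# Hypothetical Explorer output: one row per input column, histogram nested inside.
explore_output = [
    {
        "ATTR": "BUY_INSURANCE",
        "DATA_TYPE": "VARCHAR2",   # assumed value, for illustration
        "DISTINCT_CNT": 2,         # assumed value, for illustration
        "HISTOGRAMS": [
            {"ATTRIBUTE_VALUE": "No", "ATTRIBUTE_PERCENT": 73.1},
            {"ATTRIBUTE_VALUE": "Yes", "ATTRIBUTE_PERCENT": 26.9},
        ],
    },
]

flat_rows = list(unnest_histograms(explore_output))
for flat in flat_rows:
    print(flat)
```

The single nested row becomes two flat rows, one per histogram bucket, which is the transactional shape that spreadsheet tools handle easily.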

3. Create a Create Table node to persist the “unnested” histogram data

Create a Create Table node and connect the “Extract histogram data” node to it. You may rename the Create Table node to “Explorer output table” to make it clear it is used to persist the “unnested” histogram data.


4. Export “unnested” histogram data to Excel file

Run the “Explorer output table” node to persist the “unnested” histogram data to a table. The name of the output table (OUTPUT_3_4) can be found in the Property Inspector below.


Next, we can use the SQL Developer Export wizard as described above to export the table to an Excel file. As you can see, the histogram data is now in transactional format; it is more readable and can readily be consumed.


Tuesday Dec 31, 2013

Oracle BIWA Summit 2014 January 14-16, 2014 at Oracle HQ in Redwood Shores, CA


Oracle Business Intelligence, Warehousing & Analytics Summit - Redwood City

Oracle is a proud sponsor of the Business Intelligence, Warehousing & Analytics (BIWA) Summit happening January 14 – 16 at the Oracle Conference Center in Redwood City. The Oracle BIWA Summit brings together Oracle ACE experts, customers who are currently using or planning to use Oracle BI, Warehousing and Analytics products and technologies, partners and Oracle Product Managers, Support Personnel and Development Managers. Join us on Tuesday, January 14 at 5 p.m. to hear featured speaker Balaji Yelamanchili, Senior Vice President Analytics and Performance Management Products, for his keynote: Oracle Business Intelligence -- Innovate Faster. Visit the BIWA site http://www.biwasummit.com/ for more information today.

Among the approximately 50 technical presentations, featured talks and Hands-on Labs, I'll be delivering a presentation on Oracle Advanced Analytics and a Hands-on Lab on using the OAA/Oracle Data Miner GUI.

 AA-1010 BEST PRACTICES FOR IN-DATABASE ANALYTICS

Session ID: AA-1010

Presenter: Charlie Berger, Oracle

Abstract:

In the era of Big Data, enterprises are acquiring increasing volumes and varieties of data from a rapidly growing range of internet, mobile, sensor and other real-time and near real-time sources. The driving force behind this trend toward Big Data analysis is the ability to use this data for “actionable intelligence” -- to predict patterns and behaviors and to deliver essential information when and where it is needed. Oracle Database uniquely offers a powerful platform to perform these predictive analytics and location analysis with in-database data mining, statistical processing and SQL Analytics. Oracle Advanced Analytics embeds powerful data mining algorithms and adds enterprise-scale open source R to solve problems such as predicting customer behavior, anticipating churn, detecting fraud, market basket analysis and discovering customer segments. Oracle Data Miner GUI, a new SQL Developer 4.0 Extension, enables business analysts to quickly analyze and visualize data, build, evaluate and apply predictive models, and deploy sophisticated predictive analytics methodologies via SQL scripts, all while keeping the data inside the Oracle Database. Come learn best practices and customer examples for exploiting Oracle’s scalable, performant and secure in-database analytics capabilities to extract more value and actionable intelligence from your data.

HOL-AA-1008 LEARN TO USE ORACLE ADVANCED ANALYTICS FOR PREDICTIVE ANALYTICS SOLUTIONS

Session ID: HOL-AA-1008

Presenter: Charles Berger, Oracle & Karl Rexer, Rexer Analytics

Abstract:

Big Data; Bigger Insights! Oracle Data Mining Release 12c, a component of the Oracle Advanced Analytics database Option, embeds powerful data mining algorithms in the SQL kernel of the Oracle Database for problems such as predicting customer behavior, anticipating churn, identifying up-sell and cross-sell, detecting anomalies and potential fraud, customer profiling, text mining and retail market basket analysis. Oracle Data Miner GUI, a new SQL Developer 4.0 Extension, enables business analysts to quickly analyze and visualize data, build, evaluate and apply predictive models, and develop sophisticated predictive analytics methodologies, all while keeping the data inside Oracle Database. Come see how easily you can discover big insights from your Oracle data, generate SQL scripts for deployment and automation, and deploy results into Oracle Business Intelligence (OBIEE) dashboards.


Monday Dec 09, 2013

Come See and Test Drive Oracle Advanced Analytics at the BIWA Summit'14, Jan 14-16, 2014

The detailed agenda for the BIWA Summit '14, January 14-16 at the Oracle HQ Conference Center, is now published.

Please share with others by Tweeting, Blogging, Facebook, LinkedIn, Email, etc.!

The BIWA Summit is known for novel and interesting use cases of Oracle Big Data, Exadata, Advanced Analytics/Data Mining, OBIEE, Spatial, Endeca and more! It offers opportunities to get hands-on experience with products in the Hands-on Labs, great customer case studies, and talks by Oracle Technical Professionals and Partners. Meet with technical experts. Click HERE to read detailed abstracts and speaker profiles.

Use the SPECIAL DISCOUNT code ORACLE12C and registration is only $199 for the 2.5-day, technically focused Oracle user group event.

Charlie  (Oracle Employee Advisor to Oracle BIWA Special Interest User Group)

----



Tuesday Nov 12, 2013

Oracle Big Data Learning Library

Click on LEARN BY PRODUCT to view all learning resources.

Oracle Big Data Essentials

Attend this Oracle University Course!

Using Oracle NoSQL Database

Attend this Oracle University class!

Oracle and Big Data on OTN

See the latest resource on OTN.


Wednesday Sep 04, 2013

Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into workflow using the SQL Query node

I posted a new white paper authored by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Technologies.  You can access the white paper here and the companion files here.  Here is an excerpt:

Oracle Data Miner (Extension of SQL Developer 4.0) 

Integrate Oracle R Enterprise Mining Algorithms into workflow using the SQL Query node

Oracle R Enterprise (ORE), a component of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise and big data. Designed for problems involving large amounts of data, Oracle R Enterprise integrates R with the Oracle Database. R users can develop, refine and deploy R scripts that leverage the parallelism and scalability of the database to perform predictive analytics and data analysis.

Oracle Data Miner (ODMr) offers a comprehensive set of in-database algorithms for performing a variety of mining tasks, such as classification, regression, anomaly detection, feature extraction, clustering, and market basket analysis. One of the important capabilities of the new SQL Query node in Data Miner 4.0 is a simplified interface for integrating R scripts registered with the database. This provides the support necessary for R Developers to provide useful mining scripts for use by data analysts. This synergy provides many additional benefits as noted below.

· R developers can further extend ODMr mining capabilities by incorporating the extensive R mining algorithms from the open source CRAN packages or leveraging any user developed custom R algorithms via SQL interfaces provided by ORE.

· Since this SQL Query node can be part of a workflow process, R scripts can leverage functionalities provided by other workflow nodes which can simplify the overall effort of integrating R capabilities within the database.

· R mining capabilities can be included in the workflow deployment scripts produced by the new SQL script generation feature, so deploying R functionality within the context of a Data Miner workflow is easily accomplished.

· Data and processing are secured and controlled by the Oracle Database. This alleviates many of the risks incurred with other providers, where users have to export data out of the database in order to perform advanced analytics.

Oracle Advanced Analytics saves analysts, developers, database administrators and management the headache of trying to integrate R and database analytics. Instead, users can quickly gain the benefit of new R analytics and spend their time and effort on developing business solutions instead of building homegrown analytical platforms.

This paper should be very useful to R developers wishing to better understand how to embed R scripts for use by data analysts. Analysts will also find the paper useful for seeing how R features can be surfaced for their use in Data Miner. The specific use case covered demonstrates how to use the SQL Query node to integrate R glm and rpart regression model build, test, and score operations into the workflow, along with nodes that perform data preparation and residual plot graphing. However, the integration process described here can easily be adapted to integrate other R operations, such as statistical data analysis and advanced graphing, to expand ODMr functionality.


Monday Jul 15, 2013

Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1 is now available for download on OTN

The NEW Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1 is now available for download on OTN.  See link to SQL Developer 4.0 EA1.   


The Oracle Data Miner 4.0 new features listed below are applicable to Oracle Database 11g Release 2 and Oracle Database Release 12c. See the Oracle Data Miner Extension to SQL Developer 4.0 Release Notes for EA1 for additional information.

· Workflow SQL Script Deployment

o Generates SQL scripts to support full deployment of workflow contents

· SQL Query Node

o Integrate SQL queries to transform data or provide a new data source

o Supports the running of R Language Scripts and viewing of R generated data and graphics


· Graph Node

o Generate Line, Scatter, Bar, Histogram and Box Plots



· Model Build Node Improvements

o Node level data usage specification applied to underlying models

o Node level text specifications to govern text transformations

o Displays heuristic rules responsible for excluding predictor columns

o Ability to control the amount of Classification and Regression test results generated

· View Data

o Ability to drill in to view custom objects and nested tables

These new Oracle Data Miner GUI capabilities expose Oracle Database 12c and Oracle Advanced Analytics/Data Mining Release 1 features:

· Predictive Query Nodes

o Predictive results without the need to build models using Analytical Queries

o Refined predictions based on data partitions

· Clustering Node New Algorithm

o Added Expectation Maximization algorithm

· Feature Extraction Node New Algorithms

o Added Singular Value Decomposition and Principal Component Analysis algorithms

· Text Mining Enhancements

o Text transformations integrated as part of Model's Automatic Data Preparation

o Ability to import Build Text node specifications into a Model Build node

· Prediction Result Explanations

o Scoring details that explain predictive result

· Generalized Linear Model New Algorithm Settings

o New algorithm settings provide feature selection and generation

See OAA on OTN pages http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html for more information on Oracle Advanced Analytics.


Wednesday May 08, 2013

Oracle Advanced Analytics and Data Mining at the Movies on YouTube - Updated August 3, 2015

Updated August 3, 2015

Periodically, I've recorded a demonstration and/or presentation on Oracle Advanced Analytics and Data Mining and have posted them on YouTube.

Here are links to some of the more recent YouTube postings--sort of an Oracle Advanced Analytics and Data Mining at the Movies experience.

  1. New Big Data Analytics using Oracle Advanced Analytics 12c and Big Data SQL  - Watch on YouTube
  2. New - Oracle Academy Webcast:  Ask the Oracle Experts Big Data Analytics with Oracle Advanced Analytics - Watch on YouTube
  3. Oracle Data Miner and Oracle R Enterprise Integration via SQL Query node - Watch Demo
  4. Oracle Data Miner 4.0 (SQL Developer 4.0 Extension) New Features - Watch Demo
  5. Oracle Business Intelligence Enterprise Edition (OBIEE) SampleAppls Demo featuring integration with Oracle Advanced Analytics/Data Mining
  6. Oracle Big Data Analytics Demo mining remote sensor data from HVACs for better customer service 
  7. In-Database Data Mining for Retail Market Basket Analysis Using Oracle Advanced Analytics
  8. In-Database Data Mining Using Oracle Advanced Analytics for Classification using Insurance Use Case
  9. Fraud and Anomaly Detection using Oracle Advanced Analytics Part 1 Concepts
  10. Fraud and Anomaly Detection using Oracle Advanced Analytics Part 2 Demo
  11. Overview Presentation and Demonstration of Oracle Advanced Analytics Database Option

So.... grab your popcorn and a comfortable chair.  Hope you enjoy!

Charlie 

Oracle Advanced Analytics at the Movies


Thursday Mar 21, 2013

Recorded Webcast: Best Practices using Oracle Advanced Analytics with Oracle Exadata

Best Practices using Oracle Advanced Analytics with Oracle Exadata

 On Demand
Launch Presentation


Join us to learn how Oracle Advanced Analytics extends the Oracle database into a comprehensive advanced analytics platform through two major components, Oracle R Enterprise and Oracle Data Mining. Using these tools with Oracle Exadata Database Machine will allow organizations to perform at their peak and find real business value within their data.

You need to visit this Oracle Exadata Webcast Main page first and submit your registration information. Then you’ll receive an email so you can view the Webcast. This is external, so you can share it with anyone, and viewers can download the presentation as well. FYI. Charlie


Wednesday Mar 13, 2013

Oracle OpenWorld Call for Proposals now OPEN Submit your Oracle Advanced Analytics/Data Mining/ORE talks today!!

Calling All Oracle OpenWorld Oracle Advanced Analytics, Data Mining and R Experts


The Call for Proposals is open. Have something interesting to present to the world’s largest gathering of Oracle technologists and business leaders? Making breakthrough innovations with Java or MySQL? We want to hear from you, and so do the attendees at this year’s Oracle OpenWorld, JavaOne, and MySQL Connect conferences. Submit your proposal now for a chance to share your expertise at one of the most important technology and business conferences of the year.

CHOOSE...

Select one of Oracle’s premiere conferences

SHARE...

Submit your proposal for sharing your most innovative ideas and experiences

JOIN...

Connect with the elite community of Oracle OpenWorld, JavaOne, and MySQL Connect session leaders in 2013

We recommend you take the time to review the General Information, Content Program Policies, and Tips and Guidelines pages before you begin. We look forward to your submissions!


Submit Papers

Please submit your papers by clicking on the link below and then select the event for which you are submitting.

Submit Now!

General Information

Conferences location: San Francisco, California, USA


Dates

  • Oracle OpenWorld: Sunday, September 22, 2013–Thursday, September 26, 2013
  • JavaOne: Sunday, September 22, 2013–Thursday, September 26, 2013
  • MySQL Connect: Saturday, September 21–Monday, September 23, 2013

Key 2013 deadlines

  • Call for Proposals opens: Wednesday, March 13
  • Call for Proposals closes: Friday, April 12, 11:59 p.m. PDT
  • Notifications for accepted and declined submissions sent: Mid-June

For Oracle OpenWorld, Oracle employee submitters will need to contact the appropriate Oracle track leads before submitting. To view a list of track leads, click here

Contact us:

Friday Feb 22, 2013

Take a FREE Test Drive with Oracle Advanced Analytics/Data Mining on the Amazon Cloud

I wanted to highlight a wonderful new resource provided by our partner Vlamis Software. Extremely easy! Fill out the form, wait a few minutes for the Amazon Cloud instance to start up and then BAM! You can log in and start using the Oracle Advanced Analytics Oracle Data Miner workflow GUI. Demo data and online Oracle by Example Learning Tutorials are also provided to ensure your data mining test drive is a positive one. Enjoy!!

Test Drive -- Powered by Amazon AWS

We have partnered with Amazon Web Services to provide to you, free of charge, the opportunity to work, hands-on, with the latest of Oracle's Business Intelligence offerings. When you sign up for one of the labs below, Amazon's Elastic Compute Cloud (EC2) environment will generate a complete server for you to work with.

These hands-on labs work with the actual Oracle software running in the Amazon Web Services EC2 environment. They each take approximately 2 hours to work through and will give you hands-on experience with the software and a tour of the features. Your EC2 environment will be available to you for 5 hours, after which it will self-terminate. If, after registration, you need additional time or further instructions, simply reply to the registration email and we would be glad to help you.

Data Mining

This test drive walks through some basic exercises in doing predictive analytics within an Oracle 11g Database instance using the Oracle Data Miner extension for Oracle SQL Developer. You use a drag-and-drop "workflow" interface to build a data mining model that predicts the likelihood of purchase for a set of prospects. Oracle Data Mining is ideal for automatically finding patterns, understanding relationships, and making predictions in large data sets.


Monday Jan 28, 2013

BIWA Summit 2013 Presentations Now Available for Viewing

If you missed the BIWA Summit 2013, you can still look through the presentations from the event. 

Go to the Schedule at http://www.biwasummit.com/schedule and download the presentations using the links for each session. You can forward this to customers, prospects and others within Oracle. All of it is external.

The Oracle BIWA Summit, organized by the leading Oracle Special Interest Group (SIG) for Business Intelligence, Data Warehousing and Analytics professionals, was held on Jan 9-10, 2013, at The Oracle HQ Sofitel Hotel, in Redwood City, CA. The Oracle BIWA Summit brings together Oracle ACE experts, customers who are currently using or planning to use Oracle BI, Warehousing and Analytics products and technologies, partners and Oracle Product Managers, Support Personnel and Development Managers. Everything and everyone that you will need to be successful in your Oracle “BIWA” implementations was at the Oracle BIWA Summit, Jan 9-10, 2013.

The next BIWA Summit will be at the HQ Conference Center, Jan 14-16, 2014.  Mark your calendars.

About

Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place
