
Recent Posts

BI Publisher on Oracle Analytics Cloud

Introducing Oracle Machine Learning SQL Notebooks for the Oracle Autonomous Data Warehouse Cloud!

Apache Zeppelin-based Machine Learning SQL Notebook for Data Scientists to Collaborate in the Oracle Autonomous Data Warehouse Cloud

Overview

Oracle Machine Learning is a new SQL notebook interface for data scientists to perform machine learning in the Oracle Autonomous Data Warehouse Cloud (ADWC). Notebook technologies support the creation of scripts while documenting assumptions, approaches, and rationale, which increases data science team productivity. Oracle Machine Learning SQL notebooks, based on Apache Zeppelin technology, enable teams to collaborate to build, evaluate, and deploy predictive models and analytical methodologies in the Oracle ADWC. Multi-user collaboration enables the same notebook to be opened simultaneously by different users; changes made by one user are immediately visible to other team members.

Oracle Machine Learning SQL notebooks provide easy access to Oracle's parallelized, scalable, in-database implementations of the Oracle Advanced Analytics library of machine learning algorithms (classification, regression, anomaly detection, clustering, associations, attribute importance, feature extraction, time series, etc.), SQL, PL/SQL, and Oracle's statistical and analytical SQL functions (a short sketch of what this looks like in SQL appears at the end of this post). Oracle Machine Learning SQL notebooks and Oracle Advanced Analytics' library of machine learning SQL functions, combined with PL/SQL, allow companies to automate the discovery of new insights, generate predictions, and add "AI" to data viz dashboards and enterprise applications. To support enterprise requirements for security, authentication, and auditing, Oracle Machine Learning SQL notebooks adhere to all Oracle standards and support privilege-based access to data, models, and notebooks.

Presentation: Oracle Machine Learning SQL Notebook - Included in Oracle Autonomous Data Warehouse Cloud

Key Features
- Collaborative SQL notebook UI for data scientists
- Packaged with Autonomous Data Warehouse Cloud
- Easy access to shared notebooks, templates, permissions, scheduler, etc.
- Access to 30+ parallel, scalable, in-database implementations of machine learning algorithms
- SQL and PL/SQL scripting languages supported
- Enables and supports deployment of enterprise machine learning methodologies in ADWC

Screen Shots (Disclaimer: product details/functionality subject to change.)
- Oracle Machine Learning enables data science teams to collaboratively build machine learning methodologies in the Oracle ADWC.
- OML notebooks provide easy access to data managed in the Oracle ADWC for quick analysis, simple visualizations, and building machine learning solutions.
- Oracle Machine Learning SQL notebook starting page.
- Create a new Oracle Machine Learning notebook.
- Create simple visualizations of data managed in Oracle ADWC.
- Easily perform quick statistical analyses using Oracle's in-database SQL statistical functions.
- Build, evaluate, and deploy machine learning methodologies in the Oracle ADWC using Oracle Advanced Analytics' parallelized, in-database implementations of machine learning algorithms.
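To make the "easy access" concrete, here is a minimal sketch of the kind of SQL a notebook paragraph might run against the in-database algorithms. It assumes an illustrative training table MINING_DATA with a CUST_ID case identifier and an AFFINITY_CARD target, and a scoring table NEW_CUSTOMERS; these table, column, and model names are hypothetical, not from the post.

-- Settings table selecting the in-database Decision Tree algorithm
CREATE TABLE dt_settings (setting_name VARCHAR2(30), setting_value VARCHAR2(4000));

BEGIN
  -- Package constants must be referenced from PL/SQL
  INSERT INTO dt_settings
    VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
END;
/

-- Build the classification model in-database; the data never leaves the database
BEGIN
  dbms_data_mining.create_model(
    model_name          => 'DT_MODEL',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'MINING_DATA',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD',
    settings_table_name => 'DT_SETTINGS');
END;
/

-- Score new rows with the built-in SQL PREDICTION function
SELECT cust_id, PREDICTION(DT_MODEL USING *) AS predicted_affinity_card
  FROM new_customers;

The same pattern applies to the other in-database algorithms; only the settings change.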


A Simple Guide to Oracle’s Machine Learning and Advanced Analytics

Many times I'm asked how to get started with Oracle's Machine Learning and Advanced Analytics. I put together this simple guide of the most popular and useful (in my opinion) links to product information and getting-started resources, including:
- Oracle Machine Learning Zeppelin-based SQL notebooks, included in the Oracle Autonomous Data Warehouse Cloud (ADWC)
- Oracle Advanced Analytics Database Option (OAA), included in Oracle Database Cloud High and Extreme Editions
- Oracle Data Mining (SQL API machine learning functions)
- Oracle Data Miner "workflow" UI (for citizen data scientists), a SQL Developer extension
- Oracle R Enterprise (R API to ODM SQL ML functions, R-to-SQL "push down" and R integration)
- Oracle R Advanced Analytics for Hadoop (ORAAH), part of the Big Data Connectors

OOW'17 Oracle's Machine Learning & Advanced Analytics Presentations

Oracle Advanced Analytics Overview Information
- Oracle's Machine Learning and Advanced Analytics 12.2c and Oracle Data Miner 4.2 New Features presentation
- Oracle Advanced Analytics Public Customer References
- Oracle's Machine Learning and Advanced Analytics Data Management Platforms: Move the Algorithms; Not the Data white paper on OTN
- Oracle INTERNAL ONLY: OAA Product Management Wiki and Beehive Workspace (contains the latest presentations, demos, product information, etc.)

YouTube-Recorded Oracle Advanced Analytics Presentations, Demos, and White Papers
- Oracle's Machine Learning & Advanced Analytics 12.2 & Oracle Data Miner 17.2 New Features YouTube video
- Library of YouTube movies on Oracle Advanced Analytics, Data Mining, Machine Learning (7+ "live" demos, e.g. Oracle Data Miner 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.)
- Overview YouTube video of Oracle's Advanced Analytics and Machine Learning

Getting Started/Training/Tutorials
- Link to OAA/Oracle Data Miner Workflow GUI online (free) tutorial series on OTN
- Link to OAA/Oracle R Enterprise (free) tutorial series on OTN
- Link to Try the Oracle Cloud Now!
- Link to Getting Started w/ ODM blog entry
- Link to the new OAA/Oracle Data Mining 2-day instructor-led Oracle University course
- Oracle Data Mining sample code examples
- ORAAH online training

Additional Resources, Documentation & OTN Discussion Forums
- Oracle Advanced Analytics Option on OTN page
- OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog
- OAA/Oracle R Enterprise page on OTN, ORE Documentation & ORE Blog
- Oracle SQL-based basic statistical functions on OTN
- Oracle R Advanced Analytics for Hadoop (ORAAH) on OTN

Analytics and Data Summit 2019 User Conference. All Analytics. All Data. No Nonsense. Users, experts, and Oracle professionals share "novel and interesting" BI, advanced analytics, machine learning, spatial, graph, IoT, Cloud, data viz, and "predictive" application use cases for community learning, education, and advancement. March 12 - 14, 2019, Redwood Shores, CA.

Hope this helps!
Charlie

Charlie Berger | Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics, Oracle Corporation
Phone: +7817440324 | Mobile: +6033204560 | 10 Van de Graaff Drive | Burlington, MA 01803
LinkedIn: www.linkedin.com/in/CharlieDataMine
Oracle Machine Learning and Advanced Analytics on OTN
Oracle Big Data Blog, Oracle Data Mining Blog, Twitter: CharlieDataMine
Oracle Advanced Analytics internal PM Beehive Workspace
Analytics and Data Summit 2018. All Analytics. All Data. No Nonsense. User Conference, Mar 20-22, 2018 - Join us!



OOW'17 Oracle's Machine Learning & Advanced Analytics Presentations

There were a number of great presentations on Oracle's Machine Learning and Advanced Analytics at Oracle Open World 2017. Here are links to many of them. I'll update this posting as I collect more of the great presentations and resources. Hope you enjoy!
- Case Study: Oracle's Advanced Analytics at UK National Health Service
- The Naked Future: What Happens in a World that Anticipates Your Every Move?
- Operationalizing Machine Learning into "Predictive" Enterprise Applications
- Predictive HCM Using Machine Learning Data Management Platforms
- Transformational Machine Learning Use Cases You Can Deploy Now
- Siebel CRM CAB Advanced Analytics at EnergyAustralia
- Is SQL the Best Language for Statistics and Machine Learning?
- SQL: One Language to Rule All Your Data
- Ireland's An Post: Customer Analytics Using Oracle Analytics Cloud
- Extending Garanti Bank's Data Management Platform with Oracle Big Data SQL
- General Session: Data and Analytics Power Your Success
- Big Data and Machine Learning and the Cloud, Oh My!
- Big Data Best Practices: A Workflow from Object Store to Analytics in the Cloud
- Larry Ellison's Sunday Night Keynote: Oracle Cloud: Industry's Broadest, Most Integrated
- Thomas Kurian's Keynote: Oracle's Integrated Cloud Platform
- Demo Pod: Oracle's Machine Learning for the Cloud, Databases and Big Data using SQL, R and Notebooks

Charlie Berger | Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics
Charlie.berger@oracle.com
Oracle Machine Learning and Advanced Analytics on OTN
Oracle Big Data Blog, Oracle Data Mining Blog, Twitter: CharlieDataMine
Analytics and Data Summit 2018. All Analytics. All Data. No Nonsense. User Conference, Mar 20-22, 2018 - Join us!



Evaluating Oracle Data Mining Has Never Been Easier - Evaluation "Kit" Available - Updated for Oracle Database 12.2c & SQLDEV 17.2

It's easy to evaluate Oracle Advanced Analytics! There are multiple possibilities:

1. Get onto an Oracle Cloud. Depending on your configuration, you may also want to connect to the Cloud using the Oracle Data Miner "workflow" UI. To get Oracle Data Miner, you'll need SQL Developer, and you will then need to configure SQL Developer for Oracle Data Miner. To get started, you'll need to enter a credit card, but you'll get $300 of Cloud credits.
2. Download and install an evaluation copy of the Oracle Database 12.2. This may actually be the easiest, simplest way to get started, as you can download and install the Oracle Database in about an hour these days and evaluate it for free! Oracle has an honor system, so when you are done "evaluating" and are actually using the Oracle software for your job, you are supposed to see your rep and pay for it.
3. Download SQL Developer. The latest release is now called 17.2 (it was previously 4.2). SQL Developer is the popular IDE for working with the Oracle Database. The Oracle Data Miner "workflow" UI is packaged as an extension, so it ships with SQL Developer, but you still have to configure it for data mining/data scientist users.
4. Follow the tutorials. There are several free tutorials that show users how to get started. They are excellent learning vehicles, both for machine learning/data mining and for Oracle's product functionality, and several use cases are presented. You MUST start with the first tutorial, Setting Up SQL Developer for Oracle Data Miner.
5. Follow the other tutorials. The Using Oracle Data Miner tutorial is the first, easiest, and best one for getting started. It's fast, easy, and fun! Once you follow the first tutorials, you are much better prepared to begin using Oracle Advanced Analytics and Oracle Data Miner on your own data.

By the way, each Oracle Database ships with example data sets for multiple purposes, including machine learning and data mining.

The Data Mining Sample Data

The data used by the sample data mining programs is based on these tables in the SH schema:
- SH.CUSTOMERS
- SH.SALES
- SH.PRODUCTS
- SH.SUPPLEMENTARY_DEMOGRAPHICS
- SH.COUNTRIES

The dmshgrants script grants SELECT access to the tables in SH, and the dmsh.sql script creates views of the SH tables in the schema of the data mining user (see the P.S. below for a sketch of these steps).

Good luck!
Charlie
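P.S. Here is a minimal sketch of what the dmshgrants and dmsh.sql steps boil down to, assuming the data mining user is named DMUSER; the user name and the view's column selection are illustrative, not taken from the actual scripts.

-- Run as SH (or a DBA): give the mining user read access to the sample tables
GRANT SELECT ON sh.customers TO dmuser;
GRANT SELECT ON sh.sales TO dmuser;
GRANT SELECT ON sh.products TO dmuser;
GRANT SELECT ON sh.supplementary_demographics TO dmuser;
GRANT SELECT ON sh.countries TO dmuser;

-- Then, as DMUSER (what dmsh.sql does in spirit): a local view over the SH data
CREATE OR REPLACE VIEW mining_data AS
  SELECT c.cust_id, c.cust_gender, c.cust_year_of_birth,
         d.affinity_card, d.education
    FROM sh.customers c
    JOIN sh.supplementary_demographics d ON d.cust_id = c.cust_id;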


Oracle BIWA Summit'18 User Community Meeting - Call for Speakers is now Live!

BIWA Summit 2018
The Big Data + Cloud + Machine Learning + Spatial + Graph + Analytics + IoT Oracle User Conference, featuring Oracle Spatial and Graph Summit
March 20 - 22, 2018, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA

Share your successes… We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2018, featuring Oracle Spatial and Graph Summit, March 20 - 22, 2018, and share your successes with Oracle technology. The call for speakers is open through December 3, 2017. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2018 promotion materials. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2018.

Oracle Spatial and Graph Summit will be held in partnership with BIWA Summit. BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) User Community and the Oracle Spatial and Graph SIG, a Special Interest Group in the Independent Oracle User Group (IOUG). BIWA Summits attract presentations and talks from the top Business Intelligence, Data Warehousing, Advanced Analytics, Spatial and Graph, and Big Data experts. The 3-day BIWA Summit 2017 event included keynotes by industry experts, educational sessions, hands-on labs, and networking events. Click HERE to see presentations and content from BIWA Summit 2017.

Call for Speakers DEADLINE is December 3, 2017 at midnight Pacific Time. Presentations and hands-on labs must be non-commercial; sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit their presentation as a PDF slide deck for posting on the BIWA Summit conference website. Accompanying technical and use case papers are encouraged, but not required. Complimentary registration to Oracle BIWA Summit 2018 is provided to the primary speaker of each accepted presentation. Note: any additional co-presenters need to register for the event separately and pay the appropriate registration fees.

Please submit session proposals in one of the following areas:
- Machine Learning
- Analytics
- Big Data
- Data Warehousing and ETL
- Cloud
- Internet of Things
- Spatial and Graph (Oracle Spatial and Graph Summit)
- …Anything else "Cool" using Oracle technologies in "novel and interesting" ways

Proposals that cover multiple areas are acceptable and highly encouraged. On your submission, please indicate a primary track and any secondary tracks for consideration. The content committee strongly encourages technical/how-to sessions, strategic guidance sessions, and real-world customer end-user case studies, all using Oracle technologies. If you submitted a session last year, your login should carry over for 2018. We will be accepting abstracts on a rolling basis, so please submit your abstracts as soon as possible.

Learn from Industry Experts from Oracle, Partners, and Customers. Come join hundreds of professionals with shared interests in the successful deployment of Oracle technology on premises, on Cloud, hybrid Cloud, and infrastructure:
- Cloud & Infrastructure: Database Cloud Service, Big Data Cloud Service, Data Visualization Cloud Service, Hadoop, Spark, Big Data Connectors (Hadoop & R), IaaS, PaaS, SaaS
- Spatial & Graph: Spatial and Graph for Big Data and Database, GIS and smart cities features, location intelligence, geocoding & routing, property graph DB, social network/fraud detection/deep learning graph analytics, RDF graph
- Analytics: Oracle Data Visualization, Big Data Discovery, OBIEE, OBIA Applications, Exalytics, Real-Time Decisions
- Big Data & Machine Learning: Machine Learning, Advanced Analytics, Data Mining, R Enterprise, fraud detection, text mining, SQL patterns, clustering, market basket analysis, Big Data Preparation
- Internet of Things: Big Data from sensors, edge analytics, Industrial Internet, IoT Cloud, monetizing IoT, security, standards

What To Expect: 400+ Attendees | 90+ Speakers | Hands-on Labs | Technical Content | Networking

New at this year's BIWA Summit:
- Strategy track: targeted at the C-level audience; how to assess and plan for new Oracle technology in meeting enterprise objectives
- Oracle Global Leaders track: sessions by Oracle's Global Leader customers on their use of Oracle technology, and targeted product managers on the latest Oracle products and features
- Grad-student track: sessions on cutting-edge university work using Oracle technology, continuing Oracle Academy's sponsorship of graduate student participation

Exciting topics include:
- Database, Data Warehouse, Cloud, and Big Data architecture
- Deep dives on existing Oracle BI, DW and Analytics products, and hands-on labs
- Updates on the latest Oracle products and technologies, e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL
- Novel and interesting use cases of Spatial and Graph, Text, Data Mining, ETL, Security, Cloud
- Working with Big Data: Hadoop, "Internet of Things", SQL, R, sentiment analysis
- Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics—All Better Together

Example talks from BIWA Summit 2017 [visit www.biwasummit.org to see the full agenda from BIWA'17 and to download copies of BIWA'17 presentations and HOLs]:

Machine Learning
- Taking R to new heights for scalability and performance
- Introducing Oracle Machine Learning Zeppelin Notebooks
- Oracle's Advanced Analytics 12.2c New Features & Road Map: Bigger, Better, Faster, More!
- An Post -- Big Data Analytics platform and use of Oracle Advanced Analytics
- Customer Analytics POC for a global retailer, using Oracle Advanced Analytics
- Oracle Marketing Advanced Analytics Use of OAA in Propensity to Buy Models
- Clustering Data with Oracle Data Mining and Oracle Business Intelligence
- How Option Traders leverage Oracle R Enterprise to maximize trading strategies
- From Beginning to End - Oracle's Cloud Services and New Customer Acquisition Marketing
- K12 Student Early Warning System
- Business Process Optimization Using Reinforcement Learning
- Advanced Analytics & Graph: Transparently taking advantage of HW innovations in the Cloud
- Dynamic Traffic Prediction in Road Networks
- Context Aware GeoSocial Graph Mining

Analytics
- Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud
- Make the most of Oracle DV (DVD / DVCS / BICS)
- Data Visualization at SoundExchange – A Case Study
- Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c
- Does Your Data Have a Story? Find out with Oracle Data Visualization Desktop
- Social Services Reporting, Visualization, and Analytics Using OBIEE
- Leadership Essentials in Successful Business Intelligence (BI) Programs

Big Data
- Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud
- Why Apache Spark has become the darling in Big Data space?
- Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c
- A Shortest Path to Using Graph Technologies – Best Practices in Graph Construction, Indexing, Analytics and Visualization

Cloud Computing
- Oracle Big Data Management in the Cloud
- Oracle Cloud Cookbook for Professionals
- Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud
- Deploying Oracle Database in the Cloud with Exadata: Technical Deep Dive
- Employee Onboarding: Onboard – Faster, Smarter & Greener
- Deploying Spatial Applications in Oracle Public Cloud
- Analytics in the Oracle Cloud: A Case Study
- Deploying SAS Retail Analytics in the Oracle Cloud
- BICS - For Departmental Data Mart or Enterprise Data Warehouse?
- Cloud Transition and Lift and Shift of Oracle BI Applications

Data Warehousing and ETL
- Business Analytics in the Oracle 12.2 Database: Analytic Views
- Maximizing Join and Sort Performance in Oracle Data Warehouses
- Turbocharging Data Visualization and Analyses with Oracle In-Memory 12.2
- Oracle Data Integrator 12c: Getting Started
- Analytic Functions in SQL
- My Favorite Scripts 2017

Internet of Things
- Introduction to IoT and IoT Platforms
- The State of Industrial IoT
- Complex Data Mashups: an Example Use Case from the Transportation Industry
- Monetizable Value Creation from Industrial-IoT Analytics

Spatial and Graph Summit
- Uncovering Complex Spatial and Graph Relationships: On Database, Big Data, and Cloud
- A Shortest Path to Using Graph Technologies – Best Practices in Graph Construction, Indexing, Analytics and Visualization
- Build Recommender Systems, Detect Fraud, and Integrate Deep Learning with Graph Technologies
- Building a Tax Fraud Detection Platform with Big Data Spatial and Graph technologies
- Maps, 3-D, Tracking, JSON, and Location Analysis: What's New with Oracle's Spatial Technologies
- Deploying Spatial Applications in Oracle Public Cloud
- RESTful Spatial services with Oracle Database as a Service and ORDS
- Custom Maps in Oracle Big Data Discovery with Oracle Spatial and Graph 12c
- Smart Parking for a Smart City
- Using Oracle Spatial and Graph at Los Angeles and Munich Airports
- Analysing the Panama Papers with Oracle Big Data Spatial and Graph
- Apply Location Intelligence and Spatial Analysis to Big Data with Java

Example hands-on labs from BIWA Summit 2017:
- Using R for Big Data Advanced Analytics and Machine Learning
- Learn Predictive Analytics in 2 hours! Oracle Data Miner Hands on Lab
- Deploy Custom Maps in OBIEE for Free
- Apply Location Intelligence and Spatial Analysis to Big Data with Java
- Use Oracle Big Data SQL to Analyze Data Across Oracle Database, Hadoop, and NoSQL
- Make the most of Oracle DV (DVD / DVCS / BICS)
- Analyzing a social network using Big Data Spatial and Graph Property Graph

Submit your abstract(s) today, good luck, and hope to see you there! See last year's full agenda from BIWA'17.

Dan Vlamis and Shyam Nath, Oracle BIWA Summit '18 Conference Co-Chairs



New Wikipedia-Based Cognitive Model Available for Text Processing

Blog posting by Alex Sakharov, Principal Member of Technical Staff, Data Mining Technologies

Explicit Semantic Analysis (ESA), a new feature in Oracle Advanced Analytics Release 12.2, uses concepts of an existing knowledge base as features, rather than latent features derived by latent semantic analysis methods such as Singular Value Decomposition and Latent Dirichlet Allocation. Each row (e.g., a document) in the training data maps to a feature, i.e., a concept. ESA works best with concepts represented by text documents. It has multiple applications in the area of text processing, most notably semantic relatedness (similarity) and explicit topic modeling. Text similarity use cases might involve, for example, resume matching or searching for similar blog postings. OAA's ESA-derived similarity index scores can be used as new features for other records, e.g. Candidate, Age, Income, Job_description, Similarity_index_score.

The ESA model is basically an inverted index that maps words to relevant concepts of the knowledge base. This inverted index also incorporates weights reflecting the strength of association between words and concepts. ESA does not project the original feature space and does not reduce its dimensionality, except for filtering out features with uninformative text.

There exist vast amounts of knowledge represented as text. Textual knowledge bases are normally collections of common or domain-specific articles, where every article defines one concept. Such textual knowledge bases usually serve as sources for ESA models. Wikipedia is a particularly good source for a general-purpose ESA model because it is a comprehensive knowledge base. Users can also develop, add, and use their own custom, domain-specific ESA models, e.g. for medical, homeland security, or research & development applications.

OAA 12.2 Documentation

Please refer to https://docs.oracle.com/database/122/DMAPI/explicit-semantic-analysis.htm for more information about ESA.

Distribution

Oracle distributes an ESA model built in 12.2.0.1 from the 2016 Wikipedia dump at https://dumps.wikimedia.org/enwiki/. The dump dated November 1, 2016 was used to build this ESA model. See Oracle Machine Models to download the ESA Model 1.0 EN. The model file is wiki_model12.2.0.1.dmp. The distribution includes two scripts: wiki_esa_setup.sql and wiki_esa_demo.sql. The wiki_esa_setup.sql script defines a text policy; the wiki_esa_demo.sql script contains sample queries for the model.

Setup

Here is how to load this model into your database, assuming you use the scott/tiger account provided in Oracle databases; other accounts work similarly. First, execute the following SQL as sysdba to grant the necessary privileges to this account:

SQL> GRANT CREATE ANY DIRECTORY TO SCOTT;
Grant succeeded.
SQL> GRANT EXECUTE ON CTXSYS.CTX_DDL TO SCOTT;
Grant succeeded.
SQL> GRANT CREATE MINING MODEL TO SCOTT;
Grant succeeded.

The minimum recommended size of the tablespace is 1G.

SQL> CREATE TABLESPACE <your tablespace> DATAFILE '<directory>/<file>' SIZE 1G REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
Tablespace created.

It is necessary to define a database directory in order to import the model.

SQL> CREATE OR REPLACE DIRECTORY DBDIR AS '<directory>';
Directory created.
SQL> ALTER USER SCOTT QUOTA UNLIMITED ON <your tablespace>;
User altered.

Second, copy wiki_model12.2.0.1.dmp to your directory.
After that, execute this command in the shell:

impdp scott/tiger dumpfile=wiki_model12.2.0.1.dmp directory=DBDIR remap_schema=DMUSER:SCOTT remap_tablespace=TBS_1:TBS

Alternatively, you may execute the following SQL code to achieve the same result:

SQL> begin
  dbms_data_mining.import_model(
      filename         => 'wiki_model12.2.0.1.dmp',
      directory        => 'DBDIR',
      schema_remap     => 'DMUSER:SCOTT',
      tablespace_remap => 'TBS_1:TBS');
end;
/
PL/SQL procedure successfully completed.

The imported model name is WIKI_MODEL. You can explore the imported model via the view DM$VAWIKI_MODEL. If you use your own database account, make sure that the same privileges as for SCOTT are granted to that account. Now you can run:

SQL> @wiki_esa_setup.sql

This script sets up the text policy wiki_txtpol, which is referred to by the model. Make sure that the size of the SGA is sufficient for fast scoring. The minimum recommended settings are:

sga_max_size=1G
sga_target=1G

Once the SGA is properly sized, you can run the sample scoring queries:

SQL> @wiki_esa_demo.sql

Scoring

All queries against WIKI_MODEL score textual data. The data should be given as one column named TEXT; if your textual data comes from a table column, that column should be aliased to TEXT. The text policy wiki_txtpol should be defined before scoring. The scoring function feature_set is used for topic modeling, and the function feature_compare is used for semantic similarity.

Explicit topic modeling

The ESA Wikipedia model helps discover the most relevant topics for a given text document, which can be as short as a single word or as long as a full document. Here are the relevant Wikipedia topics for the word 'bank':

SQL> select s.feature_id, s.value from
  (select feature_set(wiki_model, 10 using *) fset from
  (SELECT 'bank' AS text FROM dual)) t,
  table(t.fset) s order by s.value desc;

FEATURE_ID                                   VALUE
-------------------------------------------  -----
Bank                                          .101
Bank of America                               .099
National bank                                 .099
Central bank                                  .099
National Bank Act                             .096

The next example shows Wikipedia topics for one sentence:

SQL> select s.feature_id, s.value from
  (select feature_set(wiki_model, 10 using *) fset from
  (SELECT 'A group of European-led astronomers has made a photograph of what appears to be a planet orbiting another star. If so, it would be the first confirmed picture of a world beyond our solar system.' AS text FROM dual)) t,
  table(t.fset) s order by s.value desc;

FEATURE_ID                                   VALUE
-------------------------------------------  -----
Solar System                                  .144
Exoplanet                                     .138
Planet                                        .138
Formation and evolution of the Solar System  .127
Planetary system                              .127

Here is yet another example, in which topic modeling is done for a paragraph:

SQL> select s.feature_id, s.value from
  (select feature_set(wiki_model, 10 using *) fset from
  (SELECT 'The more things change... Yes, I''m inclined to agree, especially with regards to the historical relationship between stock prices and bond yields. The two have generally traded together, rising during periods of economic growth and falling during periods of contraction. Consider the period from 1998 through 2010, during which the U.S. economy experienced two expansions as well as two recessions: Then central banks came to the rescue. Fed Chairman Ben Bernanke led from Washington with the help of the bank''s current $3.6T balance sheet. He''s accompanied by Mario Draghi at the European Central Bank and an equally forthright Shinzo Abe in Japan. Their coordinated monetary expansion has provided all the sugar needed for an equities moonshot, while they vowed to hold global borrowing costs at record lows' AS text FROM dual)) t,
  table(t.fset) s order by s.value desc;

FEATURE_ID                                   VALUE
-------------------------------------------  -----
Recession                                     .147
Mario Draghi                                  .138
Lost Decade (Japan)                           .132
Ben Bernanke                                  .120
Federal Open Market Committee                 .093

Semantic similarity

The ESA Wikipedia model can be used to calculate semantic similarity, for short and long documents alike. The following two queries capture the fact that the words 'street' and 'avenue' are semantically closer than 'street' and 'farm'.

SQL> select 1-feature_compare(wiki_model using 'street' as text and using 'avenue' as text) comp from dual;

  COMP
------
  .235

SQL> select 1-feature_compare(wiki_model using 'street' as text and using 'farm' as text) comp from dual;

  COMP
------
  .004

In the next example, the first pair of sentences scores higher because Nick Price is a golfer born in South Africa. Note that the two sentences in the first pair have no common words.

SQL> SELECT 1-FEATURE_COMPARE(wiki_model USING 'There are several PGA tour golfers from South Africa' text AND USING 'Nick Price won the 2002 Mastercard Colonial Open' text) comp FROM DUAL;

  COMP
------
  .119

SQL> SELECT 1-FEATURE_COMPARE(wiki_model USING 'There are several PGA tour golfers from South Africa' text AND USING 'John Elway played quarterback for the Denver Broncos' text) comp FROM DUAL;

  COMP
------
  .003

In the following example, one paragraph referring to al-Qa'ida and Saudi Arabia is compared to two other paragraphs. The first counterpart paragraph refers to similar matters even though it does not mention al-Qa'ida or Osama bin Laden, and that pair of paragraphs scores a high similarity according to the Wikipedia-based ESA model. The second counterpart paragraph refers to unrelated topics, and that pair scores low, as expected.

SQL> select 1-feature_compare(wiki_model using
  'Senior members of the Saudi royal family paid at least $560 million to Osama bin Laden terror group and the Taliban for an agreement his forces would not attack targets in Saudi Arabia, according to court documents. The papers, filed in a $US3000 billion ($5500 billion) lawsuit in the US, allege the deal was made after two secret meetings between Saudi royals and leaders of al-Qa ida, including bin Laden. The money enabled al-Qa ida to fund training camps in Afghanistan later attended by the September 11 hijackers. The disclosures will increase tensions between the US and Saudi Arabia.' as text
  and using
  'The Saudi Interior Ministry on Sunday confirmed it is holding a 21-year-old Saudi man the FBI is seeking for alleged links to the Sept. 11 hijackers. Authorities are interrogating Saud Abdulaziz Saud al-Rasheed "and if it is proven that he was connected to terrorism, he will be referred to the sharia (Islamic) court," the official Saudi Press Agency quoted an unidentified ministry official as saying.' as text) comp from dual;

  COMP
------
  .583

SQL> select 1-feature_compare(wiki_model using
  'Senior members of the Saudi royal family paid at least $560 million to Osama bin Laden terror group and the Taliban for an agreement his forces would not attack targets in Saudi Arabia, according to court documents. The papers, filed in a $US3000 billion ($5500 billion) lawsuit in the US, allege the deal was made after two secret meetings between Saudi royals and leaders of al-Qa ida, including bin Laden. The money enabled al-Qa ida to fund training camps in Afghanistan later attended by the September 11 hijackers. The disclosures will increase tensions between the US and Saudi Arabia.' as text
  and using
  'Russia defended itself against U.S. criticism of its economic ties with countries like Iraq, saying attempts to mix business and ideology were misguided. "Mixing ideology with economic ties, which was characteristic of the Cold War that Russia and the United States worked to end, is a thing of the past," Russian Foreign Ministry spokesman Boris Malakhov said Saturday, reacting to U.S. Defense Secretary Donald Rumsfeld statement that Moscow economic relationships with such countries sends a negative signal.' as text) comp from dual;

  COMP
------
  .095

References

E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis. IJCAI, v. 7, pp. 1606-1611, 2007.
E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. Journal of Artificial Intelligence Research, v. 34, pp. 443-498, 2009.


BIWA Summit’17, January 31 - February 2, 2017 REGISTRATION STILL OPEN

BIWA Summit’17 - THE Big Data + Analytics + Spatial + Cloud + IoT + Everything Cool Oracle User Conference, January 31 - February 2, 2017, at the Oracle HQ Conference Center in Redwood Shores, CA. The BIWA Summit’17, held in conjunction with Spatial Summit’17 at the Oracle HQ Conference Center, annually draws several hundred customers, partners, and Oracle experts who share best practices, "novel and interesting" use cases, and customer case studies, and who present on emerging technologies. Featured talks by Oracle executives, technical sessions delivered by experts, and user-friendly introductory talks make this a great event for customers. There are several 2-hour hands-on labs running Oracle products on the Oracle Public Cloud where users can learn how to use Oracle software. See www.biwasummit.org for the full agenda and registration information.

- BIWA and Spatial Summit’17 Agenda: Tuesday, January 31 + Reception sponsored by Deloitte
- BIWA and Spatial Summit’17 Agenda: Wednesday, February 1 + Reception sponsored by L&T InfoTech
- BIWA and Spatial Summit’17 Agenda: Thursday, February 2

NOTE: The BIWA and Spatial IOUG SIGs are managed and run as independent organizations. Oracle employees (who are not speakers) must register as any customer does. However, you can use the BIWAALUMNI discount code when registering to receive a $150 discount.

Hope to see everyone at BIWA Summit’17!
Charlie


Links to Copies of Presentations for Oracle Advanced Analytics and Machine Learning talks at Oracle OpenWorld

At OOW'16, I had five talks on Oracle Advanced Analytics and machine learning. Click on the titles to download the presentations! Enjoy!

- SAS to Oracle Advanced Analytics Migration at Zagrebacka Banka, UniCredit Group [CAS6499]; Sinisa Behin, Head of IT CRM function, Zagrebacka banka - UniCredit Group; Charlie Berger, Oracle. Thursday, Sep 22, 9:30 a.m. - 10:15 a.m. | Moscone South - 104
- 360-Degree Customer Predictive Analytics in the Cloud with Clear Visibility [CON5010]; Ray Owens, President, DX Marketing; Michelle Plecha, Senior Data Scientist, DX Marketing; Charlie Berger, Oracle. Thursday, Sep 22, 10:45 a.m. - 11:30 a.m. | Park Central - Franciscan I
- Oracle's Internal Use of Data Mining and Predictive Analytics: Case Study [CAS1594]; Charlie Berger, Oracle; Frank Heiland, Senior Specialist Business Operations, Oracle. Thursday, Sep 22, 1:15 p.m. - 2:00 p.m. | Palace - Grand Ballroom
- Moneyball for HCM [CON2408]; Charlie Berger, Oracle; Nancy Estell Zoder, Oracle. Tuesday, Sep 20, 4:00 p.m. - 4:45 p.m. | Palace - Twin Peaks South
- Oracle's Big Data Predictive Analytics and Machine Learning Strategy and Roadmap [CON6500]; Charlie Berger, Oracle; Marcos Arancibia Coddou, Oracle. Thursday, Sep 22, 12:00 p.m. - 12:45 p.m. | Moscone South - 104

Additionally, we'll have our Oracle Advanced Analytics, Machine Learning and R demo pod in the Oracle Demo Campgrounds. Stop by to say "hello" and see the latest developments. Oracle Demo Campgrounds: SDB-013, Oracle Advanced Analytics, Machine Learning and R.

Charlie


CALL FOR ABSTRACTS: Oracle BIWA Summit '17 - THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” Oracle User Conference 2017

THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool" Oracle User Conference 2017
January 31 – February 2, 2017, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA

What Oracle Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” successes can you share? We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2017, January 31 – February 2, 2017, and share your successes with Oracle technology. Speaker proposals are now being accepted through October 1, 2016. Submit now for possible early acceptance and publication in Oracle BIWA Summit 2017 promotion materials. Presentations must be non-commercial; sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit, at a later date, a presentation outline and a PDF slide deck. Accompanying technical and use case papers are encouraged, but not required. Click HERE to submit your abstract(s) for Oracle BIWA Summit 2017.

BIWA Summits are organized and managed by the Oracle Business Intelligence, Data Warehousing and Analytics (BIWA) SIG and the Oracle Spatial and Graph SIG, both Special Interest Groups in the Independent Oracle User Group (IOUG), together with the Oracle Northern California User Group. BIWA Summits attract presentations and talks from the top BI, DW, Advanced Analytics, Spatial, and Big Data experts. The 3-day BIWA Summit 2016 event included keynotes by industry experts, educational sessions, hands-on labs, and networking events. Click HERE to see presentations and content from BIWA Summit 2016.

Call for Speakers DEADLINE is October 1, 2016 at midnight Pacific Time. Complimentary registration to Oracle BIWA Summit 2017 is provided to the primary speaker of each accepted abstract. Note: one complimentary registration per accepted session will be provided. Any additional co-presenters need to register for the event separately and pay the appropriate registration fees. It is up to the co-presenters' discretion which presenter to designate for the complimentary registration.

Please submit speaker proposals in one of the following tracks:
- Advanced Analytics
- Business Intelligence
- Big Data + Data Discovery
- Data Warehousing and ETL
- Cloud
- Internet of Things
- Spatial and Graph
- …Anything else “Cool” using Oracle technologies in “novel and interesting” ways

Learn from Industry Experts from Oracle, Partners, and Customers. Come join hundreds of professionals with shared interests in the successful deployment of Oracle Business Intelligence, Data Warehousing, IoT and Analytical products:
- Cloud & Big Data: Oracle Database Cloud Service · Big Data Appliance · Oracle Data Visualization Cloud Service · Hadoop · Spark · Big Data Connectors (Hadoop & R) · Oracle Data as a Service · Engineered Systems · Exadata
- DW & Data Integration: Oracle Partitioning · Oracle Data Integrator (ETL) · In-Memory · Oracle Big Data Preparation Cloud Service
- BI & Data Discovery & Visualization: Big Data Discovery · Data Visualization · OBIEE · OBI Applications · Exalytics · Cloud · Real-Time Decisions
- Advanced Analytics & Spatial: Oracle Advanced Analytics · Oracle Spatial and Graph · Oracle Data Mining & Oracle Data Miner · Oracle R Enterprise · SQL Patterns · Oracle Text · Oracle R Advanced Analytics for Hadoop
- Internet of Things: Big Data from sensors · Edge Analytics · Industrial Internet · IoT Cloud · Monetizing IoT · Security · Standards

What To Expect: 500+ Attendees | 90+ Speakers | Hands-on Labs | Technical Content | Networking

Exciting topics include:
· Database, Data Warehouse, Cloud, and Big Data architecture
· Deep dives on existing Oracle BI, DW and Analytics products and hands-on labs
· Updates on the latest Oracle products and technologies, e.g. Oracle Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL
· Novel and interesting use cases of everything! Spatial, Text, Data Mining, ETL, Security, Cloud
· Working with Big Data: Hadoop, "Internet of Things", SQL, R, sentiment analysis
· Oracle Big Data Discovery, Oracle Business Intelligence (OBIEE), Oracle Spatial and Graph, Oracle Advanced Analytics—All Better Together

Example talks from BIWA Summit 2016 [visit www.biwasummit.org to see last year’s full agenda from BIWA’16 and to download copies of BIWA’16 presentations and HOLs]:

Advanced Analytics
§ Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency, Frank Heilland, Oracle Sales and Support
§ Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments, Julia Minkowski, Fiserv
§ Enabling Clorox as Data Driven Enterprise, Yigal Gur, Clorox
§ Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud, Charlie Berger, Oracle
§ Stubhub and Oracle Advanced Analytics, Brian Motzer, Stubhub
§ Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold, Mark Hornick, Oracle
§ Large Scale Machine Learning with Big Data SQL, Hadoop and Spark, Marcos Arancibia, Oracle
§ Oracle R Enterprise 1.5 - Hot new features!, Mark Hornick, Oracle

BI and Visualization
§ Electoral fraud location in Brazilian General Elections 2014, Alex Cordon, Henrique Gomes, CDS
§ See What’s There and What’s Coming with BICS & Data Visualization, Philippe Lions, Oracle
§ Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option, Kai Yu, Dell
§ BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres, Tim Vlamis, Vlamis
§ Defining a Roadmap for Migrating to Oracle BI Applications on ODI, Patrick Callahan, AST Corp.
§ Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c, Philippe Lions, Oracle

Big Data
§ How to choose between Hadoop, NoSQL or Oracle Database, Jean-Pierre Dijcks, Oracle
§ Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer, Mark Rittman, Rittman Mead
§ Oracle Big Data: Strategy and Roadmap, Neil Mendelson, Oracle
§ High Speed Video Processing for Big Data Applications, Melliyal Annamalai, Oracle
§ How to choose between Hadoop, NoSQL or Oracle Database, Shyam Nath, General Electric
§ What's New With Oracle Business Intelligence 12c, Stewart Bryson, Red Pill
§ Leveraging Oracle Big Data Discovery to Master CERN’s Control Data, Antonio Romero Marin, CERN

Cloud Computing
§ Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology, Giovanni Corcione, Oracle
§ Oracle DBaaS Migration Road Map, Daniel Morgan, Forsythe Meta7
§ Safe Passage to the CLOUD – Analytics, Rich Solari, Privthi Krishnappa, Deloitte
§ Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud, Jeffrey Schauer, JS Business Intelligence

Data Warehousing and ETL
§ Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!), Panel Discussion, Andy Mendelsohn, Oracle, Steve Feuerstein, Oracle, George Lumpkin, Oracle
§ The Place of SQL in the Hybrid World, Kerry Osborne and Tanel Poder, Accenture Enkitec Group
§ Is Oracle SQL the best language for Statistics, Brendan Tierney, Oralytics
§ Taking Full Advantage of the PL/SQL Compiler, Iggy Fernandez, Oracle

Internet of Things
§ Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive, Robert Liekar, M&S Consulting

Spatial Summit
§ Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management, Dave Ellerbeck, Global Information Systems
§ Oracle Spatial and Graph: New Features for 12.2, Siva Ravada, Oracle
§ High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph, Qingyun (Jeffrey) Xie, Oracle

Example hands-on labs from BIWA Summit 2016:
§ Scaling R to New Heights with Oracle Database, Mark Hornick, Oracle, Tim Vlamis, Vlamis Software
§ Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.1, Charlie Berger, Oracle, Brendan Tierney, Oralytics, Karl Rexer, Rexer Analytics
§ Predictive Analytics using SQL and PL/SQL, Brendan Tierney, Oralytics, Charlie Berger, Oracle
§ Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases, Pravin Patil, Kapstone

Lunch & Partner Lightning Rounds
§ Fast and Fun 5 Minute Presentations from Each Partner--Must See!

Submit your abstract(s) today, good luck, and hope to see you there! See last year’s full agenda from BIWA’16.

Dan Vlamis and Shyam Nath, Oracle BIWA Summit '17 Conference Co-Chairs


Got #Analytics? Machine Learning in the #Cloud? Yes with Oracle Advanced Analytics!

Machine Learning in the Cloud from Oracle? Yes! Did you know that Oracle's Database as a Cloud Service High and Extreme Editions bundle the Oracle Advanced Analytics Database Option?

- High Performance: Multitenant, Partitioning, Real Application Testing, Advanced Compression, Advanced Security, Label Security, Database Vault, OLAP, Advanced Analytics, Spatial and Graph, Diagnostics Pack, Tuning Pack, Database Lifecycle Management Pack, Data Masking & Subsetting Pack, and Cloud Management Pack for Oracle Database.
- Extreme Performance: In-Memory Database, Active Data Guard, Multitenant, Partitioning, Real Application Testing, Advanced Compression, Advanced Security, Label Security, Database Vault, OLAP, Advanced Analytics, Spatial and Graph, Diagnostics Pack, Tuning Pack, Database Lifecycle Management Pack, Data Masking & Subsetting Pack, and Cloud Management Pack for Oracle Database.

Now you can get a "Smart" Database in the Cloud with Oracle Advanced Analytics. After all, who wants a "Dumb" Database anyway?!

For more information on Oracle Advanced Analytics on-premises or in the Cloud, visit Oracle Advanced Analytics on the Oracle Technology Network (OTN) or view the latest Oracle's Advanced Analytics - Making Big Data + Analytics Simple! presentation. You might also be interested in the DX Marketing OAA in the Cloud customer success story; there are two YouTube testimonials as well as the written story.

See everyone on the Oracle Advanced Analytics machine learning Cloud!
Charlie


Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c

The Oracle Advanced Analytics (OAA) Database Option leverages Oracle Text, a free feature of the Oracle Database, to pre-process (tokenize) unstructured data for ingestion by the OAA data mining algorithms. By moving parallelized implementations of machine learning algorithms inside the Oracle Database, data movement is eliminated, and we can leverage other strengths of the Database such as Oracle Text (not to mention security, scalability, auditing, encryption, backup, high availability, geospatial data, etc.). A minimal sketch of what this looks like in SQL appears in the P.S. below.

This Mining Structured Data and Unstructured Data using Oracle Advanced Analytics 12c YouTube video presents an overview of the capabilities for combining and performing data mining on both structured and unstructured data. The video includes several quick demonstrations of classification and clustering using unstructured data and provides instructions and links on how to get started, either on premises or on the Oracle Cloud. I hope you find it helpful and a pleasure to watch. Presentation slides are also available.

You can also access similar YouTube videos at the Oracle Data Mining at the Movies blog posting. Follow CharlieDataMine on Twitter.

Thanks for watching!
Charlie
Sr. Dir. of Product Management, Oracle Advanced Analytics and Data Mining
charlie.berger@oracle.com
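P.S. Here is a minimal sketch of how a text column can be mined in-database, assuming an illustrative table REVIEWS(REVIEW_ID, REVIEW_TEXT, SENTIMENT); the table, column, policy, and model names are all hypothetical, not from the post. An Oracle Text policy tokenizes the text column, and the column is marked as a 'TEXT' attribute in the transformation list passed to DBMS_DATA_MINING.CREATE_MODEL.

-- One-time: create an Oracle Text policy used to tokenize the text column
BEGIN
  ctx_ddl.create_policy('my_txtpol');
END;
/

-- Model settings: SVM classification plus the text policy
CREATE TABLE txt_settings (setting_name VARCHAR2(30), setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO txt_settings
    VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO txt_settings
    VALUES ('ODMS_TEXT_POLICY_NAME', 'my_txtpol');
END;
/

DECLARE
  xforms dbms_data_mining_transform.transform_list;
BEGIN
  -- Mark REVIEW_TEXT as a text attribute so Oracle Text tokenizes it during the build
  dbms_data_mining_transform.set_transform(
    xforms, 'REVIEW_TEXT', NULL, 'REVIEW_TEXT', NULL, 'TEXT');
  dbms_data_mining.create_model(
    model_name          => 'TXT_SVM',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'REVIEWS',
    case_id_column_name => 'REVIEW_ID',
    target_column_name  => 'SENTIMENT',
    settings_table_name => 'TXT_SETTINGS',
    xform_list          => xforms);
END;
/

Once built, the model scores free text with the same SQL PREDICTION function used for structured data.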


Oracle BIWA'17 - THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool” Oracle User Conference 2017

BIWA Summit 2017: THE Big Data + Analytics + Spatial + Cloud + IoT + Everything “Cool" Oracle User Conference 2017, January 31 – February 2, 2017, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA. This announcement repeats the call for abstracts posted above ("CALL FOR ABSTRACTS: Oracle BIWA Summit '17"); see that post for the full details, tracks, example talks, and hands-on labs from BIWA Summit 2016.


Zagrebačka Bank Increases Cash Loans by 15% Within 18 Months of Advanced Analytics Platform Deployment

Oracle Customer: Zagrebačka Bank (Zagrebačka banka d.d.)
Location: Zagreb, Croatia
Industry: Financial Services
Employees: 4,500
Annual Revenue: $1 to $5 Billion

Zagrebačka Bank (Zagrebačka banka d.d., ZABA) is the biggest bank in Croatia, one of the country’s largest employers, and part of the Italian UniCredit Group, where it regularly ranks among the most profitable subsidiaries. Through its 130 branches and 850 automatic teller machines (ATMs), the bank serves 80,000 corporate customers and more than 1.1 million private customers nationwide, making one in four citizens a ZABA customer. ZABA accounts for 25% of the Croatian banking sector’s total assets and almost 60% of its profits. The bank controls 35% of the country’s investment funds, 41% of obligatory pension funds, and 30% of specialized savings accounts for real estate transactions. Euromoney and Global Finance publications named ZABA Croatia’s best bank in 2011.

A word from Zagrebačka Bank (Zagrebačka banka d.d.)
“With Oracle Advanced Analytics we execute computations on thousands of attributes in parallel—impossible with open-source R. Analyzing in Oracle Database without moving data increases our agility. Oracle Advanced Analytics enables us to make quality decisions on time, increasing our cash loans business 15%.” – Jadranka Novoselovic, Head of Business Intelligence Development, Zagrebačka Bank

Challenges
Increase bank performance of statistical modeling and predictive analytics, which can take three days for data preparation and model scoring and at least 24 hours for model building
Improve ZABA’s business agility by increasing the efficiency of building and testing predictive models in the Oracle Database platform rather than moving large volumes of financial and customer data between servers and databases to generate predictive analytics
Gain the ability to use predictive analytics for commercial activities in order to better target customers for new banking products and services
Reduce the risks and costs of executing statistical modeling, which requires transfers of data to individual analytical servers with dedicated hardware resources using specialized manpower
Strengthen prediction power by increasing model hit ratio and making it easy for analysts to refresh scores

Solutions
Used Oracle Advanced Analytics on Oracle Database to transform traditional predictive analytics that could take days to prepare and execute all tasks, including data import and export, into a seamless process executed in seconds, minutes, or hours—increasing cash loans by 15% within 18 months due to improved hit ratio
Saved 1,000 man-days in IT maintenance per year by using Oracle Exadata, Oracle GoldenGate, and Oracle Data Integrator for data warehouse integration, with extension to predictive and statistical modeling using the Oracle Advanced Analytics database option
Increased prediction performance and delivered comprehensive statistical functionality for in-database computation by leveraging the security, reliability, performance, and scalability of Oracle Database and Oracle Advanced Analytics for predictive analytics—running data preparation, transformation, model building, and model scoring within the database
Enabled analysts to leverage in-database mining algorithms with both Oracle Advanced Analytics’ Oracle Data Mining and Oracle R Enterprise components to achieve much faster access to analytics data, such as credit risk scoring and customer retention, without needing to transfer data between servers and databases for statistical modeling
Empowered the bank’s IT team to focus on business-triggered statistical modeling—for example, to increase the hit ratio of target customers for a newly developed credit card or to improve customer retention—rather than spending most of their time addressing regulatory projects
Empowered the organization to improve prediction accuracy by simplifying the development effort for analytics and minimizing developer intervention during model design, making it easy for analysts to refresh scores, such as customer retention, with an updated data set
Strengthened decision-making across the bank’s regulatory and commercial activities by delivering business analytics more rapidly, for example enabling risk managers to improve credit risk scorings with fast ad-hoc analysis
Facilitated enterprise-wide access to statistics and advanced analytics through delivery of Oracle Advanced Analytics’ actionable insights via Oracle Business Intelligence Enterprise Edition dashboards and business applications
Improved total cost of ownership by eliminating the need for dedicated analytical servers and improving development and execution time of business analytics by 30%, thanks to Oracle Advanced Analytics’ in-database architecture
Incorporated Oracle Database as an enterprise-wide analytical platform—eliminating the need for two dedicated administrators focused on server maintenance and administrative tasks
Ensured compliance with regulatory demands in terms of consistent, punctual filing of regulatory reports—avoiding heavy penalties from the Croatian National Bank

Why Oracle?
“We chose Oracle because our entire data modeling process runs on the same machine with the highest performance and level of integration. With Oracle Database we simply switched on the Oracle Advanced Analytics option and needed no new tools,” said Sinisa Behin, ICT Coordinator, Business Intelligence Development, Zagrebačka Bank.
“Our plan is to maximize coverage of our commercial activities, so that each process of campaign management and sales of financial products and services is empowered by statistical analysis. Oracle’s engineered high-performance platform met our analytical requirements for a low cost of ownership,” said Lidija Glavinic, ICT Coordinator, Business Intelligence Development, Zagrebačka Bank.

Implementation Process
The deployment of Oracle Advanced Analytics represented the final stage of ZABA’s consolidation on Oracle technology. The organization migrated current clients, products, and portfolio models together while starting to develop predictive analytics for its commercial department from scratch. The retention model migration from SAS to the Oracle Advanced Analytics platform was one of the first migrations of this type in the region. Oracle’s Advanced Analytics development team designed the variable clustering functionality specifically for ZABA and implemented the Oracle R Enterprise Varclus package in the 1.5 release.

Partners
Zagrebačka Bank collaborated with Oracle Partner Combis to successfully migrate its predictive models from SAS to the Oracle Advanced Analytics platform. Combis completed the migration on time and trained ZABA employees in the use of Oracle tools.
Multicom d.o.o., Combis d.o.o., Neos d.o.o.

Oracle Products and Services
Oracle Database
Oracle Advanced Analytics
Oracle GoldenGate
Oracle Business Intelligence Enterprise Edition
Oracle Enterprise Performance Management
Oracle Exadata


My Favorite Oracle Data Miner Demo Workflows - Part 1 in a Series: CUST_INSUR_LTV

Part 1 (of a planned series of blog posts): Here are a few of my favorite Oracle Data Miner demo workflows. They are all simple, easy-to-create examples of data mining and predictive analytics using Oracle Advanced Analytics and SQL Developer's Oracle Data Miner extension.

CUST_INSUR_LTV

Oracle Data Miner ships with some small datasets to get users started, including INSUR_CUST_LTV_SAMPLE (1,015 records). While this tiny dataset doesn't bloat the SQL Developer download size and helps get Oracle Data Miner users quickly up and running, the data size is so small that the resulting predictive models and insights can at times seem a bit trivial. Hence, I prefer to use the larger files that ship with the Oracle 12c Database Sample Examples (the SH schema, MINING_DATA_BUILD, etc.) and this CUST_INSUR_LTV demo data:

CUST_INSUR_LTV.DMP (~25K records, ~25 attributes)
CUST_INSUR_LTV_APPLY.DMP (~25K records, ~25 attributes)
Predicting Insurance Buyers ODMr workflow

You can import the workflow and datasets and everything should run. This workflow includes an Explore node and a Graph node that are typically used to visualize the data before performing data mining. The Explore node step is important to make sure the data you are about to analyze makes sense and seems accurate and reasonable. For example, AGE should contain only positive numbers and range from 0 to, say, 100+.

The Column Filter node performs data profiling and data quality checks on the data and is also used to perform an Attribute Importance analysis to determine which attributes (or input variables) have the largest correlation with the target attribute (Buy_Insurance). Sometimes this step alone provides significant value to a company by clarifying the key factors, but here we're also using it to better understand which attributes have the most impact on our business problem--targeting customers who are likely to buy insurance. Note: each of the OAA/ODM algorithms has its own embedded attribute importance/feature selection capabilities, and each can handle hundreds to thousands of input attributes. However, many times we want to get a feel for what's driving our business problem and learn where we could focus to pull in additional attributes and "engineered features", e.g. "AGE/INCOME ratio" or "Maximum_Amount", etc.

We build four (4) Oracle Data Mining Classification models by default (Decision Tree, Naive Bayes, GLM Logistic Regression and Support Vector Machine (SVM)). For simplicity, we accept the ODMr defaults for Data Preparation and Algorithm Settings and can be assured that with Oracle Data Miner default settings we should achieve a "good predictive model". Decision Trees generally produce good predictive models and have the added benefit of being easy to understand. Notice the IF... THEN... rules.

Lastly, we use the Apply node and our Classification node to make predictions on our CUST_INSUR_LTV_APPLY table and get our predictions. The predictions and associated Prediction_Details are stored inside the Oracle Database and hence are easily available for inclusion in any BI dashboard or real-time application (see the scoring sketch below). Oracle Data Miner generates the PL/SQL and SQL scripts to accelerate deployment of analytical methodologies that leverage the scalability and infrastructure of the Oracle Database. See the Oracle Data Miner: Use Repository APIs to Manage and Schedule Workflows to Run white paper for more details on the many model deployment options. Hope you enjoy! Charlie
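Since the predictions live inside the database, pulling them into a dashboard or application is just a query. Here is a minimal sketch of that kind of scoring SQL; the model name INSUR_BUY_DT and the column names are hypothetical placeholders for whatever your workflow actually built:

-- Score each customer with the workflow's Decision Tree model and
-- rank by the probability of buying insurance.
SELECT cust_id,
       PREDICTION(INSUR_BUY_DT USING *)             AS buy_insurance,
       PREDICTION_PROBABILITY(INSUR_BUY_DT USING *) AS probability,
       PREDICTION_DETAILS(INSUR_BUY_DT USING *)     AS details_xml
  FROM cust_insur_ltv_apply
 ORDER BY probability DESC;

Because PREDICTION and its siblings are ordinary SQL functions, the same expressions can be wrapped in a view that a BI dashboard or application queries directly.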


Learn Predictive Analytics in 2 Days - New Oracle University Course!

What you will learn: This Predictive Analytics using Oracle Data Mining Ed 1 training will review the basic concepts of data mining. Expert Oracle University instructors will teach you how to leverage the predictive analytical power of Oracle Data Mining, a component of the Oracle Advanced Analytics option.

Learn To:
Explain basic data mining concepts and describe the benefits of predictive analysis.
Understand primary data mining tasks, and describe the key steps of a data mining process.
Use Oracle Data Miner to build, evaluate, apply, and deploy multiple data mining models.
Use Oracle Data Mining's predictions and insights to address many kinds of business problems.
Deploy data mining models for end-user access, in batch or real-time, and within applications.

Benefits to You
When you've completed this course, you'll be able to use Oracle Data Miner 4.1, the Oracle Data Mining "workflow" GUI, which enables data analysts to work directly with data inside the database. The Data Miner GUI provides intuitive tools that help you explore the data graphically, build and evaluate multiple data mining models, apply Oracle Data Mining models to new data, and deploy Oracle Data Mining's predictions and insights throughout the enterprise.

Oracle Data Miner's SQL APIs - Get Results in Real-Time
Oracle Data Miner's SQL APIs automatically mine Oracle data and deploy results in real-time. Because the data, models, and results remain in the Oracle Database, data movement is eliminated, security is maximized and information latency is minimized. (A minimal PL/SQL sketch follows the course outline below.)

Introduction
Course Objectives
Suggested Course Prerequisites
Suggested Course Schedule
Class Sample Schemas
Practice and Solutions Structure
Review location of additional resources

Predictive Analytics and Data Mining Concepts
What is Predictive Analytics?
Introducing the Oracle Advanced Analytics (OAA) Option
What is Data Mining?
Why use Data Mining?
Examples of Data Mining Applications
Supervised Versus Unsupervised Learning
Supported Data Mining Algorithms and Uses

Understanding the Data Mining Process
Common Tasks in the Data Mining Process
Introducing the SQL Developer interface

Introducing Oracle Data Miner 4.1
Data mining with Oracle Database
Setting up Oracle Data Miner
Accessing the Data Miner GUI
Identifying Data Miner interface components
Examining Data Miner Nodes
Previewing Data Miner Workflows

Using Classification Models
Reviewing Classification Models
Adding a Data Source to the Workflow
Using the Data Source Wizard
Using Explore and Graph Nodes
Using the Column Filter Node
Creating Classification Models
Building the Models
Examining Class Build Tabs

Using Regression Models
Reviewing Regression Models
Adding a Data Source to the Workflow
Using the Data Source Wizard
Performing Data Transformations
Creating Regression Models
Building the Models
Comparing the Models
Selecting a Model

Using Clustering Models
Describing Algorithms used for Clustering Models
Adding Data Sources to the Workflow
Exploring Data for Patterns
Defining and Building Clustering Models
Comparing Model Results
Selecting and Applying a Model
Defining Output Format
Examining Cluster Results

Performing Market Basket Analysis
What is Market Basket Analysis?
Reviewing Association Rules
Creating a New Workflow
Adding a Data Source to the Workflow
Creating an Association Rules Model
Defining Association Rules
Building the Model
Examining Test Results

Performing Anomaly Detection
Reviewing the Model and Algorithm used for Anomaly Detection
Adding Data Sources to the Workflow
Creating the Model
Building the Model
Examining Test Results
Applying the Model
Evaluating Results

Mining Structured and Unstructured Data
Dealing with Transactional Data
Handling Aggregated (Nested) Data
Joining and Filtering data
Enabling mining of Text
Examining Predictive Results

Using Predictive Queries
What are Predictive Queries?
Creating Predictive Queries
Examining Predictive Results

Deploying Predictive Models
Requirements for deployment
Deployment Options
Examining Deployment Options
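For a flavor of the SQL API side that complements the Data Miner GUI the course centers on, here is a minimal sketch of building a classification model in PL/SQL with the DBMS_DATA_MINING package; all table, column and model names are hypothetical:

-- A small settings table picks the algorithm; ODM defaults cover the rest.
CREATE TABLE dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO dt_settings VALUES ('ALGO_NAME', 'ALGO_DECISION_TREE');

BEGIN
  -- Build a Decision Tree classifier directly on the in-database table.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSUR_DT',      -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUST_INSUR_LTV',    -- hypothetical training table
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'DT_SETTINGS');
END;
/

The settings table is optional; if it is omitted, ODM falls back to its default classification algorithm. The resulting model is then scored with the same SQL PREDICTION functions the course covers in its deployment topics.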


Links to Presentations: BIWA Summit'16 - Big Data + Analytics User Conference Jan 26-28, @ Oracle HQ Conference Center

We had a great www.biwasummit.org event with ~425 attendees, in-depth technical presentations delivered by experts, and even several 2-hour Hands-on Labs training classes that used the Oracle Database Cloud! Watch for more coverage of the event in various Oracle marketing and partner content venues. Many thanks to all the BIWA board of directors and the many volunteers who have put in so much work to make this BIWA Summit the best BIWA user event ever. Mark your calendars for BIWA Summit’17, January 31, Feb. 1 & Feb. 2, 2017. We’ll be announcing the Call for Abstracts in the future, so please direct your best customers and speakers to submit. We’re aiming to continue to make BIWA + Spatial + YesSQL Summit the best focused user gathering for sharing best practices for novel and interesting use cases of Oracle technologies. BIWA is an IOUG SIG run entirely by customers, partners and Oracle employee volunteers. We’re always looking for people who would like to be involved. Let me know if you’d like to contribute to the planning and organization of future BIWA events and activities. See everyone at BIWA’17!

Charlie, on behalf of the entire BIWA board of directors (charlie.berger@oracle.com) (see www.biwasummit.org for more information)

See the list of BIWA Summit'16 presentations below. Click on Details to access the speaker’s abstract and download the files (assuming the speaker has posted them for sharing). We now have a schedule-at-a-glance to show you all the sessions in a tabular agenda. See the bottom of the page for the Session Search capability. Below is a list of the sessions and links to download most of the materials for the various sessions. Click on the DETAILS button next to the session you want to download; the page should then refresh with the session description and (assuming the presenter uploaded files, but be aware that files may be limited to 5MB) you should see a list of files for that session. See the full list below:

Advanced Analytics Presentations (Click on Details to access file if submitted by presenter)
Dogfooding – How Oracle Uses Oracle Advanced Analytics To Boost Sales Efficiency Details
Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details
Predictive Modelling and Forecasting using OER Details
Enabling Clorox as Data Driven Enterprise Details
Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details
Large Scale Machine Learning with Big Data SQL, Hadoop and Spark Details
Stubhub and Oracle Advanced Analytics Details
Fiserv Case Study: Using Oracle Advanced Analytics for Fraud Detection in Online Payments Details
Advanced Analytics for Call Center Operations Details
Machine Learning on Streaming Data via Integration of Oracle R Enterprise and Oracle Stream Explorer Details
Learn Predictive Analytics in 2 hours!! Oracle Data Miner 4.0 Hands on Lab Details
Scaling R to New Heights with Oracle Database Details
Predictive Analytics using SQL and PL/SQL Details
Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL and the Cloud Details
Improving Predictive Model Development Time with R and Oracle Big Data Discovery Details
Oracle R Enterprise 1.5 - Hot new features! Details
Is Oracle SQL the best language for Statistics Details

BI and Visualization Presentations (Click on Details to access file if submitted by presenter)
Electoral fraud location in Brazilian General Elections 2014 Details
The State of BI Details
Case Study of Improving BI Apps and OBIEE Performance Details
Preparing for BI 12c Upgrade Details
Data Visualization at Sound Exchange – a Case Study Details
Integrating OBIEE and Essbase, Why it Makes Sense Details
The Dash that changed a culture Details
Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database option Details
Oracle Data Visualization vs. Answers: The Cage Match Details
What's New With Oracle Business Intelligence 12c Details
Workforce Analytics Leveraging Oracle Business Intelligence Cloud Services (BICS) Details
Defining a Roadmap for Migrating to Oracle BI Applications on ODI Details
See What’s There and What’s Coming with BICS & Data Visualization Details
Free form Data Visualization, Mashup BI and Advanced Analytics with BI 12c Details
Oracle Data Visualization Cloud Service Hands-On Lab with Customer Use Cases Details
On Metadata, Mashups and the Future of Enterprise BI Details
OBIEE 12c and the Leap Forward in Lifecycle Management Details
Supercharge BI Delivery with Continuous Integration Details
Visual Analyzer and Best Practices for Data Discovery Details
BI Movie Magic: Maps, Graphs, and BI Dashboards at AMC Theatres Details
Oracle Business Intelligence (OBIEE) the Smart View Way Details

Big Data Presentations (Click on Details to access file if submitted by presenter)
Oracle Big Data: Strategy and Roadmap Details
Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details
Leveraging Oracle Big Data Discovery to Master CERN’s Control Data Details
Enrich, Transform and Analyse Big Data using Big Data Discovery and Visual Analyzer Details
Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform Details
High Speed Video Processing for Big Data Applications Details
Enterprise Data Hub with Oracle Exadata and Oracle Big Data Appliance Details
How to choose between Hadoop, NoSQL or Oracle Database Details
Analytical SQL in the Era of Big Data Details

Cloud Computing Presentations (Click on Details to access file if submitted by presenter)
Oracle DBaaS Migration Road Map Details
Centralizing Spatial Data Management with Oracle Cloud Databases Details
End Users data in BI - Data Mashup and Data Blending with BICS, DVCS and BI 12c Details
Oracle BI Tools on the Cloud--On Premise vs. Hosted vs. Oracle Cloud Details
Hybrid Cloud Using Oracle DBaaS: How the Italian Workers Comp Authority Uses Graph Technology Details
Build Your Cloud with Oracle Engineered Systems Details
Safe Passage to the CLOUD – Analytics Details
Your Journey to the Cloud: From Dedicated Physical Infrastructure to Cloud Bursting Details

Data Warehousing and ETL Presentations (Click on Details to access file if submitted by presenter)
Getting to grips with SQL Pattern Matching Details
Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) Details
Controlling Execution Plans (without Touching the Code) Details
Taking Full Advantage of the PL/SQL Result Cache Details
Taking Full Advantage of the PL/SQL Compiler Details
Advanced SQL: Working with JSON Data Details
Oracle Database In-Memory Option Boot Camp: Everything You Need to Know Details
Best Practices for Getting Started With Oracle Database In-Memory Details
Extreme Data Warehouse Performance with Oracle Exadata Details
Real-Time SQL Monitoring in Oracle Database 12c Details
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration Details
MySQL 5.7 Performance: More Than 1.6M SQL Queries per Second Details
Implement storage tiering in Data warehouse with Oracle Automatic Data Optimization Details
Edition-Based Redefinition Case Study Details
12-Step SQL Tuning Method Details
Where's Waldo? Using a brute-force approach to find an Execution Plan the CBO hides Details
Delivering an Enterprise-Wide Standard Chart of Accounts at GE with Oracle DRM Details
Agile Data Engineering: Introduction to Data Vault Data Modeling Details
Worst Practice in Data Warehouse Design Details
Same SQL Plan, Different Performance Details
Why Use PL/SQL? Details
Transforming one table to another: SQL or PL/SQL? Details
Understanding the 10053 Trace Details
Analytic Views - Bringing Star Queries into the Twenty-First Century Details
The Place of SQL in the Hybrid World Details
The Next Generation of the Oracle Optimizer Details

Internet of Things Presentations (Click on Details to access file if submitted by presenter)
Oracle Modern Manufacturing - Bridging IoT, Big Data Analytics and ERP for Better Results Details
Meet Your Digital Twin Details
Industrial IoT and Machine Learning - Making Wind Energy Cost Competitive Details
Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold Details
Big Data and the Internet of Things in 2016: Beyond the Hype Details
IoT for Big Machines Details
The State of Internet of Things (IoT) Details

Oracle Spatial Summit Presentations (Click on Details to access file if submitted by presenter)
Build Your Own Maps with the Big Data Discovery Custom Visualization Component Details
Massively Parallel Calculation of Catchment Areas in Retail Details
Dismantling Criminal Networks with Graph and Spatial Visualization and Analysis Details
Best Practices for Developing Geospatial Apps for the Cloud Details
Map Visualization in Analytic Apps in the Cloud, On-Premise, and Mobile Details
Best Practices, Tips and Tricks with Oracle Spatial and Graph Details
Delivering Smarter Spatial Data Management within Ordnance Survey, UK Details
Deploying a Linked Data Service at the Italian National Institute of Statistics Details
ATLAS - Utilizing Oracle Spatial and Graph with Esri for Pipeline GIS and Linear Asset Management Details
Oracle Spatial 12c as an Applied Science for Solving Today's Real-World Engineering Problems Details
Assembling a Large Scale Map for the Netherlands Using Oracle 12c Spatial and Graph Details
Using Open Data Models to Rapidly Develop and Prototype a 3D National SDI in Bahrain Details
Implementation of LBS services with Oracle Spatial and Graph and MapViewer in Zain Jordan Details
Interactive map visualization of large datasets in analytic applications Details
Gain Insight into Your Graph Data -- A hands on lab for Oracle Big Data Spatial and Graph Details
Applying Spatial Analysis To Big Data Details
Big Data Spatial: Location Intelligence, Geo-enrichment and Spatial Analytics Details
What’s New with Spatial and Graph? Technologies to Better Understand Complex Relationships Details
Graph Databases: A Social Network Analysis Use Case Details
High Performance Raster Database Manipulation and Data Processing with Oracle Spatial and Graph Details
3D Data Management - From Point Cloud to City Model Details
The Power of Geospatial Visualization for Linear Assets Using Oracle Enterprise Asset Management Details
Oracle Spatial and Graph: New Features for 12.2 Details
Fast, High Volume, Dynamic Vehicle Routing Framework for E-Commerce and Fleet Management Details
Managing National Broadband Infrastructure at Turk Telekom with Oracle Spatial and Graph Details

Other Presentations (Click on Details to access file if submitted by presenter)
Taking Full Advantage of the PL/SQL Compiler Details
Taking Full Advantage of the PL/SQL Result Cache Details
Meet Your Digital Twin Details
Making SQL Great Again (SQL is Huuuuuuuuuuuuuuuge!) Details
Lightning Round for Vendors Details


BIWA's Got Talent YouTube Demo Contest! - Enter and Win $500!!!

Best Oracle "Tech Stack" YouTube Demo Contest! BIWA Wants YOU (Customers, Partners, Oracle Employees, whatever--everyone!) to post on YouTube one or multiple YouTube videos that highlight BIWA focused Oracle technologies/products/features or anything BIWA related!   See #BIWASGOTTALENT Contest Details Two categories Customers, Partners, Students, Friends of BIWA--Anyone! Oracle Employees--Note:  Any concerns about eligibility for Oracle employees is the responsibility of the employee Judges will award points per the following scheme--MAX 100 points Maximum of 40 points: Perception of usefulness and value added to the BIWA community, user or company Maximum 25 points (5 points each): Each Oracle product or major feature highlighted e.g. 5 points for OAA, 5 points for Spatial, 5 points for OBIEE, BDA, BDD, etc. Maximum of 10 points:  Completeness and clarity of associated documentation, reusable code, etc. Maximum of 15 points:  Intangibles e.g. cleverness, sizzle, coolness, etc.--whatever excites and moves the judges Maximum of 10 points: Most "likes" on YouTube Each YouTube recorded "live" entry must include: BIWAS GOT TALENT with BIWA Summit 2016 Logo (above on this page) Title of your YouTube Video Author(s), titles and contact information Include #BIWASGOTTALENT in the meta information on YouTube When submitting on YouTube and send an email to biwasgottalent@gmail.com with a link Presentation must be not to exceed more than 10 minutes of YouTube video. Submissions longer than 10 minute will be severely penalized by the judges. :( The top two presentations in each category will be shown at BIWA Summit 2016 in Redwood Shores, California, January 26-28, 2016 Winners will be chosen based on a combination of the number of points received from the judges.  Submitters are encouraged to promote their #BIWASGOTTALENT video to accumulate "likes".  Prize can be taken as cash or donation to charity. Rules, Regulations and Other Details: By submitting your entry, you agree that BIWA may use your submission for marketing or other purposes The winner will be notified by email by January 28th, 2016 and does not have to be present at BIWA Summit 2016 to win For questions, please email biwasgottalent@gmail.com

Best Oracle "Tech Stack" YouTube Demo Contest! BIWA Wants YOU (Customers, Partners, Oracle Employees, whatever--everyone!) to post on YouTube one or multiple YouTube videos that highlight BIWA focused...

NHS Business Services Authority Gains Better Insight into Data, Identifies circa GBP100 Million (US$156 Million) in Potential Savings in Just Three Months

NHS Business Services Authority Gains Better Insightinto Data, Identifies circa GBP100 Million (US$156 Million) in PotentialSavings in Just Three Months Oracle Customer: NHS Business Services Authority Location:  Newcastle upon Tyne, United Kingdom Industry: Public Sector Employees:  2,800 Annual Revenue:  $100 to $500 Million The NHS Business ServicesAuthority (NHSBSA) is a special health authority and an arm’s length body ofthe Department of Health for England. It provides a range of critical centralservices to NHS organizations, contractors, patients, and the public. Servicesinclude managing the NHS Pension schemes in England and Wales, managingpayments to primary care dental and pharmacy contractors, and administering theEuropean Health Insurance Card (EHIC). The NHS budget for 2015/16 is approximately GBP116 billion (US$179 billion) andthe total funds administered by the NHSBSA (including those for the NHS Pensionschemes) amount to circa GBP32 billion (US$48 billion). The Department ofHealth asked the NHSBSA to take a proactive role to identify opportunities toreduce costs and eliminate waste. One way to do this was to find better ways touse the vast volumes of data already collected and held within the organizationto help reduce fraud and error throughout the health service. The NHSBSA needed a new, centralized solution that would enable it to gainbetter value from its data which is spread across a disparate set of ITsystems, data, storage, and analytical capabilities. To achieve this, it chosean end-to-end Oracle solution including OracleAdvanced Analytics, OracleExadata Database Machine, OracleExalytics In-Memory Machine, OracleEndeca Information Discovery, and OracleBusiness Intelligence Enterprise Edition. With this Oracle solution, the NHSBSA established its Data Analytics LearningLaboratory (DALL), investing in both technology and expertise to create insightfrom its data. Within the first three months of operation, the organizationidentified circa GBP100 million (US$156 million) in potential savings. Uncovering Savings inDentistry A word from NHS BusinessServices Authority “Oracle Advanced Analytics’ data mining capabilities and Oracle Exalytics’ performance really impressed us. The overall solution is very fast, and our investment very quickly provided value. We can now do so much more with our data, resulting in significant savings for the NHS as a whole.” – Nina Monckton, Head of Information Services, NHS Business Services Authority The NHSBSA used analytics toidentify significant savings within NHS dental services and find instances ofactivities which do not demonstrate good value for money. “With Oracle Advanced Analytics, it is much easier to detect anomalies inbehaviors. We used anomaly detection to discover where there might be evidenceof inappropriate behavior in dentists’ claims, enabling NHS commissioners tofollow up and challenge their activities,” explained Nina Monckton, head ofinformation services, NHSBSA. Preventing Fraud forEuropean Health Insurance Card The EHIC is available to allEuropean citizens covered by a statutory social security scheme and entitlesthem to free healthcare while visiting other European countries.  During analysis of EHIC data, the NHSBSA discovered commercial addresses beingused fraudulently to apply for EHIC cards and uncovered the use of invalid NHSand National Insurance numbers to apply for a card.  
“We used Oracle Exalytics and Oracle Business Intelligence for the EHIC application to improve the front-end validation process, prevent fraud, and blacklist addresses showing suspicious activities,” Monckton said.

Analyzing Billions of Records in Minutes

The NHSBSA receives data relating to more than one billion prescription items dispensed in primary care settings each year. Previously, the NHSBSA did not have the computing power to analyze this data at transaction level. The NHSBSA can now analyze billions of records at one time, and by analyzing much larger sets of patient data, the NHSBSA can provide insight that is helping to improve standards of care throughout the health service. “Previously, our information analysts did not have the ability to directly query data as it was mainly held in live operational systems. Now that we are able to transfer data to our Exadata environment, we have dramatically improved our ability to deliver value from our data,” Monckton said.

Analyzing Unstructured Text to Measure Satisfaction

Improving Data Matching To Save Millions of Dollars

In England, some people are entitled to free medical prescriptions or dental treatment from the NHS. The NHSBSA works with the Department of Work and Pensions (DWP) to establish that those patients declaring that they are exempt from a charge for dental treatment and/or medical prescriptions are claiming correctly. Using Oracle Exalytics to compare datasets, the NHSBSA reduced the rate of non-matching records for dentistry from 15% to just 5%.

The Role of Data Governance

Data is now moving to the heart of all NHSBSA programs. As a result of the organization’s new analytics capability, teams have a better understanding of what they can do with the data and are more careful about what data they collect. “We now know that if we collect the right data at the start of a program, we can measure what is working down the line. We are starting to change the culture of the organization around our data governance. There has been a massive shift. Data is now central to all our new programs, and data governance is at the heart of everything we do,” Monckton said.

Using the Data Analytics Learning Laboratory to Achieve Strategic Goals

The NHSBSA’s data analytics investment is helping the organization achieve its 5-year strategic goals, which include helping to save GBP1 billion (US$1.56 billion) for NHS patients, reducing unit costs by 50%, improving service and delivering great results for customers, and deriving insight from data to drive change. “With our newly established Data Lab in place, we can add even more value to the NHS. I cannot begin to describe how significant that has been. This project is really helping us to achieve our strategic goals. In addition, we are working in a different way now and it has even helped with how people interact and function in the workplace. We’ve had a very positive response, and our chief executive is extremely impressed with our achievements and the results we have shown so far. As a result, management is recommending that our suppliers and partners come to see what we are doing to learn from our experiences,” Monckton said.

Over the next six months, the DALL team has a large number of analytics projects in the pipeline and is looking to help other areas of the business better leverage their data. The organization will focus on how it can use Oracle Business Intelligence Enterprise Edition with business users.
In addition, the NHSBSA is investigating how it might share data and its analytical ability with other government organizations to drive further value from its investment.

Challenges
Use new insight gathered from data to help identify cost savings and meet NHSBSA strategic goals
Identify and prevent healthcare fraud and benefit eligibility errors to save costs
Leverage existing data to transform business and productivity

Solutions

Oracle Products and Services: Oracle Advanced Analytics, Oracle Exalytics In-Memory Machine, Oracle Endeca Information Discovery, Oracle Exadata Database Machine, Oracle Business Intelligence Suite, Enterprise Edition

Identified up to GBP100 million (US$156 million) that could potentially be saved across the NHS through benefit fraud and error reduction, by deploying new analytics infrastructure
Identified and implemented changes to prevent fraudulent European Health Insurance Card (EHIC) applications
Used data matching to identify savings that can be made through the recovery of money from patients claiming exemption from charges for dental treatment or prescriptions when not eligible to do so
Used anomaly detection to uncover fraudulent activity where some dentists split a single course of treatment into multiple parts and presented claims for multiple treatments
Analyzed unstructured text to measure employee satisfaction in more detail and found a direct link between those who felt less engaged at work and those more likely to take time off sick
Analyzed billions of records at one time to measure longer-term patient journeys and to analyze drug prescribing patterns to improve patient care
Established a new Data Analytics Learning Laboratory (DALL) that uses data and analytics to drive action and significant savings for the NHS
Implemented Oracle Advanced Analytics, Oracle Exadata Database Machine, Oracle Exalytics In-Memory Machine, Oracle Endeca Information Discovery, and Oracle Business Intelligence Enterprise Edition to deliver fast analysis and data mining for NHS and wider government departments

Why Oracle
“We chose Oracle because the solution could cope with very large data volumes running into billions of rows and could scale as volumes increase. In addition, the Oracle solution required no IT team support to run the queries, which enables our team of data analysts to be self-sufficient. Oracle Exalytics’ in-memory capability gave us the speed we required, and Oracle’s engineered systems accelerated deployment and reduced risk.
“Working with Oracle has been a very positive experience. The team has been incredibly responsive and provided a number of experts to help us get up and running as quickly as possible. With one vendor providing the whole solution, it’s very easy for us. If we need help, we know where to go,” Monckton said.

Implementation Process
Oracle ran a proof of concept (POC) to show the speed and capability of the proposed end-to-end solution. The POC used publicly available data sets for NHS prescription data. It covered 50 million prescribed items, 300 million records, and six months of data. The team concentrated on finding anomalies in the data and carrying out further analysis to understand them before presenting the findings in a clear and straightforward way. Following the POC, Oracle worked with the NHSBSA and its data center partner, Capita, to complete the implementation. During implementation, Oracle provided the NHSBSA with access to a virtual environment. This enabled the team to get some experience with the tools before completing the implementation.
As such, the NHSBSA was familiar and confident with the new analytics tools from day one, saving considerable time and gaining immediate value. The NHSBSA identified which data it should use for analysis and transferred it across to its Oracle Exadata environment. To date it has transferred more than 15 billion rows of data into Oracle Exadata. The prescription services database, with 14 billion rows of data, is the largest exported data source at 400 gigabytes. The export took 10 hours to complete with Oracle as the source database.

Advice from NHSBSA
Have a clear plan for the first six months before you begin your implementation
Ensure you have buy-in from key stakeholders
Choose easy areas to start with, so you can demonstrate positive results quickly and prove the value of the solution to others
Build knowledge within your team through training and Oracle events; this helps staff to think differently about the possibilities of using data
Get help from the experts: talk to your existing suppliers, go to analytics events, and talk to other organizations who have implemented analytics
It’s never too early to think about data governance and data quality: recruit a data standards manager to create data governance policies and identify data leads around the business
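For readers curious what the anomaly detection approach described above looks like in practice, here is a minimal sketch using Oracle Advanced Analytics' one-class SVM technique; the claims table and all names are hypothetical illustrations, not NHSBSA's actual schema. In ODM, a classification model built with a NULL target becomes a one-class anomaly detector:

-- Pick the SVM algorithm via a settings table.
CREATE TABLE svm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

INSERT INTO svm_settings VALUES ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');

BEGIN
  -- A NULL target tells ODM to build a one-class (anomaly detection) SVM.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CLAIMS_ANOMALY',   -- hypothetical model name
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'DENTAL_CLAIMS',    -- hypothetical claims table
    case_id_column_name => 'CLAIM_ID',
    target_column_name  => NULL,
    settings_table_name => 'SVM_SETTINGS');
END;
/

-- Rank claims by how anomalous they look (prediction 0 = outlier).
SELECT claim_id,
       PREDICTION_PROBABILITY(CLAIMS_ANOMALY, 0 USING *) AS anomaly_prob
  FROM dental_claims
 ORDER BY anomaly_prob DESC
 FETCH FIRST 100 ROWS ONLY;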


Oracle Advanced Analytics at Oracle Open World 2015

While there are a lot of OOW talks that include the word “analytics” or “big data”, this is my short list of sessions, training and demos that primarily focus on Oracle Advanced Analytics. Hope to see you there! Charlie

Oracle Advanced Analytics at OOW'15 Highlights

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL & Fiserv Case Study: Fraud Detection in Online Payments [CON8743]
Tuesday, Oct 27, 5:15 p.m. | Moscone South—307
Charles Berger, Sr. Director of Product Management, Advanced Analytics and Data Mining, Oracle
Miguel M Barrera, Director of Risk Analytics and Strategy
Julia Minkowski, Risk Analytics Manager
Oracle Advanced Analytics 12c delivers parallelized in-database implementations of data mining algorithms and integration with R. Data analysts use the Oracle Data Miner GUI and R to build and evaluate predictive models and leverage R packages and graphs. Application developers deploy Oracle Advanced Analytics models using SQL data mining functions and R. Oracle extends Oracle Database to an analytical platform that mines more data and data types, eliminates data movement, and preserves security to automatically detect patterns, anticipate customer behavior, and deliver actionable insights. Oracle Big Data SQL adds new big data sources, and Oracle R Advanced Analytics for Hadoop provides algorithms that run on Hadoop. Fiserv manages risk for $30B+ in transfers, servicing 2,500+ US financial institutions, including 27 of the top 30 banks, and prevents $200M in fraud losses every year. When dealing with potential fraud, reaction needs to be fast. Fiserv describes their use of Oracle Advanced Analytics for fraud prevention in online payments and shares their best practices and results from turning predictive models into actionable intelligence and next-generation strategies for risk mitigation.
Conference Session

OAA Demo Pod (#3581)—Big Data Predictive Analytics with Oracle Advanced Analytics, R, and Oracle Big Data SQL
Moscone South
The Oracle Advanced Analytics database option embeds powerful data mining algorithms in Oracle Database’s SQL kernel and adds integration with R for solving big data problems such as predicting customer behavior, anticipating churn, detecting fraud, and performing market basket analysis. Data analysts work directly with database data, using the Oracle Data Miner workflow GUI (SQL Developer 4.1 ext.), SQL, or R languages and can extend Oracle Advanced Analytics’ functionality with R graphics and CRAN packages. Oracle Big Data SQL enables Oracle Advanced Analytics models to run on Oracle Big Data Appliance. Oracle R Advanced Analytics for Hadoop provides a powerful R interface over Hadoop and Spark with parallel-distributed predictive algorithms. Learn more in this demo.

Real Business Value from Big Data and Advanced Analytics [UGF4519]
Sunday, Oct 25, 3:30 p.m. | Moscone South—301
Antony Heljula, Technical Director, Peak Indicators Limited
Brendan Tierney, Principal Consultant, Oralytics
Attend this session to hear real case studies where big data and advanced analytics have delivered significant return on investment to a variety of Oracle customers. These solutions can pay for themselves within one year. Customer case studies include predicting which employees are likely to leave within the next 12 months, predicting which sales outlets are likely to suffer from out-of-stock products, predicting sales based on the weather forecast, and predicting which students are likely to withdraw early from their courses. A live demonstration illustrates the high-level process for implementing predictive business intelligence (BI) and its best practices.
User Group Forum Session

Customer Panel: Big Data and Data Warehousing [CON8741]
Wednesday, Oct 28, 4:15 p.m. | Moscone South—301
Craig Fryar, Head of Wargaming Business Intelligence, Wargaming.net
Manuel Martin Marquez, Senior Research Fellow and Data Scientist, CERN Organisation Européenne Pour La Recherche Nucléaire
Jake Ruttenburg, Senior Manager, Digital Analytics, Starbucks
Chris Wones, Chief Enterprise Architect, 8451
Reiner Zimmermann, Senior Director, DW & Big Data Global Leaders Program, Oracle
In this session, hear how customers around the world are solving cutting-edge analytical business problems using Oracle Data Warehouse and big data technology. Understand the benefits of using these technologies together, and how software and hardware combined can save money and increase productivity. Learn how these customers are using Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, Oracle Database In-Memory 12c, or Oracle Analytics to drive their business, make the right decisions, and find hidden information. The conversation is wide-ranging, with customer panelists from a variety of industries discussing business benefits, technical architectures, implementation of best practices, and future directions.
Conference Session

End-to-End Analytics Across Big Data and Data Warehouse for Data Monetization [CON3296]
Monday, Oct 26, 4:00 p.m. | Moscone West—2022
Satya Bhamidipati, Senior Principal Advanced Analytics Market Dev, Business Analytics Product Group, Oracle
Gokula Mishra, VP, Big Data & Advanced Analytics, Oracle
Organizations have used data warehouses to manage structured and operational data, which provides business analysts with the ability to analyze key internal data and spot trends. However, the explosion of newer data sources (big data) not only challenges the role of the traditional data warehouse in analyzing data from these diverse sources but also exposes limitations posed by traditional software and hardware platforms. This newer data can be combined with the data in the data warehouse and analyzed without creating another data silo, creating a hybrid data analytics structure. This presentation discusses the data and analytics platform architecture that enables this data monetization and presents various industry use cases.
Conference Session

Building Predictive Models for Identifying and Preventing Tax Fraud [CON3294]
Wednesday, Oct 28, 9:00 a.m. | Park Central—Concordia
Brian Bequette, Managing Partner, TPS
Satya Bhamidipati, Senior Principal Advanced Analytics Market Dev, Business Analytics Product Group, Oracle
According to a TIGTA Audit Report issued in February 2013, in 2012 alone the IRS identified almost 642,000 incidents of identity theft affecting tax administration, a 38 percent increase since 2010. And this number continues to increase. Tax Processing Systems (TPS) consultants have focused on fraud detection and developed innovative solutions and proprietary algorithms for detecting fraud. In 2012, TPS formed a partnership with Oracle and has adapted its cloud-based methodologies and algorithms for use on the Oracle technology stack. Together, TPS and Oracle have created an end-to-end fraud detection solution that is effective, efficient, and accurate. This presentation focuses on the technology and the algorithms they have developed to detect fraud.
Conference Session

Oracle University Pre-OOW Course – Sunday, Oct. 25th
Using Data Mining Techniques for Predictive Analysis Course, Sunday October 25th
This session teaches students the basic concepts of data mining and how to leverage the predictive analytical power of data mining with Oracle Database by using Oracle Data Miner 12c. Students will learn how to explore the data graphically, build and evaluate multiple data models, apply data mining models to new data, and deploy data mining's predictions and insights throughout the enterprise. All this can be performed on the data in Oracle Database on a real-time basis by using Oracle Data Miner SQL APIs. As the data, models, and results remain in Oracle Database, data movement is eliminated, security is maximized, and information latency is minimized. See Oracle University at Oracle OpenWorld and Make the Most of Your Oracle OpenWorld and JavaOne Experience with Preconference Training by Oracle Experts.
When: Sunday, October 25, 2015, 9 a.m.-4 p.m., with a one-hour lunch break
Where: Golden Gate University, 536 Mission Street, San Francisco, CA 94105 (three blocks from Moscone Center)
Cost: US$850 for a full day of training (cost includes light refreshments and a boxed lunch)
Instructor: Ashwin Agarwal
Target Audience: Data scientists, application developers, and data analysts
Course Objectives:
Understand the basic concepts and describe the primary terminology of data mining
Understand the steps associated with a data mining process
Use Oracle Data Miner 12c to perform data mining
Understand the options for deploying data mining predictive results
Course Topics:
Understanding the Data Mining Concepts
Understanding the Benefits of Predictive Analysis
Understanding Data Mining Tasks
Key Steps of a Data Mining Process (Includes Demo)
Using Oracle Data Miner to Build, Evaluate, and Apply Multiple Data Mining Models (Includes Demo)
Using Data Mining Predictions and Insights to Address Various Business Problems (Includes Demo)
Predicting Individual Behavior (Includes Demo)
Predicting Values (Includes Demo)
Finding Co-Occurring Events (Includes Demo)
Detecting Anomalies (Includes Demo)
Learning How to Deploy Data Mining Results for Real-Time Access by End Users
Prerequisites: A working knowledge of the SQL language and Oracle Database design and administration

Also, on the Big Data + Analytics related products OTN pages, there is a “Must See” Program Guide. Clicking on the .pdf link http://www.oracle.com/technetwork/database/openworld2015pdf-2650488.pdf you’ll see the full list.


Oracle Advanced Analytics Oracle University (OU) Classes in Cambridge, MA. September 28-Oct. 1, 2015

Oracle University has rescheduled their two-day, back-to-back Oracle Advanced Analytics OU classes in Cambridge, MA. Please help spread the word.

Oracle Advanced Analytics combo-course (ODM + ORE) training:
Oracle Data Mining Techniques (ID: 4443453)  September 28 – 29, 2015
Oracle R Enterprise Essentials (ID: 4443455)  September 30 – October 1, 2015

This is a great opportunity for big data analytics customers and partners to learn hands-on about using Oracle Advanced Analytics. Vlamis, an authorized OU instructor, will be teaching the OAA/ODM & OAA/ORE courses again and has been a great and knowledgeable OAA training and implementation partner. The courses also run during the week of Predictive Analytics World in Boston (Oracle will be exhibiting and speaking), so it is perhaps a good time for customers to come to Boston, use some OU credits, learn some new skills and focus on Oracle’s predictive analytics. Anyone (customers and Oracle employees) can register through Vlamis at http://www.vlamis.com/training/ or via their normal OU connections. They should be able to utilize OU training credits for either course. Oracle employees should register through Employee Self Service from Self Service Applications. Please forward to any appropriate Oracle Advanced Analytics customers and partners. Thanks! Charlie


Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple white paper

Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple
Oracle White Paper | July 2014

Executive Summary: Big Data Analytics with Oracle Advanced Analytics (Click HERE to read the entire Oracle white paper) (Click HERE to watch the YouTube video)

The era of “big data” and the “cloud” is driving companies to change. Just to keep pace, they must learn new skills and implement new practices that leverage those new data sources and technologies. Increasing customer expectations from sharing their digital exhaust with corporations in exchange for improved customer interactions and greater perceived value are pushing companies forward. Big data and analytics offer the promise to satisfy these new requirements. Cloud, competition, big data analytics and next-generation “predictive” applications are driving companies toward new goals of delivering improved “actionable insights” and better outcomes. Traditional BI and analytics approaches don’t deliver these detailed predictive insights and simply can’t satisfy the emerging customer expectations in this new world order created by big data and the cloud.

Unfortunately, with big data, as the data grows and expands in the three V's (velocity, volume and variety of data types), new problems emerge. Data volumes grow and data becomes unmanageable and immovable. Scalability, security, and information latency become new issues. Dealing with unstructured data, sensor data and spatial data all introduce new data type complexities.

Traditional advanced analytics has several inherent information technology weak points: data extracts and data movement, data duplication resulting in no single source of truth, data security exposures, and, many times, depending on the skills of the data analysts/scientists involved, multiple separate analytical tools (commercial and open source) and languages (SAS, R, SQL, Python, SPSS, etc.). Problems become particularly egregious during the deployment phase, when the worlds of data analysis and information management collide.

Traditional data analysis typically starts with a representative sample or subset of the data that is exported to separate analytical servers and tools (SAS, R, Python, SPSS, etc.) that have been especially designed for statisticians and data scientists to analyze data. The analytics they perform range from simple descriptive statistical analysis to advanced, predictive and prescriptive analytics. If a data scientist builds a predictive model that is determined to be useful and valuable, then IT needs to be involved to figure out deployment, and enterprise deployment and application integration issues become the next big challenge. The predictive model(s)—and all their associated data preparation and transformation steps—have to be somehow translated to SQL and recreated inside the database in order to apply the models and make predictions on the larger datasets maintained inside the data warehouse. This model translation phase introduces tedious, time consuming and expensive manual coding steps from the original statistical language (SAS, R, and Python) into SQL. DBAs and IT must somehow “productionize” these separate statistical models inside the database and/or data warehouse for distribution throughout the enterprise. Some vendors will even charge for specialized products and options just for predictive model deployment. This is where many advanced analytics projects fail.
Add Hadoop, sensor data, tweets, and expanding big data reservoirs, and the entire “data to actionable insights” process becomes even more challenging. Not with Oracle. Oracle delivers a big data and analytics platform that eliminates the traditional extract, move, load, analyze, export, move, load paradigm. With Oracle Database 12c and the Oracle Advanced Analytics option, big data management and big data analytics are designed into the data management platform from the beginning. Oracle’s multiple decades of R&D investment in developing the industry’s leading data management platform, Oracle SQL, Big Data SQL, Oracle Exadata, Oracle Big Data Appliance and integration with open source R are seamlessly combined and integrated into a single platform—the Oracle Database.

Oracle’s vision is a big data and analytics platform for the era of big data and cloud to:
Make big data and analytics simple (for any data size, on any computer infrastructure and any variety of data, in any combination), and
Make big data and analytics deployment simple (as a service, as a platform, as an application)

Oracle Advanced Analytics offers a wide library of powerful in-database algorithms and integration with open source R that together can solve a wide variety of business problems and can be accessed via SQL, R or GUI. Oracle Advanced Analytics, an option to Oracle Database Enterprise Edition 12c, extends the database into an enterprise-wide analytical platform for data-driven problems such as churn prediction, customer segmentation, fraud and anomaly detection, identifying cross-sell and up-sell opportunities, market basket analysis, and text mining and sentiment analysis. Oracle Advanced Analytics empowers data analysts, data scientists and business analysts to extract more knowledge, discover new insights and make informed predictions—working directly with large data volumes in the Oracle Database.

Data analysts/scientists have choice and flexibility in how they interact with Oracle Advanced Analytics. Oracle Data Miner is an Oracle SQL Developer extension designed for data analysts that provides an easy to use “drag and drop” workflow GUI to the Oracle Advanced Analytics SQL data mining functions (Oracle Data Mining). Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and cloud deployments. When Oracle Data Miner users are satisfied with their analytical methodologies, they can share their workflows with other analysts and/or generate SQL scripts to hand to their DBAs to accelerate model deployment. Oracle Data Miner also provides a PL/SQL API for workflow scheduling and automation.

R programmers and data scientists can use the familiar open source R statistical programming language console, RStudio or any IDE to work directly with data inside the database and leverage Oracle Advanced Analytics’ R integration with the database (Oracle R Enterprise). Oracle Advanced Analytics’ Oracle R Enterprise provides transparent translation of R to equivalent SQL and Oracle Data Mining functions for in-database performance, parallelism, and scalability—thus making R ready for the enterprise.

Application developers, using the ODM SQL data mining functions and ORE R integration, can build completely automated predictive analytics solutions that leverage the strengths of the database and the flexibility of R to integrate Oracle Advanced Analytics solutions into BI dashboards and enterprise applications.
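To make that last point concrete, here is a minimal sketch of how an in-database model surfaces in an ordinary dashboard query, with no data extract and no model translation step; the churn_svm model and customers table are hypothetical:

-- Count high-churn-risk customers per region; the model scores rows
-- in place as part of the query itself.
SELECT region,
       COUNT(*) AS customers,
       COUNT(CASE WHEN PREDICTION_PROBABILITY(churn_svm, 'Y' USING *) > 0.8
                  THEN 1 END) AS high_churn_risk
  FROM customers
 GROUP BY region
 ORDER BY high_churn_risk DESC;

A BI tool simply treats this as another SQL data source, which is the deployment shortcut the white paper argues for.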
By integrating big data management and big data analytics into the same powerful Oracle Database 12c data management platform, Oracle eliminates data movement, reduces total cost of ownership and delivers the fastest path to enterprise-wide predictive analytics solutions and applications.   (Click HERE to read entire Oracle white paper)


Call for Abstracts at BIWA Summit'16 - The Oracle Big Data + Analytics User Conference

Abstract submissions are now being accepted! Please email shyamvaran@gmail.com with any questions regarding the submission process.

What Successes Can You Share? We want to hear your story. Submit your proposal today for the Oracle BIWA Summit 2016. Proposals will be accepted through Monday evening, November 2, 2015, at midnight EST. Don't wait, though: we're accepting submissions on a rolling basis, so that selected sessions can be published early on our online agenda. To submit your abstract, click here, select a track, and fill out the form. Please note: presentations must be noncommercial. Sales promotions for products or services disguised as proposals will be eliminated. Speakers whose abstracts are accepted will be expected to submit (at a later date) a PowerPoint presentation slide set. Accompanying technical and use case papers are encouraged, but not required. Speakers whose abstracts are accepted will be given a complimentary registration to the conference. (Any additional co-presenters must register for the event separately and pay the appropriate registration fees. It is at the co-presenters' discretion which presenter to designate for the complimentary registration.)

This Year's Tracks: Proposals can be submitted for the following tracks: Advanced Analytics, Big Data, Business Intelligence, Cloud Computing, Data Warehousing and Integration, Internet of Things, Spatial and Graph (click here for more information about Spatial Summit submissions), and Other.

More About the Conference: The Oracle BIWA Summit 2016 is organized and managed by the Oracle BIWA SIG, the Oracle Spatial SIG, and the Oracle Northern California User Group. The event attracts top BI, data warehousing, analytics, Spatial, IoT and Big Data experts. The three-day event includes keynotes from industry experts, educational sessions, hands-on labs, and networking events. Hot topics include: database, data warehouse, cloud, and Big Data architecture; deep dives and hands-on labs on existing Oracle BI, data warehouse, and analytics products; updates on the latest Oracle products and technologies (e.g. Big Data Discovery, Oracle Visual Analyzer, Oracle Big Data SQL); novel and interesting use cases on everything: Spatial, Graph, Text, Data Mining, IoT, ETL, Security, Cloud; working with Big Data (e.g. Hadoop, "Internet of Things," SQL, R, Sentiment Analysis); and Oracle Business Intelligence (OBIEE), Oracle Big Data Discovery, Oracle Spatial, and Oracle Advanced Analytics, Better Together.

Hope to see you at BIWA'16 in January 2016! Charlie


Oracle Data Miner 4.1, SQL Developer 4.1 Extension Now Available!

To download, visit: http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index-097090.html

New Data Miner Features in SQL Developer 4.1
These new Data Miner 4.1 features are supported for database versions supported by Oracle Data Miner.

JSON Data Support for Oracle Database 12.1.0.2 and above
In response to the growing popularity of JSON data and its use in Big Data configurations, Data Miner now provides an easy-to-use JSON Query node. The JSON Query node allows you to select and aggregate JSON data without entering any SQL commands, and it opens up all of the existing Data Miner features for use with JSON data. The enhancements include:

Data Source Node
o Automatically identifies columns containing JSON data by identifying those with the IS_JSON constraint.
o Generates a JSON schema for any selected column that contains JSON data.
o Imports a JSON schema for a given column.
o JSON schema viewer.

Create Table Node
o Ability to select a column to be typed as JSON.
o Generates a JSON schema in the same manner as the Data Source node.

JSON Data Type
o Columns can be specifically typed as JSON data.

JSON Query Node (see related JSON node blog posting)
o Ability to utilize any of the selection and aggregation features without having to enter SQL commands.
o Ability to select data from a graphical layout of the JSON schema, making data selection as easy as it is with scalar relational data columns.
o Ability to partially select JSON data as standard relational scalar data while leaving other parts of the same JSON document as JSON data.
o Ability to aggregate JSON data in combination with relational data. Includes the Sub-Group By option, used to generate nested data that can be passed into mining model build nodes.

General Improvements
o Improved database session management, resulting in fewer database sessions being generated and a more responsive user interface.
o Filter Columns Node: combined the primary editor and associated advanced panel to improve usability.
o Explore Data Node: allows multiple row selection to provide a group chart display.
o Classification Build Node: automatically filters out rows where the Target column contains NULLs or all spaces; issues a warning to the user but continues with the model build.
o Workflow: enhanced workflows to ensure that Loading, Reloading, Stopping, and Saving operations no longer block the UI.
o Online Help: revised the online help to adhere to a topic-based framework.

Selected Bug Fixes (does not include 4.0 patch release fixes)
o GLM Model Algorithm Settings: added a GLM feature identification sampling option (Oracle Database 12.1 and above).
o Filter Rows Node: Custom Expression Editor not showing all possible available columns.
o WebEx Display Issues: fixed problems affecting the display of the Data Miner UI through WebEx conferencing.

For more information and support, please visit the Oracle Data Mining Discussion Forum on the Oracle Technology Network (OTN). Return to the Oracle Data Miner page on OTN.
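Under the covers, these features build on the native JSON support introduced in Oracle Database 12.1.0.2. As a rough sketch of what the Data Miner nodes automate, the following shows an IS JSON check constraint (which the Data Source node detects) and a JSON_TABLE projection similar in spirit to what a JSON Query node generates; the table and attribute names here are hypothetical, not from the release notes:

    -- A column constrained to hold only valid JSON documents
    CREATE TABLE weblogs (
      log_id    NUMBER,
      json_data CLOB CONSTRAINT weblogs_is_json CHECK (json_data IS JSON)
    );

    -- Project JSON attributes to relational columns, as the JSON Query node does
    SELECT jt.user_id, jt.page, jt.duration
    FROM   weblogs w,
           JSON_TABLE(w.json_data, '$'
             COLUMNS (user_id  NUMBER        PATH '$.user_id',
                      page     VARCHAR2(200) PATH '$.page',
                      duration NUMBER        PATH '$.duration')) jt;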


OpenWorld 2015 Call for Proposals Extended to Wed, May 6th, 11:59 p.m.

OpenWorld 2015 Call for Proposals Extended to Wed, May 6th, 11:59 p.m. https://www.oracle.com/openworld/call-for-proposals.html

Submit your Oracle Advanced Analytics stories now. If you're an Oracle technology expert, conference attendees want to hear it straight from you. So don't wait: proposals must be submitted by April 29 (now extended to May 6).

Wanted: Outstanding Oracle Experts. The Oracle OpenWorld 2015 Call for Proposals is now open. Attendees at the conference are eager to hear from experts on Oracle business and technology. They're looking for insights and improvements they can put to use in their own jobs: exciting innovations, strategies to modernize their business, different or easier ways to implement, unique use cases, lessons learned, the best of best practices. If you've got something special to share with other Oracle users and technologists, they want to hear from you, and so do we. Submit your proposal now for this opportunity to present at Oracle OpenWorld, the most important Oracle technology and business conference of the year. We recommend you take the time to review the General Information, Submission Information, Content Program Policies, and Tips and Guidelines pages before you begin. We look forward to your submissions.

Submit Your Proposal. By submitting a session for consideration, you authorize Oracle to promote, publish, display, and disseminate the content submitted to Oracle, including your name and likeness, for use associated with the Oracle OpenWorld and JavaOne San Francisco 2015 conferences. Press, analysts, bloggers and social media users may be in attendance at OpenWorld or JavaOne sessions. Submit Now.

General Information. Conference location: San Francisco, California, USA. Dates: Sunday, October 25 to Thursday, October 29, 2015. Website: Oracle OpenWorld.

Key Dates for 2015: Call for Proposals opened Wednesday, March 25. Call for Proposals closes Wednesday, April 29, 11:59 p.m. PDT (since extended to May 6). Notifications for accepted and declined submissions will be sent in mid-June.

Contact us: For questions regarding the Call for Proposals, send an e-mail to speaker-services_ww@oracle.com. For technical questions about the submission tool or issues with submitting your proposal, send an e-mail to OpenWorldContent@gpj.com. Oracle employee submitters should contact the appropriate Oracle track leads before submitting. To view a list of track leads, click here.


Use Oracle Data Miner to Perform Sentiment Analysis inside Database using Twitter Data Demo

Sentiment analysis has been a hot topic recently; sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information from source materials. Social media websites are a good source of people's sentiments. Companies have been using social networking sites to make new product announcements, promote their products, collect product reviews and user feedback, interact with their customers, and so on. It is important for companies to sense customer sentiment toward their products so they can react accordingly and benefit from customers' opinions.

In this blog, we will show you how to use Data Miner to perform some basic sentiment analysis (based on text analytics) using Twitter data. The demo data was downloaded from the developer API console page of the Twitter website. The data itself originated from the Oracle Twitter page, and it contains about a thousand tweets posted in the past six months (May to Oct 2014). We will determine the sentiments (highly favored, moderately favored, and less favored) of tweets based on their favorite counts, and assign the sentiment to each tweet. We then build classification models using these tweets along with their assigned sentiments. The goal is to predict how well a new tweet will be received by customers. This may help the marketing department better craft a tweet before it is posted. The demo (click here to download the demo twitter data and workflow) uses the newly added JSON Query node in Data Miner 4.1 to import the twitter data; please review the "How to import JSON data to Data Miner for Mining" blog entry in a previous post.

Workflow for Sentiment Analysis
The following workflow shows the process we use to prepare the twitter data, determine the sentiments of tweets, and build classification models on the data. The following describes the nodes used in the above workflow:

Data Source (TWITTER_LARGE): Select the demo Twitter data source. The sample Twitter data is attached with this blog.
JSON Query (JSON Query): Select the required JSON attributes used for analysis; we only use the "id", "text", and "favorite_count" attributes. The "text" attribute contains the tweet, and the "favorite_count" attribute indicates how many times the tweet has been favorited.
SQL Query (Cleanse Tweets): Remove shortened URLs and punctuation within tweets because these data contain no predictive information.
Filter Rows (Filter Rows): Remove retweeted tweets because these are duplicate tweets.
Transform (Transform): Perform a quantile bin of the "favorite_count" data into three quantiles; each quantile represents a sentiment. The top quantile represents the "highly favored" sentiment, the middle quantile represents the "moderately favored" sentiment, and the bottom quantile represents the "less favored" sentiment.
SQL Query (Recode Sentiment): Assign quantiles as the determined sentiments of tweets.
Create Table (OUTPUT_4_29): Persist the data to a table for the classification model build (optional).
Classification (Class Build): Build classification models to predict customer sentiment toward a new tweet (how much will customers like this new tweet?).

Data Source Node (TWITTER_LARGE)
Select the JSON_DATA in the TWITTER_LARGE table. The JSON_DATA contains about a thousand tweets to be used for sentiment analysis.

JSON Query Node (JSON Query)
Use the new JSON Query node to select the following JSON attributes.
This node projects the JSON data to relational data format so that it can be consumed within the workflow process.

SQL Query Node (Cleanse Tweets)
Use the REGEXP_REPLACE function to remove numbers, punctuation, and shortened URLs inside tweets because these data are considered noise and do not provide any predictive information. Notice we do not treat hash tags inside tweets specially; these tags are treated as regular words. We specify the number, punctuation, and URL patterns in regular expression syntax and use the database function REGEXP_REPLACE to replace these patterns inside all tweets with empty spaces.

SELECT
  REGEXP_REPLACE("JSON Query_N$10055"."TWEET",
                 '([[:digit:]*]|[[:punct:]*]|(http[s]?://(.*?)(\s|$)))',
                 '', 1, 0) "TWEETS",
  "JSON Query_N$10055"."FAVORITE_COUNT",
  "JSON Query_N$10055"."ID"
FROM
  "JSON Query_N$10055"

Filter Rows Node (Filter Rows)
Remove retweeted tweets because these are duplicate tweets. Usually, retweeted tweets start with the "RT" abbreviation, so we specify a row filter condition to filter out those tweets.

Transform Node (Transform)
Use the Transform node to perform a quantile bin of the "favorite_count" data into three quantiles; each quantile represents a sentiment. For simplicity, we just bin the count into three quantiles without applying any special treatment first.

SQL Query Node (Recode Sentiment)
Assign quantiles as the determined sentiments of tweets; the top quantile represents the "highly favored" sentiment, the middle quantile represents the "moderately favored" sentiment, and the bottom quantile represents the "less favored" sentiment. These sentiments become the target classes for the classification model build.

Classification Node (Class Build)
Build classification models using the sentiment as the target and the tweet id as the case id. Since the TWEETS column contains the textual tweets, we change its mining type to Text Custom and enable the Stemming option for text processing.

Compare Test Results
After the model build completes successfully, open the test viewer to compare model test results; the SVM model seems to produce the best prediction for the "highly favored" sentiment (57% correct prediction). Moreover, the SVM model has a better lift result than the other models, so we will use this model for scoring.

Sentiment Prediction (Scoring)
Let's score this tweet, "this is a boring tweet!", using the SVM model. As expected, this tweet receives a "less favored" prediction. How about this tweet: "larry is doing a data mining demo now!"? Not surprisingly, this tweet receives a "highly favored" prediction. Last but not least, let's see the sentiment prediction for the title of this blog. Not bad: it gets a "highly favored" prediction, so it seems this title will be well received by the audience.

Conclusion
The best SVM model only produces 57% accuracy for the "highly favored" sentiment prediction, but that is reasonably better than a random guess. With a larger sample of tweet data, the model accuracy could be improved. The new JSON Query node enables us to perform data mining on JSON data, which is the most popular data format produced by prominent social networking sites.
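Although the blog scores tweets through the Data Miner UI, the same apply step can be expressed directly in SQL once the workflow's SVM model exists in the database. A minimal sketch, assuming the model was deployed under the hypothetical name SENTIMENT_SVM with a text column named TWEETS:

    -- Score a single ad-hoc tweet against the in-database SVM model
    SELECT PREDICTION(sentiment_svm
             USING 'this is a boring tweet!' AS tweets) AS sentiment,
           PREDICTION_PROBABILITY(sentiment_svm
             USING 'this is a boring tweet!' AS tweets) AS prob
    FROM   dual;

The Filter Rows condition mentioned above can likewise be written as a simple predicate such as TWEETS NOT LIKE 'RT%', which drops retweets before binning.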


How to import JSON data to Data Miner for Mining

JSON is a popular lightweight data structure used by Big Data. Increasingly, a lot of the data produced by Big Data systems is in JSON format. For example, weblogs generated in middle tier web servers are likely in JSON format, NoSQL database vendors have chosen JSON as their primary data representation, and the JSON format is widely used in the RESTful style web services responses generated by most popular social media websites like Facebook, Twitter, LinkedIn, etc. This JSON data could potentially contain a wealth of information that is valuable for business use, so it is important that we can bring this data over to Data Miner for analysis and mining purposes.

Oracle Database 12.1.0.2 provides the ability to store and query JSON data. To take advantage of the database JSON support, the upcoming Data Miner 4.1 adds a new JSON Query node that allows users to query JSON data in relational format. In addition, the current Data Source node and Create Table node are enhanced to allow users to specify JSON data in the input data source. In this blog, I will show you how to specify JSON data in the input data source and use the JSON Query node to selectively query desirable attributes and project the result in relational format. Once the data is in relational format, users can treat it as a normal relational data source and start analyzing and mining it immediately. The Data Miner repository installation installs a sample JSON dataset, ODMR_SALES_JSON_DATA, which I will be using here. (Separately, Oracle Big Data SQL supports queries against vast amounts of big data stored in multiple data sources, including Hadoop; users can view and analyze data from various data stores together, as if it were all stored in an Oracle database.)

Specify JSON Data
The Data Source node and Create Table node are enhanced to allow users to specify the JSON data type in the input data source.

Data Source Node
For this demo, we will focus on the Data Source node. To specify JSON data, create a new workflow with a Data Source node. In the Define Data Source wizard, select the ODMR_SALES_JSON_DATA table. Notice there is only one column (JSON_DATA) in this table, which contains the JSON data. Click Next to go to the next step, where it shows the JSON_DATA column is selected with the JSON(CLOB) data type. The JSON prefix indicates the data stored is in JSON format; the CLOB is the original data type. The JSON_DATA column is defined with the new "IS JSON" constraint, which indicates that only valid JSON documents can be stored there. The UI can detect this constraint and automatically select the column as JSON type. If there were no "IS JSON" constraint defined, the column would be shown with a CLOB data type. To manually designate a column as a JSON type, click on the data type itself to bring up an in-place dropdown that lists the original data type (e.g. CLOB) and a corresponding JSON type (e.g. JSON(CLOB)), and select the JSON type. Note: only the following data types can be set to JSON type: VARCHAR2, CLOB, BLOB, RAW, NCLOB, and NVARCHAR2.

Click Finish and run the node now. Once the node has run successfully, open the editor to examine the generated JSON schema. Notice the message "System Generated Data Guide is available" at the bottom of the Selected Attributes listbox. What happens here is that when the Data Source node is run, it parses the JSON documents to produce a schema that represents the document structure.
Here is what the schema looks like:

PATH                              TYPE
$."CUST_ID"                       NUMBER
$."EDUCATION"                     STRING
$."OCCUPATION"                    STRING
$."HOUSEHOLD_SIZE"                STRING
$."YRS_RESIDENCE"                 STRING
$."AFFINITY_CARD"                 STRING
$."BULK_PACK_DISKETTES"           STRING
$."FLAT_PANEL_MONITOR"            STRING
$."HOME_THEATER_PACKAGE"          STRING
$."BOOKKEEPING_APPLICATION"       STRING
$."PRINTER_SUPPLIES"              STRING
$."Y_BOX_GAMES"                   STRING
$."OS_DOC_SET_KANJI"              STRING
$."COMMENTS"                      STRING
$."SALES"                         ARRAY
$."SALES"."PROD_ID"               NUMBER
$."SALES"."QUANTITY_SOLD"         NUMBER
$."SALES"."AMOUNT_SOLD"           NUMBER
$."SALES"."CHANNEL_ID"            NUMBER
$."SALES"."PROMO_ID"              NUMBER

The JSON Path expression syntax and associated data type info (OBJECT, ARRAY, NUMBER, STRING, BOOLEAN, NULL) are used to represent the JSON document structure. We will refer to this JSON schema as the Data Guide throughout the product.

Before we look at the Data Guide in the UI, let's look at the settings that can affect how it is generated. Click the "JSON Settings…" button to open the JSON Parsing Settings dialog. The settings are described below:

· Generate Data Guide if necessary
o Generate a Data Guide if it is not already generated in the parent node.
· Sampling
o Sample JSON documents for Data Guide generation.
· Max. number of documents
o Specify the maximum number of JSON documents to be parsed for Data Guide generation.
· Limit Document Values to Process
o Sample JSON document values for Data Guide generation.
· Max. number per document
o Specify the maximum number of JSON document scalar values (e.g. NUMBER, STRING, BOOLEAN, NULL) per document to be parsed for Data Guide generation.

The sampling option is enabled by default to prevent long-running parsing of JSON documents; parsing could take a while for a large number of documents. However, users may supply a Data Guide (Import From File) or reuse an existing Data Guide (Import From Workflow) if a compatible Data Guide is available.

Now let's look at the Data Guide. Go back to the Edit Data Source Node dialog, select the JSON_DATA column, and click the icon above to open the Edit Data Guide dialog. The dialog shows the JSON structure in a hierarchical tree view with data type information. The "Number of Values Processed" shows the total number of JSON scalar values that were parsed to produce the Data Guide. Users can control whether to enable Data Guide generation, or import a compatible Data Guide, via the menu under the icon. The menu options are described below:

· Default
o Use the "Generate Data Guide if necessary" setting found in the JSON Parsing Settings dialog (see above).
· On
o Always generate a Data Guide.
· Off
o Do not generate a Data Guide.
· Import From Workflow
o Import a compatible Data Guide from a workflow node (e.g. Data Source, Create Table). The option will be set to Off after the import (disabling Data Guide generation).
· Import From File
o Import a compatible Data Guide from a file. The option will be set to Off after the import (disabling Data Guide generation).

Users can also export the current Data Guide to a file via the icon.

Select JSON Data
In Data Miner 4.1, a new JSON Query node is added to allow users to selectively bring over desirable JSON attributes in relational format.

JSON Query Node
The JSON Query node is added to the Transforms group of the Workflow. Let's create a JSON Query node and connect the Data Source node to it. Double click the JSON Query node to open the editor. The editor consists of four tabs, described as follows:

· JSON
o The Column dropdown lists all available columns in the data source where a JSON structure (Data Guide) is found.
It consists of the following two sub tabs:
o Structure: shows the JSON structure of the selected column in a hierarchical tree view.
o Data: shows a sample of the JSON documents found in the selected column. By default it displays the first 2,000 characters (including spaces) of the documents. Users can change the sample size (max. 50,000 chars) and rerun the query to see more of the documents.
· Addition Output
o Allows users to select any non-JSON columns in the data source as additional output columns.
· Aggregation
o Allows users to define aggregations of JSON attributes.
· Preview
o Output Columns: shows the columns in the generated relational output.
o Output Data: shows the data in the generated relational output.

JSON Tab
Let's select some JSON attributes to bring over. Skip the SALES attributes, because we want to define aggregations for those attributes (QUANTITY_SOLD and AMOUNT_SOLD). To peek at the JSON documents, go to the Data tab. You can change the Sample Size to look at more JSON data. Also, you can search for specific data within the displayed documents by using the search control.

Addition Output Tab
If you have any non-JSON columns in the data source that you want to carry over to the output, you can select those columns here.

Aggregate Tab
Let's define aggregations (using the SUM function) for the QUANTITY_SOLD and AMOUNT_SOLD attributes (within the SALES array) for each customer group (group by CUST_ID). Click the icon in the top toolbar to open the Edit Group By dialog, where you can select CUST_ID as the Group-By attribute. Notice that the Group-By attribute can consist of multiple attributes. Click OK to return to the Aggregate tab, where you can see the selected CUST_ID Group-By attribute is now added to the Group By Attributes table at the top. Click the icon in the bottom toolbar to open the Add Aggregations dialog, where you can define the aggregations for both the QUANTITY_SOLD and AMOUNT_SOLD attributes using the SUM function. Next, click the icon in the toolbar to open the Edit Sub Group By dialog, where you can specify a Sub-Group By attribute (PROD_ID) to calculate quantity sold and amount sold per product per customer. Specifying a Sub-Group By column creates a nested table; the nested table contains columns with data type DM_NESTED_NUMERICALS. Click OK to return to the Aggregate tab, where you can see the defined aggregations are now added to the Aggregation table at the bottom.

Preview Tab
Let's go to the Preview tab to look at the generated relational output. The Output Columns tab shows all output columns and their corresponding source JSON attributes. The output columns can be renamed using the in-place edit control. The Output Data tab shows the actual data in the generated relational output. Click OK to close the editor when you are done. The generated relational output is in single-record case format; each row represents a case. If we had not defined the aggregations for the JSON array attributes, the relational output would have been in multiple-record case format. The multiple-record case format is not suitable for building mining models, except for the Association model (which accepts transactional data format with a transaction id and item id).

Use Case
Here is an example of how the JSON Query node is used to project the JSON data source to relational format, so that the data can be consumed by an Explore Data node for data analysis and a Class Build node for building models.

Conclusion
This blog shows how JSON data can be brought over to Data Miner via the new JSON Query node.
Once the data is projected to relational format, it can easily be consumed by Data Miner for graphing, data analysis, text processing, transformation, and modeling.
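For readers who want to see what such a projection looks like in plain SQL, here is a minimal sketch of the same customer-level aggregation over the SALES array using JSON_TABLE (part of the Oracle Database 12.1.0.2 SQL/JSON support). It approximates the kind of query the JSON Query node produces, but it is not the node's actual generated code:

    SELECT jt.cust_id,
           SUM(jt.quantity_sold) AS sum_quantity_sold,
           SUM(jt.amount_sold)   AS sum_amount_sold
    FROM   odmr_sales_json_data d,
           JSON_TABLE(d.json_data, '$'
             COLUMNS (cust_id NUMBER PATH '$.CUST_ID',
                      NESTED PATH '$.SALES[*]'
                        COLUMNS (quantity_sold NUMBER PATH '$.QUANTITY_SOLD',
                                 amount_sold   NUMBER PATH '$.AMOUNT_SOLD'))) jt
    GROUP  BY jt.cust_id;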


ORACLE BI, DW, ANALYTICS, BIG DATA AND SPATIAL USER COMMUNITY - BIWA Summit'15 www.biwasummit.org

Please share with your Oracle BI, DW, Analytics, Big Data and Spatial user community. THANKS. CB

BIWA Summit'15, Jan 27-29, 2015. Early Bird Registration Ends Friday. Registration is now LIVE. Register by November 21st (tomorrow) to receive the early bird pricing of $249 and save $50. Please direct your colleagues to REGISTER NOW and participate to take advantage of the Early Bird registration ($249.00 USD). EARLY BIRD SPECIAL ENDS TOMORROW (Friday, Nov. 21). Here's some information about the event below, plus some pics and talks from last year to give a feel for the opportunity.

BIWA Summits have been organized and managed by the Oracle BI, DW and Analytics SIG user community of the IOUG (Independent Oracle User Group) and attract the top Oracle BI, DW, Advanced Analytics and Big Data experts. The 2.5-day BIWA Summit'15 event joins forces with the Oracle Spatial SIG and features keynotes by industry experts, educational sessions, hands-on labs and networking events. We have a great lineup so far, with Tom Kyte (Senior Technical Architect in Oracle's Server Technology), Doug Cutting (Chief Architect, Cloudera), Oracle BI senior management, Neil Mendelson (VP of Product Management, Big Data and Advanced Analytics), Matt Bradley (SVP, Oracle Product Development, EPM Applications), other featured speakers, and many customers/tech experts (see the web site and search the sessions). Our BIWA Summit offers a broad, multi-track, user-driven conference that has built up a growing reputation over the years. We emphasize technical content and networking with like-minded customers, users, developers, and product managers (Database, Big Data Appliance, Oracle Advanced Analytics, Spatial, OBIEE, Endeca, Big Data Discovery, In-Memory, SQL Patterns, etc.) who all share an interest in "novel and interesting use cases" of Oracle BI, DW, Advanced Analytics and Spatial technologies, applications and solutions. We're off to a great start this year with a great agenda and hope to pack the HQ CC this Jan 27-29, 2015 with 300+ attendees. Please forward and share with your Oracle BI, DW, Analytics, Big Data and Spatial colleagues. Thank you! Hope to see you at BIWA Summit'15. Charlie


2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014

2014 was a very good year for Oracle Advanced Analytics at Oracle Open World 2014. We had a number of customer, partner and Oracle talks that focused on the Oracle Advanced Analytics Database Option. See below, with links to presentations. Check back later in the OOW Sessions Content Catalog, as not all presentations have been uploaded yet. :-(

Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631]
Julia Minkowski - Risk Manager, Fiserv, Inc.
Miguel Barrera - Director, Risk Analytics, Fiserv
Charles Berger - Senior Director, Product Management, Data Mining and Advanced Analytics, Oracle
Moving data mining algorithms to run as native data mining SQL functions eliminates data movement, automates knowledge discovery, and accelerates the transformation of large-scale data to actionable insights from days/weeks to minutes/hours. In this session, Fiserv, a leading global provider of electronic commerce systems for the financial services industry, shares best practices for turning in-database predictive models into actionable policies and illustrates the use of Oracle Data Miner for fraud prevention in online payments. Attendees will learn how businesses that implement predictive analytics in their production processes significantly improve profitability and maximize their ROI.

Developing Relevant Dining Visits with Oracle Advanced Analytics at Olive Garden [CON2898]
Matt Fritz - Senior Data Scientist
Olive Garden, traditionally managing its 830 restaurants nationally, transitioned to a localized approach with the help of predictive analytics. Using k-means clustering and logistic classification algorithms, it divided its stores into five behavioral segments. The analysis leveraged Oracle SQL Developer 4.0 and Oracle R Enterprise 1.3 to evaluate 115 million transactions in just 5 percent of the time required by the company's BI tool. While saving both time and money by making it possible to develop the solution internally, this analysis has informed Olive Garden's latest remodel campaign and continues to uncover millions in profits by optimizing pricing and menu assortment. This session illustrates how Oracle Advanced Analytics solutions directly affect the bottom line.

A Perfect Storm: Oracle Big Data Science for Enterprise R and SAS Users [CON8331]
Marcos Arancibia Coddou - Product Manager, Oracle Advanced Analytics, Oracle
Mark Hornick - Director, Advanced Analytics, Oracle
With the advent of R and a rich ecosystem of users and developers, a myriad of bloggers, and thousands of packages with functionality ranging from social network analysis and spatial data analysis to empirical finance and phylogenetics, use of R is on a steep uptrend. With new R tools from Oracle, including Oracle R Enterprise, Oracle R Distribution, and Oracle R Advanced Analytics for Hadoop, users can scale and integrate R for their enterprise big data needs. Come to this session to learn about Oracle's R technologies and what data scientists from smart companies around the world are doing with R.

Extending the Power of In-Database Analytics with Oracle Big Data Appliance [CON2452]
Masoud Charkhabi - Director, Advanced Analytics, Canadian Imperial Bank of Commerce
The need for speed could not be greater: not speed of processing but time to market. The problem is driven by the long journey data takes before evolving into insight. Insight, however, is always relative to assumption. In fact, analytics is often seen as a battle between assumption and data.
Assumptions can be classified into three types: related to distributions, ratios, and relations. In this session, you will see how the most valuable business insights can come in a matter of hours, not months, when assumptions are challenged with data. This is made possible by the integration of Oracle Big Data Appliance, enabling transparent access to in-database analytics from the data warehouse and avoiding the traditional long journey of data to insight.

Market Basket Analysis at Dunkin' Brands [CON6545]
PrasannaKumar Palanisamy - Development Manager, Dunkin Donuts
Mahesh Jagannath, Dunkin Brands
With almost 120 years of franchising experience, Dunkin' Brands owns two of the world's most recognized, beloved franchises: Dunkin' Donuts and Baskin-Robbins. This session describes a market basket analysis solution built from scratch on the Oracle Advanced Analytics platform at Dunkin' Brands. This solution enables Dunkin' to look at product affinity and a host of associated sales metrics with a view to improving promotional effectiveness and cross-sell/up-sell to increase customer loyalty. The presentation discusses the business value achieved and the technical challenges faced in scaling the solution to Dunkin' Brands' transaction volumes, including the engineered systems (Oracle Exadata) hardware and parallel processing at the core of the implementation.

Predictive Analytics with Oracle Data Mining [CON8596]
Bryan Hodge - Global Leader, Customer Intelligence, Oracle
Vinay Deshmukh - Senior Director, Oracle
This session presents three case studies related to predictive analytics with the Oracle Data Mining feature of Oracle Advanced Analytics. Service contracts cancellation avoidance with Oracle Data Mining is about predicting the contracts at risk of cancellation at least nine months in advance. Predicting hardware opportunities that have a high likelihood of being won means identifying such opportunities at least four months in advance to provide visibility into suppliers of required materials. Finally, predicting cloud customer churn involves identifying the customers that are not as likely to renew subscriptions as others.

SQL Is the Best Development Language for Big Data [CON7439]
Thomas Kyte - Architect, Oracle
SQL has a long and storied history. From the early 1980s till today, data processing has been dominated by this language. It has changed and evolved greatly over time, gaining features such as analytic windowing functions, model clauses, and row-pattern matching. This session explores what's new in SQL and Oracle Database for exploiting big data. You'll see how to use SQL to efficiently and effectively process data that is not stored directly in Oracle Database.

Advanced Predictive Analytics for Database Developers on Oracle [CON7977]
Pirama Arumuga Nainar - Senior Software Engineer, Oracle
Ekine Akuiyibo - Software Engineer, Oracle
Debabrata Sarkar - Senior Engineering Manager, Oracle
Traditional database applications use SQL queries to filter, aggregate, and summarize data. This is called descriptive analytics. The next level is predictive analytics, where hidden patterns are discovered to answer questions that give unique insights that cannot be derived with descriptive analytics. Businesses are increasingly using machine learning techniques to perform predictive analytics, which helps them better understand past data, predict future trends, and enable better decision-making.
This session discusses how to use machine learning algorithms such as regression, classification, and clustering to solve a few selected business use cases.

What Are They Thinking? With Oracle Application Express and Oracle Data Miner [UGF2861]
Roel Hartman - Director, Logica
Brendan Tierney - Consultant, DIT & Oralytics.com (Author, Predictive Analytics Using Oracle Data Miner book)
Have you ever wanted to add some data science to your Oracle Application Express applications? This session shows you how you can combine predictive analytics from Oracle Data Miner into your Oracle Application Express application to monitor sentiment analysis. Using Oracle Data Miner features, you can build data mining models of your data and apply them to your new data. The presentation uses Twitter feeds from conference events to demonstrate how this data can be fed into your Oracle Application Express application and how you can monitor sentiment with the native SQL and PL/SQL functions of Oracle Data Miner. Oracle Application Express comes with several graphical techniques, and the presentation uses them to create a sentiment dashboard.

Transforming Customer Experience with Big Data and Predictive Analytics [CON8148]
Chris King - Sr. Director, Oracle
Tony Velcich, Oracle
Delivering a high-quality customer experience is essential for long-term profitability and customer retention in the communications industry. Although service providers own a wealth of customer data within their systems, the sheer volume and complexity of the data structures inhibit their ability to extract the full value of the information. To change this situation, service providers are increasingly turning to a new generation of business intelligence tools. This session begins by discussing the key market challenges for business analytics and continues by exploring Oracle's approach to meeting these challenges, including the use of predictive analytics, big data, and social network analytics.

There are a few other sessions where Oracle Advanced Analytics is included (e.g. Retail GBU, Big Data Strategy, etc.), but they are typically more broadly focused. If you search the Content Catalog for "Advanced Analytics" you can find other related presentations that involve OAA. Hope this helps. Enjoy! cb


Take a FREE Test Drive of Oracle Data Miner on Amazon Cloud - Offered by Vlamis Software, Oracle Partner

Thanks to wonderfully convenient and easy-to-use Amazon Cloud hosting by Vlamis Software, an Oracle Partner, you can now take a FREE Test Drive of Oracle Data Miner in about 10 minutes! There are 3 simple steps:

Step 1: Fill out the request. Go to http://www.vlamis.com/testdrive-registration/ and select the Oracle Advanced Analytics Test Drive.

Step 2: Connect and launch. Launch the Amazon Cloud instance and wait for the assigned IP address. Vlamis has provided a nice YouTube instructional video that you should watch for instructions. Connect with Remote Desktop.

Step 3: Start the Test Drive! The Amazon Cloud image that Vlamis has set up includes everything you'll need to try out Oracle Data Miner: Oracle Database EE 11g Release 2, the Oracle Advanced Analytics Option, SQL Developer 4.0 with the Oracle Data Miner GUI, and demo data for learning, which makes it fast and easy to get started. The demo data covers multiple scenarios: simple graphing, classification, regression, market basket analysis, anomaly detection, text mining and mining star schema 360 degree customer views. Follow the Oracle Data Miner Tutorials that are provided; these tutorials are also available on the Oracle Technology Network.

Try it out! Many thanks to Oracle Partner Vlamis Software for this terrific Oracle Data Miner Test Drive on the Amazon Cloud. By the way, if interested, Vlamis is an authorized instructor for the Oracle University 2 Day Instructor Led Course on Oracle Data Mining and provides data mining consulting and implementation assistance services.


Oracle Data Miner and Oracle R Enterprise Integration - Watch Demo

Oracle Data Miner and Oracle R Enterprise Integration - Watch Demo

The Oracle Advanced Analytics (Database EE) Option turns the database into an enterprise-wide analytical platform that can quickly deliver enterprise-wide predictive analytics and actionable insights. Oracle Advanced Analytics comprises the Oracle Data Mining SQL data mining functions; Oracle Data Miner, an extension to SQL Developer that exposes the data mining SQL functions to data analysts; and Oracle R Enterprise, which integrates the R statistical programming language with SQL. Fifteen powerful in-database SQL data mining functions, the SQL Developer/Oracle Data Miner workflow GUI and the ability to integrate open source R within an analytical methodology make the Oracle Database plus the Oracle Advanced Analytics Option an ideal platform for building and deploying enterprise-wide predictive analytics applications and solutions.

In Oracle Data Miner 4.0 we added a new SQL Query node to allow users to insert arbitrary SQL scripts within an ODMr analytical workflow. Additionally, the SQL Query node allows users to leverage registered R scripts to extend Oracle Data Miner's analytical capabilities. For applications that are mostly based on the OAA/Oracle Data Mining SQL data mining functions but require additional analytical techniques found in the R community, this is an ideal method for integrating the power of in-database SQL analytical and data mining functions with the flexibility of open source R. For applications built entirely in the R statistical programming language, it may be more practical to stay within the R console or RStudio environments, but for SQL-centric in-database predictive methodologies, this integration may be just what satisfies your needs.

Watch this Oracle Data Miner and Oracle R Enterprise Integration YouTube video to see the demo. There is an excellent related white paper on this topic, Oracle Data Miner: Integrate Oracle R Enterprise Algorithms into Workflow Using the SQL Query Node (pdf, companion files), which includes examples, on the Oracle Technology Network in the Oracle Data Mining pages.
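For a flavor of what the SQL Query node can call, here is a minimal sketch of Oracle R Enterprise's SQL interface for registered R scripts, assuming ORE is installed and the user has the privileges (typically the RQADMIN role) to register scripts; the script name, input table and output-shape query below are illustrative, not taken from the demo:

    -- Register a named R script in the database
    BEGIN
      sys.rqScriptCreate('MySummary',
        'function(dat) data.frame(n = nrow(dat), avg = mean(dat$AMOUNT_SOLD))');
    END;
    /

    -- Invoke the R script from SQL; the third argument describes the output shape
    SELECT *
    FROM   TABLE(rqTableEval(
             CURSOR(SELECT amount_sold FROM sh.sales),
             NULL,
             'SELECT 1 n, 1 avg FROM dual',
             'MySummary'));

A SQL Query node can embed a call like this directly in a workflow, so the R computation runs in the database-managed R engine and its result flows downstream like any other relational source.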


Real Time Association Rules Recommendation Engine

This blog shows how you can write a SQL query for Association Rules recommendation; such a query can be used to recommend products (cross-sell) to a customer based on products already placed in his current shopping cart. Before we can perform the recommendation, we need to build an association rules model based on previous customer sales transactions. For the demo, I will use the SALES and PRODUCTS tables found in the sample SH schema as input data and build the association model using the free Oracle Data Miner GUI tool.

Association Rules Model Workflow
The SALES table contains time-based (TIME_ID) sales transactions of all customers' (CUST_ID) product purchases (PROD_ID). The actual product names can be found in the PRODUCTS table, so we join these two tables to get the sales transactions with real product names (instead of looking up the product names using the PROD_ID later). Enter the transaction ids (CUST_ID, TIME_ID) and item id (PROD_NAME) in the Association Rule Build node editor. Enter a Maximum Rule Length of 2 and the Minimum Confidence and Support as follows. Lower Confidence and Support percentages yield more rules; higher percentages yield fewer rules. We want the generated rules to have one antecedent and one consequent, so we set the Maximum Rule Length to 2.

SQL Query for Recommendation
The following SQL query returns the top 3 product recommendations based on products placed in the customer's current shopping cart.

SELECT rownum AS rank,
  consequent AS recommendation
FROM (
  WITH rules AS (
    SELECT AR.rule_id AS "ID",
      ant_pred.attribute_subname antecedent,
      cons_pred.attribute_subname consequent,
      AR.rule_support support,
      AR.rule_confidence confidence
    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,
      TABLE(AR.antecedent) ant_pred,
      TABLE(AR.consequent) cons_pred
  ),
  cust_data AS (
    SELECT 'Comic Book Heroes' AS prod_name FROM DUAL
    UNION
    SELECT 'Martial Arts Champions' AS prod_name FROM DUAL
  )
  SELECT rules.consequent,
    MAX(rules.confidence) max_confidence,
    MAX(rules.support) max_support
  FROM rules, cust_data
  WHERE cust_data.prod_name = rules.antecedent
  AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)
  GROUP BY rules.consequent
  ORDER BY max_confidence DESC, max_support DESC
) WHERE rownum <= 3;

The above SQL query consists of 3 main sections: association rules, current customer data, and product recommendation.

Association Rules
The first section returns the associated rules (antecedent, consequent) and the confidence and support values discovered by the model (AR_RECOMMENDATION) that was built in the above workflow. You may find the DBMS_DATA_MINING.GET_ASSOCIATION_RULES function reference here.

WITH rules AS (
  SELECT AR.rule_id AS "ID",
    ant_pred.attribute_subname antecedent,
    cons_pred.attribute_subname consequent,
    AR.rule_support support,
    AR.rule_confidence confidence
  FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,
    TABLE(AR.antecedent) ant_pred,
    TABLE(AR.consequent) cons_pred

Current Customer Data
The middle section defines the current customer product selection on the fly (in real time). For example, we assume this customer placed the 'Comic Book Heroes' and 'Martial Arts Champions' products in the current shopping cart.
cust_data AS (
  SELECT 'Comic Book Heroes' AS prod_name FROM DUAL
  UNION
  SELECT 'Martial Arts Champions' AS prod_name FROM DUAL
)

Product Recommendation
Last but not least is the query to return the recommended products based on the discovered rules and the current customer product selection. It is possible that the rules may suggest the same product (consequent) for different customer products (prod_name), so we aggregate the consequents using the MAX function on the confidence and support values; in case of duplicate recommendations, we just use the maximum confidence and support values for comparison. Moreover, we don't want to recommend products that are already placed in the customer's shopping cart, so we add the "NOT IN (SELECT prod_name FROM cust_data)" condition. Finally, the query returns the recommendations in order of highest confidence and support first.

SELECT rules.consequent,
  MAX(rules.confidence) max_confidence,
  MAX(rules.support) max_support
FROM rules, cust_data
WHERE cust_data.prod_name = rules.antecedent
AND rules.consequent NOT IN (SELECT prod_name FROM cust_data)
GROUP BY rules.consequent
ORDER BY max_confidence DESC, max_support DESC

The recommendation query returns the following recommendations for the 'Comic Book Heroes' and 'Martial Arts Champions' products.

RANK  RECOMMENDATION
----  -----------------------
   1  Xtend Memory
   2  Endurance Racing
   3  Adventures with Numbers

Alternative SQL Query for Recommendation
The first recommendation query may not be scalable; it returns all possible rules to be processed by the recommendation sub query. The more scalable approach is to push as much processing into the GET_ASSOCIATION_RULES function as possible, so that it returns a minimal set of rules for further processing. Here we specify topn=10, min_confidence=0.1, min_support=0.01, sort_order = 'RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC', and the antecedent items to the function, and let it find the top 10 rules that satisfy these criteria. Once we obtain the refined rule set, we filter out recommendations that are already in the customer's shopping cart and also perform aggregation (using the MAX function) on the confidence and support values. Finally, we query the top 3 recommendations in order of highest confidence and support first.

SELECT rownum AS rank,
  consequent AS recommendation
FROM
  (SELECT cons_pred.attribute_subname consequent,
    MAX(AR.rule_support) max_support,
    MAX(AR.rule_confidence) max_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES(
         'AR_RECOMMENDATION', 10, NULL, 0.1, 0.01, 2, 1,
         ORA_MINING_VARCHAR2_NT('RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC'),
         DM_ITEMS(DM_ITEM('PROD_NAME', 'Comic Book Heroes', NULL, NULL),
                  DM_ITEM('PROD_NAME', 'Martial Arts Champions', NULL, NULL)),
         NULL, 1)) AR,
    TABLE(AR.consequent) cons_pred
  WHERE cons_pred.attribute_subname NOT IN ('Comic Book Heroes', 'Martial Arts Champions')
  GROUP BY cons_pred.attribute_subname
  ORDER BY max_confidence DESC, max_support DESC
  )
WHERE rownum <= 3;

Note: another consideration is to order the rules by the lift value; the higher the lift value, the more accurate the recommendation.
SQL Query for Recommendation Using Customer Previous Sales Transactions
I am going to extend the above recommendation SQL query to include the customer's previous sales transactions, so that the recommendation is now based on both the previously purchased products and the products in the current shopping cart. Moreover, we don't want to recommend any products that have been purchased previously or are already placed in the current shopping cart. For this example, we use a window of 12 months since the last customer purchase as the past sales history used for recommendation. To include the customer sales history (assume cust_id = 3), a hist_cust_data sub query is added to obtain the previously purchased products, and a tot_cust_data sub query is added to combine the products in the current shopping cart with the previously purchased products. The following query returns the top 3 recommendations based on the customer's previously purchased products in the last 12 months and the products in the current shopping cart.

SELECT rownum AS rank, consequent AS recommendation
FROM (
  WITH rules AS (
    SELECT AR.rule_id AS "ID",
      ant_pred.attribute_subname antecedent,
      cons_pred.attribute_subname consequent,
      AR.rule_support support,
      AR.rule_confidence confidence,
      AR.rule_lift lift
    FROM TABLE(dbms_data_mining.get_association_rules('AR_RECOMMENDATION')) AR,
      TABLE(AR.antecedent) ant_pred,
      TABLE(AR.consequent) cons_pred
  ),
  cur_cust_data AS (
    SELECT 'Comic Book Heroes' AS PROD_NAME FROM DUAL
    UNION
    SELECT 'Martial Arts Champions' AS PROD_NAME FROM DUAL
  ),
  hist_cust_data AS (
    SELECT DISTINCT PROD_NAME
    FROM sh.sales s, sh.products p
    WHERE cust_id = 3
      AND s.prod_id = p.prod_id
      -- customer historical purchases for the last 12 months
      AND time_id >= add_months((SELECT MAX(time_id) FROM sh.sales WHERE cust_id = 3), -12)
  ),
  tot_cust_data AS (
    SELECT PROD_NAME FROM cur_cust_data
    UNION
    SELECT PROD_NAME FROM hist_cust_data
  )
  SELECT rules.consequent,
    SUM(rules.lift) lift_sum,
    SUM(rules.confidence) confidence_sum,
    SUM(rules.support) support_sum
  FROM rules, tot_cust_data
  WHERE tot_cust_data.prod_name = rules.antecedent
    -- don't recommend products that the customer already owns or is about to purchase
    AND rules.consequent NOT IN (SELECT prod_name FROM tot_cust_data)
  GROUP BY rules.consequent
  ORDER BY lift_sum DESC, confidence_sum DESC, support_sum DESC
) WHERE rownum <= 3;

Conclusion
This blog shows a few examples of how you can write a recommendation SQL query with different flavors (with or without historical sales transactions). You may also consider assigning a profit to each product, so that you can come up with a query that returns the most profitable product recommendations.
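The blog builds the AR_RECOMMENDATION model through the Data Miner GUI, but the same model can also be created directly in PL/SQL with DBMS_DATA_MINING.CREATE_MODEL. A minimal sketch mirroring the GUI settings described above; the SALES_TRANS view (a join of SH.SALES and SH.PRODUCTS), the TRANS_ID case id and the AR_SETTINGS table name are illustrative, and the single-item-column setting shown requires Oracle Database 12c:

    -- Settings mirroring the GUI choices: rule length 2, minimum confidence/support
    CREATE TABLE ar_settings (setting_name VARCHAR2(30), setting_value VARCHAR2(4000));
    INSERT INTO ar_settings VALUES (dbms_data_mining.asso_max_rule_length, '2');
    INSERT INTO ar_settings VALUES (dbms_data_mining.asso_min_confidence,  '0.1');
    INSERT INTO ar_settings VALUES (dbms_data_mining.asso_min_support,     '0.01');
    -- Oracle Database 12c: name the item column for transactional input data
    INSERT INTO ar_settings VALUES ('ODMS_ITEM_ID_COLUMN_NAME', 'PROD_NAME');

    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'AR_RECOMMENDATION',
        mining_function     => DBMS_DATA_MINING.ASSOCIATION,
        data_table_name     => 'SALES_TRANS',   -- hypothetical join of SH.SALES and SH.PRODUCTS
        case_id_column_name => 'TRANS_ID',      -- e.g. a concatenation of CUST_ID and TIME_ID
        settings_table_name => 'AR_SETTINGS');
    END;
    /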


Oracle Data Miner 4.0 New Features, Installation and Migration Release Notes

If you are downloading the new SQL Developer 4.0 to use the new Oracle Data Miner 4.0 extension, you may want to read these release notes and pay particular attention to the information about migrating your ODM'r repository and previous ODM'r workflows. During the Oracle Data Miner setup, you will be prompted whether you want Oracle Data Miner to migrate your ODM'r repository. Just click "yes" and all your projects and workflows will be migrated automatically. Alternatively, there are SQL scripts for those who prefer to perform the SQLDEV/ODM'r setup manually. (Also, see the Oracle by Example tutorial: Setting Up Oracle Data Miner.) See the full list of Oracle by Example tutorials for Oracle Data Miner at the Oracle Learning Library. Hope this helps! Charlie

---

Oracle Data Miner Extension for SQL Developer 4.0 Release Notes
December 2013

These release notes contain the following information about Oracle Data Miner 4.0: Oracle Data Miner Functionality, New Features, Before You Start, Support, Getting Started, Migration, Learn How to Use Data Miner, and General Comments and Limitations.

Oracle Data Miner Functionality
Oracle Data Mining, part of the Oracle Advanced Analytics option of Oracle Enterprise Edition, provides powerful data mining functionality to leverage data stored in an Oracle Database. Oracle Data Miner is the graphical user interface (GUI) for Oracle Data Mining; it is an extension to Oracle SQL Developer. Oracle Data Mining enables users to build descriptive and predictive models that predict customer behavior, identify promising selling opportunities, identify customer retention risks, discover customer clusters, segments, and profiles, and detect anomalous behavior. For more information about Oracle Data Mining, see Oracle Data Mining on Oracle Technology Network.

New Features
The Oracle Data Miner extension to SQL Developer 4.0 includes these new features.

These features are supported for all database versions supported by Data Miner (Oracle Database 11.2.0.1 and above):

Workflow SQL Script Deployment
o Generates SQL scripts to support full deployment of workflow contents.
SQL Query Node
o Integrate SQL queries to transform data or provide a new data source.
o Supports the running of R language scripts and viewing of R generated data and graphics.
Graph Node
o Generate Line, Scatter, Bar, Histogram and Box Plots.
Workflow Performance Features
o Workflow Parallel Query Setting: specifies the degree of parallel query processing desired per workflow node.
o Table Compression: table generation utilizes the table compression feature.
Model Build Node Improvements
o Node-level data usage specification applied to underlying models.
o Node-level text specifications to govern text transformations.
o Displays the heuristic rules responsible for excluding predictor columns.
o Ability to control the amount of Classification and Regression test results generated.
o Model Tuning option: turn generation of tuning results on or off.
View Data
o Ability to drill in to view custom objects and nested tables.
Model Details Node
o Added Cluster Centroid details.
Explore and Transform Nodes
o Improvements in NULL handling.
Explore Node
o Ability to select specific statistical outputs.
Workflow Run and Validation Options
o Ability to select multiple nodes when invoking Workflow Run options.
o Added a Validation option that allows users to run the workflow in validation mode.
Workflow Import
o Validates whether the target database version supports the functionality contained in the imported workflow.
Model Tuning
o Added an option to control generation of results used for post-model-build tuning.
Column Filter Node
o Added a System Determined option for the Attribute Importance sampling technique.
Model Test Performance
o Specifying a CASE ID improves the performance of Classification and Regression testing.
Viewers and Editors
o Removed blocking dialogs triggered by long running processes and queries.
Copy and Paste Added Functionality
o Ability to copy charts to the clipboard or save them to a file.
o Ability to copy data grids to the clipboard.
o Ability to easily copy Cluster and Decision Tree rules to the clipboard or a file.
o Ability to copy and paste workflows between different Data Miner repositories, as long as the target repository is compatible with the source repository.
Charts
o Ability to view and copy the data content of all charts.
Demo Data
o Added the ODMR_CARS_DEMO table to the demo data scripts.
Backup and Recovery
o New scripts provide workflow-level backup and recovery options.
o For Database 11.2.0.4 or higher, significant improvements in Data Miner repository performance for workflow save, run, and load times.

These features require connection to Oracle Database 12c Release 1 or above:

Predictive Query Nodes
o Predictive results without the need to build models, using analytical queries.
o Refined predictions based on data partitions.
Clustering Node New Algorithm
o Added the Expectation Maximization algorithm.
Feature Extraction Node New Algorithms
o Added the Singular Value Decomposition and Principal Component Analysis algorithms.
Text Mining Enhancements
o Text transformations integrated as part of the model's Automatic Data Preparation.
o Ability to import Build Text node specifications into a Model Build node.
Prediction Result Explanations
o Scoring details that explain the predictive result.
Generalized Linear Model New Algorithm Settings
o New algorithm settings provide feature selection and generation.
Extended Data Type Support
o Support for VARCHAR2 sizes up to 32767.

Before You Start
To use Oracle Data Miner, you must connect to an Oracle Database that satisfies these requirements:
o Oracle Data Mining is installed. Oracle Data Mining is installed automatically when you install Oracle Database Enterprise Edition. New features may require Oracle Database 12c Release 1.
o Oracle Text is installed. Oracle Text is installed automatically when you install Oracle Database Enterprise Edition. If you plan to use Oracle Text to extract Themes, you must install the Knowledge Base by installing the Oracle Database Examples, as described in the Oracle Database Examples Installation Guide.
o Oracle XML DB is installed.
Oracle XML DB is installed automatically when you install Oracle Database Enterprise Edition.
Data used by the Oracle By Example tutorials requires the SH schema. If you install a starter database when you install Oracle Database Enterprise Edition, SH is automatically installed.

Data Miner has two components:
The Repository, which runs on an Oracle Database. One repository supports many connected clients; the connected clients, in turn, can either connect to their own database account or share a common database account.
The Client, which runs on any platform that SQL Developer supports (Windows, Mac, or Linux).

Support

For released products, you are supported by Oracle Support under your current Oracle Database Support license. Log Oracle Data Miner bugs and issues using My Oracle Support. You can post Data Miner questions or issues at the Data Mining Forum and receive replies from other Data Miner users as well as from the Oracle Data Mining development team. You may find it useful to "follow" the Data Miner forum to keep up with useful postings.

Getting Started

Follow these steps to install the prerequisites for Data Miner and install the Data Miner Repository:

Step 1: Install an Oracle 11g Release 2 or higher Database. To have access to all available data mining features, install Oracle 12.1. Download the software from Oracle Database Software Downloads. To use Oracle Data Miner, you must connect to an Oracle Database that satisfies the requirements specified in Before You Start. For Oracle Database 11g Release 2, the Oracle Data Miner Administrator's Guide describes how to install Oracle Database Enterprise Edition on Microsoft Windows. For instructions for other platforms, see Installing and Upgrading in the Oracle Database Documentation Library. For Oracle Database 12c Release 1, the Oracle Data Mining User's Guide describes Oracle Data Mining installation. Oracle Data Mining Documentation describes how to view Data Mining documentation.

Step 2: Download SQL Developer 4.0 from SQL Developer. Install SQL Developer by unzipping the download to any directory on your system. Note that SQL Developer 4.0 requires Java version 1.7 or above.

Step 3: Install the Data Miner Repository from the Data Miner GUI by following the Oracle By Example tutorial Setting Up Oracle Data Miner. Alternatively, you can install the Data Miner Repository using the installation scripts. The installation scripts are in SQLDevHome\sqldeveloper\dataminer\scripts, where SQLDevHome is the directory where you installed SQL Developer. Use of the scripts is optional. The scripts are described in SQLDevHome\sqldeveloper\dataminer\scripts\install_scripts_readme.html and in the online help for Data Miner.

After you install SQL Developer 4.0, you can also view startup instructions for Data Miner as follows: select the menu item Help > Table of Contents; in the Help Center, expand Data Miner Concepts and Usage, then expand Data Miner 4.0 and its subfolder Install Prerequisites and Oracle Data Miner Repository.

Oracle Data Mining Documentation

You can view or download Oracle documentation from Documentation. Go to Database to find the documentation library for your database. Oracle Data Mining is a component of the Oracle Advanced Analytics Option. Oracle Advanced Analytics is described on the Data Warehousing and Business Intelligence page.

Migration

When you initially open a connection from the Data Miner navigator to an existing Data Miner repository, you are prompted to update the repository.
The repository update is performed for you through the GUI-guided process. If you want to perform the migration by running scripts manually, the installation scripts are located in SQLDevHome\sqldeveloper\dataminer\scripts, where SQLDevHome is the directory where SQL Developer is installed. The scripts are described in install_scripts_readme.html in the scripts directory and in the online help for Data Miner.

Learn How to Use Data Miner

Data Miner includes Oracle By Example (OBE) tutorials in the Oracle Learning Library at the Oracle Data Mining 12c OBE Series. The OBEs describe how to set up and use Oracle Data Miner. In-depth white papers are available at Oracle Data Mining under the heading Technical Information for the following topics:
Generate a PL/SQL script for workflow deployment
Integrate Oracle R Enterprise algorithms into a workflow using the SQL Query node
Using Oracle Data Miner 11g Release 2 with Star Schema data - A Telco Churn Case Study

General Comments and Limitations

If a workflow appears to be running too long, you can use the Event Viewer to determine the status of the workflow. To open the Event Viewer, click the Event Viewer icon in the workflow toolbar. You can click the Info icon in the toolbar of the Event Viewer to see the informational event logs in addition to the Warnings and Errors. If the workflow is complete but the UI still shows that the workflow is running, simply close the workflow and reopen it.

When you upgrade to Oracle Database 11g Release 2 (11.2.0.4) or higher from 11.2.0.3 or lower, you must also upgrade the ODMRSYS repository. When you start SQL Developer 4.0 after the database upgrade, you are prompted to perform an upgrade for Data Miner. The upgrade process is fully automated in the Data Miner GUI.

Beginning with Oracle Database 12c, you can specify a maximum size of 32767 bytes for the VARCHAR2, NVARCHAR2, and RAW data types. You will have failures when you view data in an extended data type, that is, a VARCHAR2 column with a declared size greater than 4000 bytes. In order to view extended data types, you must set the connection properties as follows: in SQL Developer, go to Tools > Preferences > Database > Advanced. Click Use Oracle Client, then click Configure. Set Client Type to Oracle Home and set Client Location to the value of Oracle Home for either a full Oracle 12c Client or an Oracle Database 12c installed on your local system. (Client Type equal to Instant Client does not work at the time of this release.)
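For context, a 12c database only accepts such declarations when it has been configured with the MAX_STRING_SIZE=EXTENDED initialization parameter. Here is a minimal sketch (my illustration, not from the release notes; the table name is hypothetical) of a table containing an extended column that would trigger the viewing limitation described above:

-- Assumes the 12c database runs with MAX_STRING_SIZE = EXTENDED;
-- with the default (STANDARD), the 32767-byte declaration is rejected.
CREATE TABLE doc_snippets (
  id  NUMBER PRIMARY KEY,
  txt VARCHAR2(32767)  -- extended data type: declared size > 4000 bytes
);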


Deploy Data Miner Apply Node SQL as RESTful Web Service for Real-Time Scoring

The free Oracle Data Miner GUI is an extension to Oracle SQL Developer that enables data analysts to work directly with data inside the database, explore the data graphically, build and evaluate multiple data mining models, apply Oracle Data Mining models to new data, and deploy Oracle Data Mining's predictions and insights throughout the enterprise. The product enables a complete workflow deployment to a production system via generated PL/SQL scripts (see Generate a PL/SQL script for workflow deployment). This time I want to focus on the model scoring side, especially single-record, real-time scoring. Wouldn't it be nice if the scoring function could be accessed by different systems on different platforms? How about deploying the scoring function as a Web Service? This way, any system that can send an HTTP request can invoke the scoring Web Service and consume the returned result as it sees fit. For example, you could have a mobile app that collects customer data and then invokes the scoring Web Service to determine how likely the customer is to buy life insurance. This blog shows a complete demo from building predictive models to deploying a scoring function as a Web Service. However, the demo does not take into account any authentication and security considerations related to Web Services, which are out of the scope of this blog.

Web Services Requirement

This demo uses the Web Services feature provided by Oracle APEX 4.2 and Oracle REST Data Services 2.0.6 (formerly Oracle APEX Listener). Here are the installation instructions for both products:

For 11g Database, go to the Oracle Application Express Installation Guide and follow the instructions in "1.5.1 Scenario 1: Downloading from OTN and Configuring the Oracle Application Express Listener":
Step 1: Install the Oracle Database and Complete Pre-installation Tasks
Step 2: Download and Install Oracle Application Express
Step 3: Change the Password for the ADMIN Account
Step 4: Configure RESTful Services
Step 5: Restart Processes
Step 6: Configure APEX_PUBLIC_USER Account
Step 7: Download and Install Oracle Application Express Listener
Step 8: Enable Network Services in Oracle Database 11g
Step 9: Security Considerations
Step 10: About Developing Oracle Application Express in Other Languages
Step 11: About Managing JOB_QUEUE_PROCESSES
Step 12: Create a Workspace and Add Oracle Application Express Users

For 12c Database, go to the Oracle Application Express Installation Guide (Release 4.2 for Oracle Database 12c) and follow the instructions in "4.4 Installing from the Database and Configuring the Oracle Application Express Listener":
Install the Oracle Database and Complete Preinstallation Tasks
Download and Install Oracle Application Express Listener
Configure RESTful Services
Enable Network Services in Oracle Database 12c
Security Considerations
About Running Oracle Application Express in Other Languages
About Managing JOB_QUEUE_PROCESSES
Create a Workspace and Add Oracle Application Express Users

Note: APEX is pre-installed with Oracle Database 12c, but you need to configure it in order to use it.

For this demo, create a Workspace called DATAMINER that is based on an existing user account that has already been granted access to Data Miner (this blog assumes DMUSER is the Data Miner user account). Please refer to the Oracle By Example Tutorials to review how to create a Data Miner user account and install the Data Miner Repository. In addition, you need to create an APEX user account (for simplicity I use DMUSER).
Build Models to Predict BUY_INSURANCE

This demo uses the demo data set, INSUR_CUST_LTV_SAMPLE, that comes with the Data Miner installation. Now, let's use the Classification Build node to build some models using CUSTOMER_ID as the case id and BUY_INSURANCE as the target.

Evaluate the Models

A nice thing about the Build node is that, by default, it builds a set of models with different algorithms within the same mining function, so we can select the best model to use. Let's look at the models in the Test Viewer; here we can compare the models by looking at their Predictive Confidence, Overall Accuracy, and Average Accuracy values. Basically, the model with the highest values across these three metrics is the one to use. As you can see, the winner here is the CLAS_DT_3_6 decision tree model.

Next, let's see which input data columns are used as predictors for the decision tree model. You can find that information in the Model Viewer below. Surprisingly, it uses only a few columns for the prediction. These columns will be our input data requirement for the scoring function; the rest of the input columns can be ignored.

Score the Model

Let's complete the workflow with an Apply node, from which we will generate the scoring SQL statement to be used for the Web Service. Here we reuse the INSUR_CUST_LTV_SAMPLE data as input data to the Apply node, and select only the required columns as found in the previous step. Also, in the Class Build node we deselect the other models as output in the Property Inspector (Models tab), so that only the decision tree model will be used for the Apply node. The generated scoring SQL statement will use only the decision tree model to score against the limited set of input columns.

Generate SQL Statement for Scoring

After the workflow is run successfully, we can generate the scoring SQL statement via the "Save SQL" context menu off the Apply node as shown below. Here is the generated SQL statement:

/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_12c" on Mar 16, 2014 */
ALTER SESSION set "_optimizer_reuse_cost_annotations"=false;
ALTER SESSION set NLS_NUMERIC_CHARACTERS=".,";
--ALTER SESSION FOR OPTIMIZER
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */
  "INSUR_CUST_LTV_SAMPLE"."BANK_FUNDS",
  "INSUR_CUST_LTV_SAMPLE"."CHECKING_AMOUNT",
  "INSUR_CUST_LTV_SAMPLE"."CREDIT_BALANCE",
  "INSUR_CUST_LTV_SAMPLE"."N_TRANS_ATM",
  "INSUR_CUST_LTV_SAMPLE"."T_AMOUNT_AUTOM_PAYMENTS"
from "DMUSER"."INSUR_CUST_LTV_SAMPLE" )
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
  PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
  PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
  PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";

We need to modify the first SELECT SQL statement to change the data source from a database table to a record that can be constructed on the fly, which is crucial for real-time scoring. Bind variables (e.g. :funds) are used; these variables will be replaced with actual data (passed in by the Web Service request) when the SQL statement is executed.
/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_12c" on Mar 16, 2014 */
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */
  :funds "BANK_FUNDS",
  :checking "CHECKING_AMOUNT",
  :credit "CREDIT_BALANCE",
  :atm "N_TRANS_ATM",
  :payments "T_AMOUNT_AUTOM_PAYMENTS"
from DUAL)
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
  PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
  PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
  PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";

Create Scoring Web Service

Assuming Oracle APEX and Oracle REST Data Services have been properly installed and configured, we can proceed to create a RESTful web service for real-time scoring. The following steps describe how to create the Web Service in APEX:

1. APEX Login. Bring up the APEX login screen by pointing your browser to http://<host>:<port>/ords. Enter your Workspace name and account info to log in. The Workspace should be based on the Data Miner DMUSER account for this demo to work.

2. Select SQL Workshop. Select the SQL Workshop icon to proceed.

3. Select RESTful Services. Select RESTful Services to create the Web Service. Click the "Create" button to continue.

4. Define RESTful Services. Enter the following information to define the scoring Web Service in the RESTful Services Module form:

Name: buyinsurance
URI Prefix: score/
URI Template: buyinsurance?funds={funds}&checking={checking}&credit={credit}&atm={atm}&payments={payments}
Status: Published
Method: GET
Source Type: Query
Format: CSV
Source:

/* SQL Deployed by Oracle SQL Developer 4.1.0.14.78 from Node "Apply", Workflow "workflow score", Project "project", Connection "conn_11204" on Mar 16, 2014 */
WITH
/* Start of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */
"N$10013" as (select /*+ inline */
  :funds "BANK_FUNDS",
  :checking "CHECKING_AMOUNT",
  :credit "CREDIT_BALANCE",
  :atm "N_TRANS_ATM",
  :payments "T_AMOUNT_AUTOM_PAYMENTS"
from DUAL)
/* End of sql for node: INSUR_CUST_LTV_SAMPLE APPLY */,
/* Start of sql for node: Apply */
"N$10011" as (SELECT /*+ inline */
  PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PRED",
  PREDICTION_PROBABILITY("DMUSER"."CLAS_DT_3_6", PREDICTION("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) USING *) "CLAS_DT_3_6_PROB",
  PREDICTION_COST("DMUSER"."CLAS_DT_3_6" COST MODEL USING *) "CLAS_DT_3_6_PCST"
FROM "N$10013" )
/* End of sql for node: Apply */
select * from "N$10011";

Note: JSON output format is also supported.

Lastly, create the parameters that are used to pass the data from the Web Service request (URI) to the bind variables used in the scoring SQL statement. The final RESTful Services Module definition should look like the following. Make sure "Requires Secure Access" is set to "No" (HTTPS secure requests are not addressed in this demo).

Test the Scoring Web Service

Let's create a simple web page using your favorite HTML editor (I use JDeveloper to create this web page). The page includes a form that is used to collect customer data, and then fires off the Web Service request upon submission to get a prediction and its associated probability.
Here is the HTML source of the above form:

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
  <title>score</title>
</head>
<body>
  <h2>Determine if Customer will Buy Insurance</h2>
  <form action="http://localhost:8080/ords/dataminer/score/buyinsurance" method="get">
    <table>
      <tr>
        <td>Bank Funds:</td>
        <td><input type="text" name="funds"/></td>
      </tr>
      <tr>
        <td>Checking Amount:</td>
        <td><input type="text" name="checking"/></td>
      </tr>
      <tr>
        <td>Credit Balance:</td>
        <td><input type="text" name="credit"/></td>
      </tr>
      <tr>
        <td>Number ATM Transactions:</td>
        <td><input type="text" name="atm"/></td>
      </tr>
      <tr>
        <td>Amount Auto Payments:</td>
        <td><input type="text" name="payments"/></td>
      </tr>
      <tr>
        <td colspan="2" align="right">
          <input type="submit" value="Score"/>
        </td>
      </tr>
    </table>
  </form>
</body>
</html>

When the Score button is pressed, the form sends a GET HTTP request to the web server with the collected form data encoded as name-value parameters in the URL:

http://localhost:8080/ords/dataminer/score/buyinsurance?funds={funds}&checking={checking}&credit={credit}&atm={atm}&payments={payments}

Notice that {funds}, {checking}, {credit}, {atm}, and {payments} will be replaced with actual data from the form. This URI matches the URI Template specified in the RESTful Services Module form above.

Let's test out the scoring Web Service by entering some values in the form and hitting the Score button to see the prediction. The prediction, along with its probability and cost, is returned as shown below. Unfortunately, this customer is less likely to buy insurance. Let's change some values and see if we have any luck. Bingo! This customer is more likely to buy insurance.

Conclusion

This blog shows how to deploy Data Miner generated scoring SQL as a Web Service, which can be consumed by different systems on different platforms from anywhere. In theory, any SQL statement generated from a Data Miner node could potentially be made into a Web Service. For example, we could have a Web Service that returns model details, and that info could be consumed by a BI tool for application integration purposes.
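As a closing note, the HTML form is just one possible client: any environment that can issue an HTTP GET can consume the service. For instance, here is a minimal PL/SQL sketch using UTL_HTTP (my addition, not part of the original demo; it assumes the demo endpoint above, hard-coded sample input values, and that a network ACL allows the database session to reach the ORDS listener):

SET SERVEROUTPUT ON
DECLARE
  l_req  UTL_HTTP.REQ;
  l_resp UTL_HTTP.RESP;
  l_line VARCHAR2(32767);
BEGIN
  -- Call the scoring service with sample values for the five predictors
  l_req  := UTL_HTTP.BEGIN_REQUEST(
              'http://localhost:8080/ords/dataminer/score/buyinsurance'
           || '?funds=500&checking=100&credit=800&atm=3&payments=200');
  l_resp := UTL_HTTP.GET_RESPONSE(l_req);
  BEGIN
    LOOP
      -- Print the CSV response (prediction, probability, cost)
      UTL_HTTP.READ_LINE(l_resp, l_line, TRUE);
      DBMS_OUTPUT.PUT_LINE(l_line);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.END_OF_BODY THEN
      UTL_HTTP.END_RESPONSE(l_resp);
  END;
END;
/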


How to generate training and test dataset using SQL Query node in Data Miner

Overview

In Data Miner, the Classification and Regression Build nodes include a process that splits the input dataset into training and test datasets internally, which are then used by the model build and test processes within the nodes. This internal data split feature spares the user from performing an external data split and then tying the split datasets into separate build and test processes, as is required in other, competing products. However, there are times when a user may want to perform an external data split. For example, a user may want to generate a single pair of training and test datasets and reuse them in multiple workflows. The generation of training and test datasets can be done easily via the SQL Query node.

Stratified Split

The stratified split is used internally by the Classification Build node because this technique preserves the categorical target distribution in the resulting training and test datasets, which is important for the classification model build. The following shows the SQL statements that are essentially used by the Classification Build node to produce the training and test datasets internally:

SQL statement for Training dataset

SELECT v1.*
FROM (
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"
    FROM {input data} a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT {target column},
           COUNT(*) "_partition_target_cnt"
    FROM {input data}
    GROUP BY {target column}
) v2
WHERE v1.{target column} = v2.{target column}
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * {percent of training dataset} / 100)

SQL statement for Test dataset

SELECT v1.*
FROM (
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by {target column} ORDER BY ORA_HASH({case id column})) "_partition_caseid"
    FROM {input data} a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT {target column},
           COUNT(*) "_partition_target_cnt"
    FROM {input data}
    GROUP BY {target column}
) v2
WHERE v1.{target column} = v2.{target column}
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * {percent of training dataset} / 100)

The following describes the placeholders used in the SQL statements:
{target column} - target column. It must be of categorical type.
{case id column} - case id column. It must contain unique numbers that identify the rows.
{input data} - input data set.
{percent of training dataset} - percent of the input dataset to use as the training dataset. For example, if you want to split 60% of the input dataset into the training dataset, use the value 60. The test dataset will then contain 100% - 60% = 40% of the input dataset. The training and test datasets are mutually exclusive.

Random Split

The random split is used internally by the Regression Build node because the target is usually of numerical type.
The following shows the SQL statements that are essentially used by the Regression Build node to produce the training and test datasets:

SQL statement for Training dataset

SELECT v1.*
FROM {input data} v1
WHERE ORA_HASH({case id column}, 99, 0) <= {percent of training dataset}

SQL statement for Test dataset

SELECT v1.*
FROM {input data} v1
WHERE ORA_HASH({case id column}, 99, 0) > {percent of training dataset}

The following describes the placeholders used in the SQL statements:
{case id column} - case id column. It must contain unique numbers that identify the rows.
{input data} - input data set.
{percent of training dataset} - percent of the input dataset to use as the training dataset. For example, if you want to split 60% of the input dataset into the training dataset, use the value 60. The test dataset will then contain 100% - 60% = 40% of the input dataset. The training and test datasets are mutually exclusive.

Use the SQL Query node to create training and test datasets

Assume you want to create the training and test datasets out of the demo INSUR_CUST_LTV_SAMPLE dataset using the stratified split technique. You can create the following workflow, which uses SQL Query nodes to execute the above split SQL statements to generate the datasets, and then uses Create Table nodes to persist the resulting datasets.

Assume the case id is CUSTOMER_ID, the target is BUY_INSURANCE, and the training dataset is 60% of the input dataset. You can enter the following SQL statement to create the training dataset in the "SQL Query Stratified Training" SQL Query node:

SELECT v1.*
FROM (
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT "BUY_INSURANCE",
           COUNT(*) "_partition_target_cnt"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009"
    GROUP BY "BUY_INSURANCE"
) v2
WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) <= (v2."_partition_target_cnt" * 60 / 100)

Likewise, you can enter the following SQL statement to create the test dataset in the "SQL Query Stratified Test" SQL Query node:

SELECT v1.*
FROM (
    -- randomly divide members of the population into subgroups based on target classes
    SELECT a.*,
           row_number() OVER (partition by "BUY_INSURANCE" ORDER BY ORA_HASH("CUSTOMER_ID")) "_partition_caseid"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009" a
) v1,
(
    -- get the count of subgroups based on target classes
    SELECT "BUY_INSURANCE",
           COUNT(*) "_partition_target_cnt"
    FROM "INSUR_CUST_LTV_SAMPLE_N$10009"
    GROUP BY "BUY_INSURANCE"
) v2
WHERE v1."BUY_INSURANCE" = v2."BUY_INSURANCE"
-- random sample subgroups based on target classes in respect to the sample size
AND ORA_HASH(v1."_partition_caseid", v2."_partition_target_cnt"-1, 0) > (v2."_partition_target_cnt" * 60 / 100)

Now run the workflow to create the training and test datasets. You can find the table names of the persisted datasets in the associated Create Table nodes.

Conclusion

This blog shows how easy it is to create training and test datasets using the stratified split SQL statements via SQL Query nodes. Similarly, you can generate the training and test datasets using the random split technique by replacing the SQL statements in the SQL Query nodes in the above workflow with the random split SQL statements.
If a large dataset (tens of millions of rows) is used in multiple model build nodes, it may be a good idea to split the data ahead of time to optimize the overall processing time (avoiding multiple internal data splits inside the model build nodes).
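To sanity-check a persisted split, you can compare the target distribution of the two tables; with a stratified split the percentages should be nearly identical. A quick sketch (my addition; SPLIT_TRAIN and SPLIT_TEST are hypothetical names, so substitute the table names shown in your Create Table nodes):

SELECT 'TRAIN' AS dataset, "BUY_INSURANCE",
       COUNT(*) AS cnt,
       ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1) AS pct  -- percent within the dataset
FROM   SPLIT_TRAIN
GROUP  BY "BUY_INSURANCE"
UNION ALL
SELECT 'TEST', "BUY_INSURANCE",
       COUNT(*),
       ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1)
FROM   SPLIT_TEST
GROUP  BY "BUY_INSURANCE"
ORDER  BY 1, 2;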


dunnhumby Accelerates Complex Segmentation Queries from Weeks to Minutes—Gains Competitive Advantage

See the original story on http://www.oracle.com/us/corporate/customers/customersearch/dunnhumby-1-exadata-ss-2137635.html

dunnhumby Accelerates Complex Segmentation Queries from Weeks to Minutes—Gains Competitive Advantage

Oracle Customer: dunnhumby Ltd.
Location: London, England
Industry: Professional Services
Employees: 2,000

dunnhumby is the world's leading customer-science company. It analyzes customer data and applies insights from more than 400 million customers across the globe to create better customer experiences and build loyalty. With its unique analytical capabilities, dunnhumby helps retailers better serve customers, create a competitive advantage, and enjoy sustained growth.

A word from dunnhumby Ltd.: "Oracle Exadata Database Machine is helping us to transform our business and improve our competitive edge. We can now complete queries that took weeks in just minutes—driving new product offerings, more competitive bids, and more accurate analyses based on 100% of data instead of just a sampling." – Chris Wones, Director of Data Solutions, dunnhumby USA

Challenges
Expand breadth of services to maintain a competitive advantage in the customer-science industry
Provide clients, including major retail organizations in the United Kingdom and North America, with expanded historical and real-time insight into customer behavior, buying tendencies, and response to promotional campaigns and product offerings
Ensure competitive pricing for the company's customer-analysis services while delivering added value to a growing client base
Analyze growing volumes of data rapidly and comprehensively
Ensure the security of sensitive information, including protected personal information, to reduce risk and support compliance
Protect against data loss and reduce the backup and recovery window, as data is crucial to the competitive advantage and success of the business
Optimize IT investment and performance across the technology-intensive business
Reduce licensing and maintenance costs of previous analytical and data warehouse software

Oracle Products and Services
Oracle Exadata Database Machine
Oracle ZFS Storage Appliance
Oracle GoldenGate
Oracle Partitioning
Oracle Advanced Analytics
Oracle Advanced Compression
Oracle Advanced Security
Oracle Identity and Access Management Suite
Oracle Directory Services Plus
Oracle Enterprise Single Sign-On Suite Plus

Solutions
Deployed Oracle Exadata Database Machine and accelerated queries that previously took two to three weeks to just minutes, enabling the company to bid on more complex, custom analyses and gain a competitive advantage
Achieved 4x to 30x data compression using Hybrid Columnar Compression and Oracle Advanced Compression across data sets—reducing storage requirements, increasing analysis and backup performance, and optimizing IT investment
Consolidated data marts securely with data warehouse schemas in Oracle Exadata, enabling much faster presummarization of large volumes of data
Accelerated analytic capabilities to near real time using Oracle Advanced Analytics and third-party tools, enabling analysis of unstructured big data from emerging sources, like smart phones
Accelerated segmentation and customer-loyalty analysis from one week to just four hours—enabling the company to deliver more timely information as well as finer-grained analysis
Improved analysts' productivity and focus, as they can now run queries and complete analysis without having to wait hours or days for a query to process
Generated more accurate business insights
and marketing recommendations with the ability to analyze 100% of the data—including years of historical data—instead of just a small sample
Improved accuracy of marketing recommendations by analyzing larger sample sizes and predicting the market's reception to new product ideas and strategies
Improved secure processing and management of 60 TB of data, growing at a rate of 500 million customer records a week, including information from clients' customer loyalty programs
Ensured data security and compliance with requirements for safeguarding protected personal information, and reduced risk, with Oracle Advanced Security, Oracle Directory Services Plus, and Oracle Enterprise Single Sign-On Suite Plus
Gained high-performance identity virtualization, storage, and synchronization services that meet the needs of the company's high-volume environment
Ensured performance scalability even with concurrent queries with Oracle Exadata, which demonstrated higher throughput than competing solutions under such conditions
Deployed integrated backup and recovery using Oracle's Sun ZFS Backup Appliance to support high performance and continuous availability and to act as a staging area for both inbound and outbound extract, transform, and load processes

Why Oracle

dunnhumby considered Teradata, IBM Netezza, and other solutions, and selected Oracle Exadata for its ability to sustain high performance and throughput even during concurrent queries. "We needed to see how the system performed when scaled to multiple concurrent queries, and Oracle Exadata's throughput was much higher than competitive offerings," said Chris Wones, director of data solutions, dunnhumby USA.

Implementation Process

dunnhumby began its Oracle Exadata implementation in September 2012 and went live in April 2013. It has installed four Oracle Exadata machine units in the United States and four in the United Kingdom. The company is using three of the four machines in each country as production environments and one machine in each country for development and testing. dunnhumby runs an active-active environment across its Oracle Exadata clusters to ensure high availability.

Resources
dunnhumby Increases Customer Loyalty with Oracle Big Data


How to generate Scatterplot Matrices using R script in Data Miner

Data Miner provides the Explorer node, which produces descriptive statistical data and histogram graphs that allow an analyst to analyze input data columns individually. Often an analyst is interested in analyzing the relationships among the data columns, so that he can choose the columns that are closely correlated to the target column for model build purposes. To examine relationships among data columns, he can create scatter plots using the Graph node.

For example, an analyst may want to build a regression model that predicts the customer LTV (long term value) using the INSUR_CUST_LTV_SAMPLE demo data. Before building the model, he can create the following workflow with the Graph node to examine the relationships between data columns of interest and the LTV target column. In the Graph node editor, create a scatter plot with a data column of interest (X Axis) against the LTV target column (Y Axis). For the demo, let's create three scatter plots using these data columns: HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT.

Here are the scatter plots generated by the Graph node. As you can see, HOUSE_OWNERSHIP and N_MORTGAGES are quite positively correlated to the LTV target column. However, MORTGAGE_AMOUNT seems less correlated to the LTV target column.

The problem with the above approach is that it is laborious to create scatter plots one by one, and you cannot examine the relationships among those data columns themselves. To solve the problem, we can create a scatterplot matrix graph as follows. This is a 4x4 scatterplot matrix of the data columns LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT. In the top row, you can examine the relationships between HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT and the LTV target column. In the second row, you can examine the relationships between LTV, N_MORTGAGES, and MORTGAGE_AMOUNT and the HOUSE_OWNERSHIP column. In the third and fourth rows, you can examine the relationships of the other columns with N_MORTGAGES and MORTGAGE_AMOUNT, respectively.

To generate this scatterplot matrix, we need to invoke the readily available R script RQG$pairs (via the SQL Query node) in Oracle R Enterprise. Please refer to http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html?ssSourceSiteId=ocomen for Oracle R Enterprise installation. Let's create the following workflow with the SQL Query node to invoke the R script. Note: a Sample node may be needed to sample down the data size (e.g. 1000 rows) for a large dataset before it is used for charting.

Enter the following SQL statement in the SQL Query editor. rqTableEval is an R SQL function that allows the user to invoke an R script from the SQL side. The first SELECT statement within the function specifies the input data (LTV, HOUSE_OWNERSHIP, N_MORTGAGES, and MORTGAGE_AMOUNT). The second SELECT statement specifies the optional parameter to the R script, where we define the graph title "Scatterplot Matrices". The output of the function is an XML document with the graph data embedded in it.

SELECT VALUE
FROM TABLE(rqTableEval(
    cursor(select "INSUR_CUST_LTV_SAMPLE_N$10001"."LTV",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."HOUSE_OWNERSHIP",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."N_MORTGAGES",
                  "INSUR_CUST_LTV_SAMPLE_N$10001"."MORTGAGE_AMOUNT"
           from "INSUR_CUST_LTV_SAMPLE_N$10001"),            -- Input Cursor
    cursor(select 'Scatterplot Matrices' as MAIN from DUAL), -- Param Cursor
    'XML',                                                   -- Output Definition
    'RQG$pairs'                                              -- R Script
))

You can see which default R scripts are available in the R Scripts tab.
This tab is visible only when an Oracle R Enterprise installation is detected. Click the run button in the toolbar to invoke the R script and produce the scatterplot matrix below. You can copy the scatterplot matrix image to the clipboard or save it to an image file (PNG) for reporting purposes. To do so, right-click on the graph to bring up the pop-up menu below. The scatterplot matrix is also available in the Data Viewer of the SQL Query node. To open the Data Viewer, select the "View Data" item in the pop-up menu of the node. The returned XML data is shown in the Data Viewer as shown below. To view the scatterplot matrix embedded in the data, click on the XML data to bring up the icon at the far right of the cell, and then click on the icon to bring up the viewer.
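Incidentally, the scatter plots give a visual read on these relationships, but you can also quantify them without leaving SQL. Here is a quick sketch using Oracle's built-in CORR aggregate (my addition, not part of the original workflow; it assumes the three predictor columns are numeric in the demo table):

-- Pearson correlation of each candidate predictor with the LTV target;
-- values near +1/-1 indicate a strong linear relationship, near 0 a weak one
SELECT ROUND(CORR(HOUSE_OWNERSHIP, LTV), 3) AS corr_house_ownership,
       ROUND(CORR(N_MORTGAGES, LTV), 3)     AS corr_n_mortgages,
       ROUND(CORR(MORTGAGE_AMOUNT, LTV), 3) AS corr_mortgage_amount
FROM   INSUR_CUST_LTV_SAMPLE;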


How to export data from the Explore Node using Data Miner and SQL Developer

Blog posting by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Development

The Explorer node generates descriptive statistical data and histogram data for all input table columns. These statistical and histogram data may help the user analyze the input data to determine whether any action (e.g. transformation) is needed before using it for data mining purposes. An analyst may want to export this data to a file for offline analysis (e.g. Excel) or reporting purposes. The Explorer node generates this data to a database table specified in the Output tab of the Property Inspector. In this case, the data is generated to a table named "OUTPUT_1_2".

To export the table to a file, we can use the SQL Developer Export wizard. Go to the Connections tab in the Navigator window, search for the table "OUTPUT_1_2" within the proper connection, then bring up the pop-up menu off the table. Click on the Export menu to launch the Export wizard. In the wizard, uncheck "Export DDL" and select the "Export Data" option, since we are only interested in the data itself. In the Format option, select "excel" in this example (a dozen output formats are supported) and specify the output file name. When the wizard finishes, an Excel file is generated.

Let's open the file to examine what is in it. As expected, it contains all statistical data for all input columns. The histogram data is listed as the last column (HISTOGRAMS), and it has the ODMRSYS.ODMR_HISTOGRAMS structure. For example, let's take a closer look at the histogram data for the BUY_INSURANCE column:

ODMRSYS.ODMR_HISTOGRAMS(ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"','No',NULL,NULL,73.1),ODMRSYS.ODMR_HISTOGRAM_POINT('"BUY_INSURANCE"','Yes',NULL,NULL,26.9))

This column contains an ODMRSYS.ODMR_HISTOGRAMS object, which is an array of the ODMRSYS.ODMR_HISTOGRAM_POINT structure. We can describe the structure to see what is in it. ODMRSYS.ODMR_HISTOGRAM_POINT contains five attributes, which represent the histogram data. ATTRIBUTE_NAME contains the attribute name (e.g. BUY_INSURANCE), ATTRIBUTE_VALUE contains the attribute values (e.g. No, Yes), GROUPING_ATTRIBUTE_NAME and GROUPING_ATTRIBUTE_VALUE are not used (these fields are used when the Group By option is specified), and ATTRIBUTE_PERCENT contains the percentages (e.g. 73.1, 26.9) for the respective attribute values.

As you can see, the complex ODMRSYS.ODMR_HISTOGRAMS output format may be difficult to read, and it may require some processing before the data can be used. Alternatively, we can "unnest" the histogram data to a transactional data format before exporting it. This way we don't have to deal with the complex array structure, and the data is more consumable. To do that, we can write a simple SQL query to "unnest" the data and use the new SQL Query node ("Extract histogram data") to run this query (see below). We then use a Create Table node ("Explorer output table") to persist the "unnested" histogram data along with the statistical data.

1. Create a SQL Query node. Create a SQL Query node and connect the "Explore Data" node to it. You may rename the SQL Query node to "Extract histogram data" to make it clear it is used to "unnest" the histogram data.

2.
Specify a SQL query to "unnest" the histogram data. Double-click the "Extract histogram data" node to bring up the editor, and enter the following SELECT statement:

SELECT
    "Explore Data_N$10002"."ATTR",
    "Explore Data_N$10002"."AVG",
    "Explore Data_N$10002"."DATA_TYPE",
    "Explore Data_N$10002"."DISTINCT_CNT",
    "Explore Data_N$10002"."DISTINCT_PERCENT",
    "Explore Data_N$10002"."MAX",
    "Explore Data_N$10002"."MEDIAN_VAL",
    "Explore Data_N$10002"."MIN",
    "Explore Data_N$10002"."MODE_VALUE",
    "Explore Data_N$10002"."NULL_PERCENT",
    "Explore Data_N$10002"."STD",
    "Explore Data_N$10002"."VAR",
    h.ATTRIBUTE_VALUE,
    h.ATTRIBUTE_PERCENT
FROM
    "Explore Data_N$10002", TABLE("Explore Data_N$10002"."HISTOGRAMS") h

Click OK to close the editor. This query extracts the ATTRIBUTE_VALUE and ATTRIBUTE_PERCENT fields from the ODMRSYS.ODMR_HISTOGRAMS nested object. Note: you may select only the columns that contain the statistics you are interested in. The "Explore Data_N$10002" is a generated unique name reference to the Explorer node; you may have a slightly different name ending with some other unique number. The query produces the following output. The last two columns are the histogram data in transactional format.

3. Create a Create Table node to persist the "unnested" histogram data. Create a Create Table node and connect the "Extract histogram data" node to it. You may rename the Create Table node to "Explorer output table" to make it clear it is used to persist the "unnested" histogram data.

4. Export the "unnested" histogram data to an Excel file. Run the "Explorer output table" node to persist the "unnested" histogram data to a table. The name of the output table (OUTPUT_3_4) can be found in the Property Inspector below. Next, we can use the SQL Developer Export wizard as described above to export the table to an Excel file. As you can see, the histogram data are now in transactional format; they are more readable and can readily be consumed.
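If you only need the histogram for a single column and don't want to modify the workflow, the same TABLE() unnesting pattern works directly against the Explorer node's output table in a SQL Developer worksheet. A small sketch (my addition), assuming the output table "OUTPUT_1_2" and the column names shown above, where the ATTR literal names the column whose histogram you want:

-- Unnest only the BUY_INSURANCE histogram from the persisted Explorer output
SELECT h.ATTRIBUTE_VALUE, h.ATTRIBUTE_PERCENT
FROM   "OUTPUT_1_2" t, TABLE(t."HISTOGRAMS") h
WHERE  t."ATTR" = 'BUY_INSURANCE';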


Oracle BIWA Summit 2014 January 14-16, 2014 at Oracle HQ in Redwood Shores, CA

Oracle Business Intelligence, Warehousing & Analytics Summit - Redwood City

Oracle is a proud sponsor of the Business Intelligence, Warehousing & Analytics (BIWA) Summit happening January 14–16 at the Oracle Conference Center in Redwood City. The Oracle BIWA Summit brings together Oracle ACE experts; customers who are currently using or planning to use Oracle BI, Warehousing and Analytics products and technologies; partners; and Oracle Product Managers, Support personnel, and Development Managers. Join us on Tuesday, January 14 at 5 p.m. to hear featured speaker Balaji Yelamanchili, Senior Vice President, Analytics and Performance Management Products, for his keynote: Oracle Business Intelligence -- Innovate Faster. Visit the BIWA site http://www.biwasummit.com/ for more information today.

Among the approximately 50 technical presentations, featured talks, and Hands-on Labs, I'll be delivering a presentation on Oracle Advanced Analytics and a Hands-on Lab on using the OAA/Oracle Data Miner GUI.

AA-1010 BEST PRACTICES FOR IN-DATABASE ANALYTICS
Session ID: AA-1010
Presenter: Charlie Berger, Oracle
Abstract: In the era of Big Data, enterprises are acquiring increasing volumes and varieties of data from a rapidly growing range of internet, mobile, sensor and other real-time and near real-time sources. The driving force behind this trend toward Big Data analysis is the ability to use this data for "actionable intelligence" -- to predict patterns and behaviors and to deliver essential information when and where it is needed. Oracle Database uniquely offers a powerful platform to perform this predictive analytics and location analysis with in-database data mining, statistical processing and SQL analytics. Oracle Advanced Analytics embeds powerful data mining algorithms and adds enterprise-scale open source R to solve problems such as predicting customer behavior, anticipating churn, detecting fraud, market basket analysis and discovering customer segments. Oracle Data Miner GUI, a new SQL Developer 4.0 extension, enables business analysts to quickly analyze and visualize data; build, evaluate and apply predictive models; and deploy sophisticated predictive analytics methodologies via SQL scripts -- all while keeping the data inside the Oracle Database. Come learn best practices and customer examples for exploiting Oracle's scalable, performant and secure in-database analytics capabilities to extract more value and actionable intelligence from your data.

HOL-AA-1008 LEARN TO USE ORACLE ADVANCED ANALYTICS FOR PREDICTIVE ANALYTICS SOLUTIONS
Session ID: HOL-AA-1008
Presenter: Charles Berger, Oracle & Karl Rexer, Rexer Analytics
Abstract: Big Data; Bigger Insights! Oracle Data Mining Release 12c, a component of the Oracle Advanced Analytics database option, embeds powerful data mining algorithms in the SQL kernel of the Oracle Database for problems such as predicting customer behavior, anticipating churn, identifying up-sell and cross-sell, detecting anomalies and potential fraud, customer profiling, text mining and retail market basket analysis. Oracle Data Miner GUI, a new SQL Developer 4.0 extension, enables business analysts to quickly analyze and visualize data; build, evaluate and apply predictive models; and develop sophisticated predictive analytics methodologies -- all while keeping the data inside Oracle Database.
Come see how easily you can discover big insights from your Oracle data, generate SQL scripts for deployment and automation, and deploy results into Oracle Business Intelligence (OBIEE) dashboards.


Come See and Test Drive Oracle Advanced Analytics at the BIWA Summit'14, Jan 14-16, 2014

The BIWA Summit '14 Detailed Agenda, for January 14-16 at the Oracle HQ Conference Center, is now published. Please share with others by Tweeting, blogging, Facebook, LinkedIn, email, etc.! The BIWA Summit is known for novel and interesting use cases of Oracle Big Data, Exadata, Advanced Analytics/Data Mining, OBIEE, Spatial, Endeca and more! There are opportunities to get hands-on experience with products in the Hands-on Labs, great customer case studies, and talks by Oracle technical professionals and partners. Meet with technical experts. Click HERE to read detailed abstracts and speaker profiles. Use the SPECIAL DISCOUNT code ORACLE12C and registration is only $199 for the 2.5-day, technically focused Oracle user group event.

Charlie (Oracle Employee Advisor to the Oracle BIWA Special Interest User Group)


Oracle Big Data Learning Library

Click on LEARN BY PRODUCT to view all learning resources. Oracle Big Data Learning Library... Learn about Oracle Big Data, Data Science, Data Mining, Oracle NoSQL Database, and more!

Oracle Big Data Essentials - Attend this Oracle University course!
Using Oracle NoSQL Database - Attend this Oracle University class!
Oracle and Big Data on OTN - See the latest resources on OTN.

Oracle Big Data Appliance
Oracle Big Data and Data Science Basics
Meeting the Challenge of Big Data
Oracle Big Data Tutorial Video Series
Oracle MoviePlex - a Big Data End-to-End Series of Demonstrations
Oracle Big Data Overview
Oracle Big Data Essentials

Oracle NoSQL Database
Oracle NoSQL Database Tutorial Videos
Oracle NoSQL Database Tutorial Series
Oracle NoSQL Database Release 2 New Features
Using Oracle NoSQL Database

Data Mining
Oracle Data Mining 11g Release 2 OBE Series
Oracle Database 12c: Oracle Data Mining New Features
Oracle Advanced Analytics and Oracle Data Mining Demonstration
Solving Business Problems Using Oracle Data Mining
Oracle Data Mining Essentials - 2 day class

Oracle R Enterprise
Oracle R Enterprise Tutorial Series

Oracle Big Data Connectors
Integrate All Your Data with Oracle Big Data Connectors
Using Oracle Direct Connector for HDFS to Read the Data from HDFS
Using Oracle R Connector for Hadoop to Analyze Data

Oracle Data Integrator
Oracle Data Integrator Application Adapter for Hadoop
Oracle Data Integrator 12c: Getting Started Series

Exalytics
Enterprise Manager 12c R3: Manage Exalytics
Setting Up and Running Summary Advisor on an Exalytics Machine

Oracle Business Intelligence Enterprise Edition
Oracle Business Intelligence
Oracle BI 11g R1: Create Analyses and Dashboards - 4 day class
Oracle BI Publisher 11g R1: Fundamentals - 3 day class
Oracle BI 11g R1: Build Repositories - 5 day class


Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into workflow using the SQL Query node

I posted a new white paper authored by Denny Wong, Principal Member of Technical Staff, User Interfaces and Components, Oracle Data Mining Technologies.  You can access the white paper here and the companion files here.  Here is an excerpt:

Oracle Data Miner (Extension of SQL Developer 4.0): Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node

Oracle R Enterprise (ORE), a component of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise and big data. Designed for problems involving large amounts of data, Oracle R Enterprise integrates R with the Oracle Database. R users can develop, refine and deploy R scripts that leverage the parallelism and scalability of the database to perform predictive analytics and data analysis.

Oracle Data Miner (ODMr) offers a comprehensive set of in-database algorithms for performing a variety of mining tasks, such as classification, regression, anomaly detection, feature extraction, clustering, and market basket analysis. One of the important capabilities of the new SQL Query node in Data Miner 4.0 is a simplified interface for integrating R scripts registered with the database. This provides the support necessary for R developers to provide useful mining scripts for use by data analysts. This synergy provides many additional benefits, as noted below:

· R developers can further extend ODMr mining capabilities by incorporating the extensive R mining algorithms from the open source CRAN packages, or by leveraging any user-developed custom R algorithms, via the SQL interfaces provided by ORE.
· Since the SQL Query node can be part of a workflow process, R scripts can leverage functionality provided by other workflow nodes, which can simplify the overall effort of integrating R capabilities within the database.
· R mining capabilities can be included in the workflow deployment scripts produced by the new SQL script generation feature, so deploying R functionality within the context of a Data Miner workflow is easily accomplished.
· Data and processing are secured and controlled by the Oracle Database. This alleviates many of the risks incurred with other providers, where users have to export data out of the database in order to perform advanced analytics.

Oracle Advanced Analytics saves analysts, developers, database administrators and management the headache of trying to integrate R and database analytics. Instead, users can quickly gain the benefit of new R analytics and spend their time and effort on developing business solutions instead of building homegrown analytical platforms.

This paper should be very useful to R developers wishing to better understand how to leverage embedded R scripts for use by data analysts. Analysts will also find the paper useful to see how R features can be surfaced for their use in Data Miner. The specific use case covered demonstrates how to use the SQL Query node to integrate R glm and rpart regression model build, test, and score operations into the workflow, along with nodes that perform data preparation and residual plot graphing. However, the integration process described here can easily be adapted to integrate other R operations, like statistical data analysis and advanced graphing, to expand ODMr functionalities.
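To give a flavor of the registration step the paper relies on: an R script is stored in the database once and then invoked from SQL (for example, from a SQL Query node) by name. The sketch below is illustrative only and not taken from the paper; it assumes an ORE installation, a session granted the RQADMIN role, the demo INSUR_CUST_LTV_SAMPLE table, and the hypothetical names MY_GLM_DEMO and COEF:

-- Register a tiny R script in the database (illustrative; requires ORE and RQADMIN)
BEGIN
  sys.rqScriptCreate('MY_GLM_DEMO',
    'function(dat) {
       mod <- glm(LTV ~ ., data = dat)   # fit a simple linear model in R
       data.frame(COEF = coef(mod))      # return coefficients as a data.frame
     }');
END;
/

-- Invoke the registered script from SQL, e.g. inside a SQL Query node
SELECT *
FROM TABLE(rqTableEval(
    cursor(SELECT LTV, HOUSE_OWNERSHIP, N_MORTGAGES
           FROM INSUR_CUST_LTV_SAMPLE),  -- input data
    NULL,                                -- no parameter cursor
    'SELECT 1 AS COEF FROM DUAL',        -- output column signature
    'MY_GLM_DEMO'));                     -- script registered above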


Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1 is now available for download on OTN

The NEW Oracle Data Miner GUI, part of SQL Developer 4.0 Early Adopter 1, is now available for download on OTN.  See the link to SQL Developer 4.0 EA1.  The Oracle Data Miner 4.0 new features are applicable to Oracle Database 11g Release 2 and Oracle Database Release 12c.  See the Oracle Data Miner Extension to SQL Developer 4.0 Release Notes for EA1 for additional information.

· Workflow SQL Script Deployment
o Generates SQL scripts to support full deployment of workflow contents
· SQL Query Node
o Integrate SQL queries to transform data or provide a new data source
o Supports the running of R language scripts and viewing of R-generated data and graphics
· Graph Node
o Generate Line, Scatter, Bar, Histogram and Box Plots
· Model Build Node Improvements
o Node-level data usage specification applied to underlying models
o Node-level text specifications to govern text transformations
o Displays heuristic rules responsible for excluding predictor columns
o Ability to control the amount of Classification and Regression test results generated
· View Data
o Ability to drill in to view custom objects and nested tables

These new Oracle Data Miner GUI capabilities expose Oracle Database 12c and Oracle Advanced Analytics/Data Mining Release 1 features:

· Predictive Query Nodes
o Predictive results without the need to build models, using Analytical Queries
o Refined predictions based on data partitions
· Clustering Node New Algorithm
o Added Expectation Maximization algorithm
· Feature Extraction Node New Algorithms
o Added Singular Value Decomposition and Principal Component Analysis algorithms
· Text Mining Enhancements
o Text transformations integrated as part of the Model's Automatic Data Preparation
o Ability to import Build Text node specifications into a Model Build node
· Prediction Result Explanations
o Scoring details that explain the predictive result
· Generalized Linear Model New Algorithm Settings
o New algorithm settings provide feature selection and generation

See the OAA pages on OTN, http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html, for more information on Oracle Advanced Analytics.


Oracle Advanced Analytics and Data Mining at the Movies on YouTube - Updated May 13, 2018

Updated May 13, 2018.  Periodically, I've recorded a demonstration and/or presentation on Oracle Advanced Analytics and Data Mining and have posted them on YouTube. Here are links to some of the more recent YouTube postings--sort of an Oracle Advanced Analytics and Data Mining at the Movies experience.

The Naked Future: What Happens in a World that Anticipates Your Every Move?  This fun presentation/demo, inspired by the book of the same name by Patrick Tucker, shows a very interesting "near future" made possible by big data + machine learning.  It also demonstrates many examples and enabling technologies that exist today and make this sort of future all the more likely!

Mining Structured and Unstructured Data using Oracle Advanced Analytics (slides) - Watch on YouTube
New - Big Data Analytics using Oracle Advanced Analytics 12c and Big Data SQL - Watch on YouTube
New - Oracle Academy Webcast: Ask the Oracle Experts - Fraud & Anomaly Detection using Oracle Advanced Analytics 12c & Big Data SQL - Watch on YouTube
New - Oracle Academy Webcast: Ask the Oracle Experts - Big Data Analytics with Oracle Advanced Analytics - Watch on YouTube
Oracle Data Miner and Oracle R Enterprise Integration via SQL Query node - Watch Demo
Oracle Data Miner 4.0 (SQL Developer 4.0 Extension) New Features - Watch Demo
Oracle Business Intelligence Enterprise Edition (OBIEE) SampleApps Demo featuring integration with Oracle Advanced Analytics/Data Mining
Oracle Big Data Analytics Demo mining remote sensor data from HVACs for better customer service
In-Database Data Mining for Retail Market Basket Analysis Using Oracle Advanced Analytics
In-Database Data Mining Using Oracle Advanced Analytics for Classification using Insurance Use Case
Fraud and Anomaly Detection using Oracle Advanced Analytics Part 1 Concepts
Fraud and Anomaly Detection using Oracle Advanced Analytics Part 2 Demo
Overview Presentation and Demonstration of Oracle Advanced Analytics Database Option

So.... grab your popcorn and a comfortable chair.  Hope you enjoy! Charlie


Oracle OpenWorld Call for Proposals now OPEN Submit your Oracle Advanced Analytics/Data Mining/ORE talks today!!

Calling All Oracle OpenWorld Oracle Advanced Analytics, Data Mining and R Experts

The Call for Proposals is open. Have something interesting to present to the world's largest gathering of Oracle technologists and business leaders? Making breakthrough innovations with Java or MySQL? We want to hear from you, and so do the attendees at this year's Oracle OpenWorld, JavaOne, and MySQL Connect conferences. Submit your proposal now for a chance to share your expertise at one of the most important technology and business conferences of the year.

CHOOSE... Select one of Oracle's premiere conferences
SHARE... Submit your proposal for sharing your most innovative ideas and experiences
JOIN... Connect with the elite community of Oracle OpenWorld, JavaOne, and MySQL Connect session leaders in 2013

We recommend you take the time to review the General Information, Content Program Policies, and Tips and Guidelines pages before you begin. We look forward to your submissions!

Submit Papers: Please submit your papers by clicking on the link below and then select the event for which you are submitting. Submit Now!

General Information
Conferences location: San Francisco, California, USA
Dates:
Oracle OpenWorld: Sunday, September 22, 2013–Thursday, September 26, 2013
JavaOne: Sunday, September 22, 2013–Thursday, September 26, 2013
MySQL Connect: Saturday, September 21–Monday, September 23, 2013

Key 2013 deadlines:
Call for Proposals–Open: Wednesday, March 13
Call for Proposals–Closed: Friday, April 12, 11:59 p.m. PDT
Notifications for accepted and declined submissions sent: Mid-June

For Oracle OpenWorld, Oracle employee submitters will need to contact the appropriate Oracle track leads before submitting. To view a list of track leads, click here.

Contact us: For questions regarding the Call for Proposals, send an e-mail to speaker-services_ww@oracle.com. For technical questions about the submission tool or issues with submitting your proposal, send an e-mail to OpenWorldContent@gpj.com.


Take a FREE Test Drive with Oracle Advanced Analytics/Data Mining on the Amazon Cloud

I wanted to highlight a wonderful new resource provided by our partner Vlamis Software.  Extremely easy!  Fill out the form, wait a few minutes for the Amazon Cloud instance to start up and then BAM!  You can log in and start using the Oracle Advanced Analytics Oracle Data Miner workflow GUI.  Demo data and online Oracle by Example learning tutorials are also provided to ensure your data mining test drive is a positive one.  Enjoy!!

Test Drive -- Powered by Amazon AWS

We have partnered with Amazon Web Services to provide to you, free of charge, the opportunity to work, hands-on, with the latest of Oracle's Business Intelligence offerings. By signing up to one of the labs below, Amazon's Elastic Compute Cloud (EC2) environment will generate a complete server for you to work with. These hands-on labs use the actual Oracle software running on the Amazon Web Services EC2 environment. They each take approximately 2 hours to work through and will give you hands-on experience with the software and a tour of the features. Your EC2 environment will be available to you for 5 hours, at which time it will self-terminate. If, after registration, you need additional time or need further instructions, simply reply to the registration email and we would be glad to help you.

Data Mining

This test drive walks through some basic exercises in doing predictive analytics within an Oracle 11g Database instance using the Oracle Data Miner extension for Oracle SQL Developer. You use a drag-and-drop "workflow" interface to build a data mining model that predicts the likelihood of purchase for a set of prospects. Oracle Data Mining is ideal for automatically finding patterns, understanding relationships, and making predictions in large data sets.


Turkcell Combats Pre-Paid Calling Card Fraud Using In-Database Oracle Advanced Analytics

Turkcell İletişim Hizmetleri A.Ş. Successfully Combats Communications Fraud with Advanced In-Database Analytics [Original link available on oracle.com http://www.oracle.com/us/corporate/customers/customersearch/turkcell-1-exadata-ss-1887967.html]

Oracle Customer: Turkcell İletişim Hizmetleri A.Ş.
Location: Istanbul, Turkey
Industry: Communications
Employees: 3,583
Annual Revenue: Over $5 Billion

Turkcell İletişim Hizmetleri A.Ş. is a leading provider of mobile communications in Turkey with more than 34 million subscribers. Established in 1994, Turkcell created the first global system for mobile communications (GSM) network in Turkey. It was the first Turkish company listed on the New York Stock Exchange.

Communications fraud, or the use of telecommunications products or services without intention to pay, is a major issue for the organization. The practice is fostered by prepaid card usage, which is growing rapidly. Anonymous network-branded prepaid cards are a tempting vehicle for money launderers, particularly since these cards can be used as cash vehicles--for example, to withdraw cash at ATMs. It is estimated that prepaid card fraud represents an average loss of US$5 per US$10,000 in transactions. For a communications company with billions of transactions, this could result in millions of dollars lost through fraud every year.

Consequently, Turkcell wanted to combat communications fraud and money laundering by introducing advanced analytical solutions to monitor key parameters of prepaid card usage and issue alerts or block fraudulent activity. This type of fraud prevention would require extremely fast analysis of the company's one petabyte of uncompressed customer data to identify patterns and relationships, build predictive models, and apply those models to even larger data volumes to make accurate fraud predictions.

To achieve this, Turkcell deployed Oracle Exadata Database Machine X2-2 HC Full Rack, so that data analysts can build predictive antifraud models inside the Oracle Database and deploy them into Oracle Exadata for scoring, using Oracle Data Mining, a component of Oracle Advanced Analytics, leveraging Oracle Database 11g technology. This enabled the company to create predictive antifraud models faster than with any other machine, as models can be built using structured query language (SQL) inside the database, and Oracle Exadata can access raw data without summarized tables, thereby achieving extremely fast analyses.

A word from Turkcell İletişim Hizmetleri A.Ş.: "Turkcell manages 100 terabytes of compressed data--or one petabyte of uncompressed raw data--on Oracle Exadata. With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, we can analyze large volumes of customer data and call-data records easier and faster than with any other tool and rapidly detect and combat fraudulent phone use." – Hasan Tonguç Yılmaz, Manager, Turkcell İletişim Hizmetleri A.Ş.

Challenges
· Combat communications fraud and money laundering by introducing advanced analytical solutions to monitor prepaid card usage and alert or block suspicious activity
· Monitor numerous parameters for up to 10 billion daily call-data records and value-added service logs, including the number of accounts and cards per customer, number of card loads per day, number of account loads over time, and number of account loads on a subscriber identity module card at the same location
· Enable extremely fast sifting through huge data volumes to identify patterns and relationships, build predictive antifraud models, and apply those models to even larger data volumes to make accurate fraud predictions
· Detect fraud patterns as soon as possible and enable quick response to minimize the negative financial impact

Solutions

Oracle Products and Services: Oracle Exadata Database Machine, Oracle Advanced Analytics

· Used Oracle Exadata Database Machine X2-2 HC Full Rack to create predictive antifraud models more quickly than with previous solutions by accessing raw data without summarized tables and providing unmatched query speed, which optimizes and shortens the project design phases for creating predictive antifraud models
· Leveraged SQL for the preparation and transformation of one petabyte of uncompressed raw communications data, using Oracle Data Mining, a feature of Oracle Advanced Analytics, to increase the performance of predictive antifraud models
· Deployed Oracle Data Mining models on Oracle Exadata to identify actionable information in less time than traditional methods--which would require moving large volumes of customer data to third-party analytics software--and achieve an average gain of four hours or more, taking into consideration the absence of any system crash (as occurred in the previous environment) during data import
· Achieved extreme data analysis speed with in-database analytics performed inside Oracle Exadata, through a row-wise information search--including day, time, and duration of calls, as well as number of credit recharges on the same day or at the same location--and query language functions that enabled analysts to detect fraud patterns almost immediately
· Implemented a future-proof solution that could support rapidly growing data volumes that tend to double each year with Oracle Exadata's massively scalable data warehouse performance

Why Oracle

"We selected Oracle because in-database mining to support antifraud efforts will be a major focus for Turkcell in the future. With Oracle Exadata Database Machine and the analytics capabilities of Oracle Advanced Analytics, we can complete antifraud analysis for large amounts of call-data records in just a few hours. Further, we can scale the solution as needed to support rapid communications data growth," said Hasan Tonguç Yılmaz, data warehouse/data mining developer, Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.

Partner

Oracle Partner: Turkcell Teknoloji Araştırma ve Geliştirme A.Ş.
All development and test processes were performed by Turkcell Teknoloji. The company also made significant contributions to the configuration of numerous technical analyses which are carried out regularly by Turkcell İletişim Hizmetleri's antifraud specialists.
Resources
· Turkcell İletişim Hizmetleri Uses Engineered System to Analyze 10 Billion Daily Call-Data Records and Service Logs and to Generate 100,000 Monthly Reports
· Turkcell Deploys Oracle Data Integrator to Drive Efficiency
· Turkcell Accelerates Reporting Tenfold, Saves on Storage and Energy Costs with Consolidated Oracle Exadata Platform
· Turkcell Superonline Transforms Its Order Management and Service Fulfillment with Oracle Communications Solutions
· Technologist of the Year: Turkcell is an Exemplary Oracle Cross Stack Customer
· Turkcell Gets Three 10X Improvements with Oracle
· Oracle Exadata Changes the Rules of the Game for Turkcell
· Customers Discuss Benefits of Oracle Exadata
· Turkcell Technology Uses Oracle Complex Event Processing for Extreme Scale Mobile Networks
· Turkcell Technology Research & Development Inc. Achieves Substantial Savings with Fault Prevention
· Turkcell iletisim Hizmetleri A.S. Processes Mobile Network Data of 33 Million Subscribers in Real Time
· Kcell Boosts Business Intelligence with Data Warehouse Solution
· Turkcell Gets the Benefits of Oracle Exadata
· Turkcell Eliminates Manual Updates with Oracle IDM


Fraud and Anomaly Detection using Oracle Data Mining YouTube-like Video

I've created and recorded another YouTube-like presentation and "live" demos of the Oracle Advanced Analytics Option, this time focusing on Fraud and Anomaly Detection using Oracle Data Mining.  [Note: It is a large MP4 file that will open and play in place.  The sound quality is weak, so you may need to turn up the volume.]

Data is your most valuable asset. It represents the entire history of your organization and its interactions with your customers.  Predictive analytics leverages data to discover patterns and relationships and to help you make informed predictions.  Oracle Data Mining (ODM) automatically discovers relationships hidden in data.  Predictive models and insights discovered with ODM address business problems such as: predicting customer behavior, detecting fraud, analyzing market baskets, profiling and loyalty.  Oracle Data Mining, part of the Oracle Advanced Analytics (OAA) Option to the Oracle Database EE, embeds 12 high performance data mining algorithms in the SQL kernel of the Oracle Database. This eliminates data movement, delivers scalability and maintains security.

But how do you find these very important needles--possibly fraudulent transactions--in huge haystacks of data? Oracle Data Mining's 1-Class Support Vector Machine algorithm is specifically designed to identify rare or anomalous records.  The 1-Class SVM anomaly detection algorithm trains on records believed to be "normal" and builds a descriptive and predictive model which can then be used to flag records that, on a multi-dimensional basis, appear not to fit in--or be different.  Combined with clustering techniques to sort transactions into more homogeneous sub-populations for more focused anomaly detection analysis, and with Oracle Business Intelligence, enterprise applications and/or real-time environments to "deploy" fraud detection, Oracle Data Mining delivers a powerful advanced analytical platform for solving important problems.  With OAA/ODM you can find suspicious expense report submissions, flag non-compliant tax submissions, fight fraud in healthcare claims and save huge amounts of money in fraudulent claims and abuse.  This presentation and several brief demos will show Oracle Data Mining's fraud and anomaly detection capabilities.
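For readers who prefer SQL to slides, here is a minimal sketch of building a 1-Class SVM anomaly detection model with the DBMS_DATA_MINING package. The CLAIMS table, CLAIM_ID key and model name are illustrative assumptions; the key point is that passing a NULL target tells the database to build a one-class model.

-- Settings table naming the SVM algorithm.
CREATE TABLE oc_svm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO oc_svm_settings VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  COMMIT;

  -- A NULL target column makes this a one-class (anomaly detection) model.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CLAIMS_ANOMALY',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CLAIMS',
    case_id_column_name => 'CLAIM_ID',
    target_column_name  => NULL,
    settings_table_name => 'oc_svm_settings');
END;
/

-- Score: rows predicted to be in class 0 are the candidate anomalies.
SELECT claim_id,
       PREDICTION_PROBABILITY(CLAIMS_ANOMALY, 0 USING *) anomaly_prob
  FROM claims
 ORDER BY anomaly_prob DESC;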


Oracle Virtual SQL Developer Days DB May 15th - Session #3: 1Hr. Predictive Analytics and Data Mining Made Easy!

All, Oracle Data Mining's SQL Developer based ODM'r GUI + ODM is being featured in this upcoming Virtual SQL Developer Day online event next Tuesday, May 15th.  Several thousand people have already registered and registration is still growing.  We recorded and uploaded presentations/demos so anyone can view them "on demand" at the specified date/time per the SQL DD event agenda. Anyone can also download a complete 11gR2 Database w/ SQL Developer 3.1 & Oracle Data Miner GUI extension VM installation for the Hands-on Labs and follow our 4 ODM Oracle by Example e-training sessions.  We moderators monitor the online chat and answer questions.

Session #3: 1 Hr. Predictive Analytics and Data Mining Made Easy!
Oracle Data Mining, a component of the Oracle Advanced Analytics database option, embeds powerful data mining algorithms in the SQL kernel of the Oracle Database for problems such as customer churn, predicting customer behavior, up-sell and cross-sell, detecting fraud, market basket analysis (e.g. beer & diapers), customer profiling and customer loyalty. Oracle Data Miner, a SQL Developer 3.1 extension, provides data analysts a "workflow" paradigm to build analytical methodologies to explore data and build, evaluate and apply data mining models--all while keeping the data inside the Oracle Database. This workshop will teach the student the basics of getting started using Oracle Data Mining.

We're also included in the June 7th physical event in NYC and future virtual and physical events.  Great event(s) and great "viz" for OAA/ODM.  Charlie


Oracle Data Mining Virtual Classes Scheduled

Two Oracle Data Mining Virtual Classes are now scheduled.  Register for a course in 2 easy steps.

Step 1: Select your Live Virtual Class options
Course ID: D76362GC10
Course Title: Oracle Database 11g: Data Mining Techniques NEW
Duration: 2 Days
Price: US$ 1,300

Step 2: Select the date and location of your Live Virtual Class
Please select a location below, then click on the Add to Cart button.

Location | Duration | Class Date | Start Time | End Time | Course Materials | Instruction Language | Seats | Audience
Online | 2 Days | 09-Aug-2012 | 04:00 AM EDT | 12:00 PM EDT | English | English | Available | Public
Online | 2 Days | 18-Oct-2012 | 04:00 AM EDT | 12:00 PM EDT | English | English | Available | Public

100% Student Satisfaction: Oracle's 100% Student Satisfaction program applies to those publicly scheduled and publicly available Oracle University Instructor Led Training classes that are identified as part of the 100% Student Satisfaction program on the http://www.oracle.com/education website at the time the class is purchased. Oracle will permit unsatisfied students to retake the class, subject to terms and conditions. Customers are not entitled to a refund. For more information and additional terms, conditions and restrictions that apply, click here.


NEW 2-Day Instructor Led Course on Oracle Data Mining Now Available!

UPDATED - See the updated and new Learn Predictive Analytics using Oracle Data Mining 2-day Oracle University Course.

A NEW 2-Day Instructor Led Course on Oracle Data Mining has been developed for customers and anyone wanting to learn more about data mining, predictive analytics and knowledge discovery inside the Oracle Database.  To register interest in attending the class, click here and submit your preferred format.

Course Objectives:
Explain basic data mining concepts and describe the benefits of predictive analysis
Understand primary data mining tasks, and describe the key steps of a data mining process
Use Oracle Data Miner to build, evaluate, and apply multiple data mining models
Use Oracle Data Mining's predictions and insights to address many kinds of business problems, including: predicting individual behavior, predicting values, and finding co-occurring events
Learn how to deploy data mining results for real-time access by end-users

Five reasons why you should attend this 2-day Oracle Data Mining Oracle University course.  With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, you will learn to gain insight and foresight to:
Go beyond simple BI and dashboards about the past. This course will teach you about "data mining" and "predictive analytics", analytical techniques that can provide huge competitive advantage
Take advantage of your data and investment in Oracle technology
Leverage all the data in your data warehouse--customer data, service data, sales data, customer comments and other unstructured data, point of sale (POS) data--to build and deploy predictive models throughout the enterprise
Learn how to explore and understand your data and find patterns and relationships that were previously hidden
Focus on solving strategic challenges to the business, for example, targeting "best customers" with the right offer, identifying product bundles, detecting anomalies and potential fraud, finding natural customer segments and gaining customer insight


Oracle Announces Availability of Oracle Advanced Analytics for Big Data

Oracle Announces Availability of Oracle Advanced Analytics for Big Data
Oracle Integrates R Statistical Programming Language into Oracle Database 11g
REDWOOD SHORES, Calif. - February 8, 2012

News Facts
Oracle today announced the availability of Oracle Advanced Analytics, a new option for Oracle Database 11g that bundles Oracle R Enterprise together with Oracle Data Mining.
Oracle R Enterprise delivers enterprise class performance for users of the R statistical programming language, increasing the scale of data that can be analyzed by orders of magnitude using Oracle Database 11g.
R has attracted over two million users since its introduction in 1995, and Oracle R Enterprise dramatically advances capability for R users. Their existing R development skills, tools, and scripts can now also run transparently, and scale against data stored in Oracle Database 11g.
Customer testing of Oracle R Enterprise for Big Data analytics on Oracle Exadata has shown up to a 100x increase in performance in comparison to their current environment.
Oracle Data Mining, now part of Oracle Advanced Analytics, helps enable customers to easily build and deploy predictive analytic applications that help deliver new insights into business performance.
Oracle Advanced Analytics, in conjunction with Oracle Big Data Appliance, Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine, delivers the industry's most integrated and comprehensive platform for Big Data analytics.

Comprehensive In-Database Platform for Advanced Analytics
Oracle Advanced Analytics brings analytic algorithms to data stored in Oracle Database 11g and Oracle Exadata as opposed to the traditional approach of extracting data to laptops or specialized servers. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
By providing direct and controlled access to data stored in Oracle Database 11g, customers can accelerate data analyst productivity while maintaining data security throughout the enterprise.
Powered by decades of Oracle Database innovation, Oracle R Enterprise helps enable analysts to run a variety of sophisticated numerical techniques on billion row data sets in a matter of seconds, making iterative, speed of thought, high-quality numerical analysis on Big Data practical.
Oracle R Enterprise drastically reduces the time to deploy models by eliminating the need to translate the models to other languages before they can be deployed in production.
Oracle R Enterprise integrates the extensive set of Oracle Database data mining algorithms, analytics, and access to Oracle OLAP cubes into the R language for transparent use by R users.
Oracle Data Mining provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. These predictive models can be deployed in Oracle Database 11g and use Oracle Exadata Smart Scan to rapidly score huge volumes of data.
The tight integration between R, Oracle Database 11g, and Hadoop enables R users to write one R script that can run in three different environments: a laptop running open source R, Hadoop running with Oracle Big Data Connectors, and Oracle Database 11g.
Oracle provides single vendor support for the entire Big Data platform spanning the hardware stack, operating system, open source R, Oracle R Enterprise and Oracle Database 11g.
To enable easy enterprise-wide Big Data analysis, results from Oracle Advanced Analytics can be viewed from Oracle Business Intelligence Foundation Suite and Oracle Exalytics In-Memory Machine.

Supporting Quotes
"Oracle is committed to meeting the challenges of Big Data analytics. By building upon the analytical depth of Oracle SQL, Oracle Data Mining and the R environment, Oracle is delivering a scalable and secure Big Data platform to help our customers solve the toughest analytics problems," said Andrew Mendelsohn, senior vice president, Oracle Server Technologies.
"We work with leading edge customers who rely on us to deliver better BI from their Oracle Databases. The new Oracle R Enterprise functionality allows us to perform deep analytics on Big Data stored in Oracle Databases. By leveraging R and its library of open source contributed CRAN packages combined with the power and scalability of Oracle Database 11g, we can now do that," said Mark Rittman, co-founder, Rittman Mead.

Supporting Resources
About Oracle Big Data
About Oracle Advanced Analytics
About Oracle Big Data Appliance
About Oracle Big Data Connectors
About Oracle Exadata Database Machine
About Oracle Exalytics In-Memory Machine
About Oracle Business Intelligence Foundation Suite
About Oracle Database 11g
Connect with Oracle Database via Blog, Facebook and Twitter

About Oracle
Oracle engineers hardware and software to work together in the cloud and in your data center. For more information about Oracle (NASDAQ: ORCL), visit http://www.oracle.com.

Trademarks
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Contact Info
Eloy Ontiveros, Oracle, +1.650.607.6458, eloy.ontiveros@oracle.com
Joan Levy, Blanc & Otus for Oracle, +1.415.856.5110, jlevy@blancandotus.com


Building Predictive Analytical Applications using Oracle Data Mining recorded webcast

I did a Building Predictive Analytical Applications using Oracle Data Mining recorded webcast for IOUG earlier this week.  If this interests you, you can either watch the streaming presentation and demo hosted by the Independent Oracle User Group at http://www.ioug.org/ in conjunction with the Oracle Business Intelligence, Warehousing and Analytics Special Interest Group (BIWA SIG at www.oraclebiwa.org), or download the 84 MB file by clicking on this link to the Building Predictive Analytical Applications using Oracle Data Mining.wmv file.

It included an overview of data mining, Oracle Data Mining, some demo slides, and then several example applications where we've factory-installed ODM predictive analytics methodologies into the applications for self-learning and real-time deployment of ODM models.

Example Predictive Analytics Applications (partial list):
Oracle Communications & Retail Industry Models--factory installed data mining for specific industries
Oracle Spend Classification
Oracle Fusion Human Capital Management (HCM) Predictive Workforce
Oracle Fusion Customer Relationship Management (CRM) Sales Prediction
Oracle Adaptive Access Manager real-time Security
Oracle Complex Event Processing integrated with ODM models
Predictive Incident Monitoring Service for Oracle Database customers

Pretty cool stuff if you or your customers are interested in analytics.  Here's the link to the ppt slides.


SAIL-WORLD article - America's Cup: Oracle Data Mining supports crew and BMW ORACLE Racing

Originally printed at http://www.sail-world.com/UK/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834

America's Cup: Oracle Data Mining supports crew and BMW ORACLE Racing

[Photo: USA-17 on her way to winning the 33rd America's Cup, using Oracle's data mining technology, Oracle Database 11g and Oracle Application Express - BMW Oracle Racing © Photo Gilles Martin-Raget]

BMW ORACLE Racing won the 33rd America's Cup yacht race in February 2010, beating the Swiss team, Alinghi, decisively in the first two races of the best-of-three contest.

BMW ORACLE Racing's victory in the America's Cup challenge was a lesson in sailing skill, as one of the world's most experienced crews reached speeds as fast as 30 knots. But if you listen to the crew in their postrace interviews, you'll notice that what they talk about is technology.

[Photo: The wrist PDA displays worn by five of the USA-17 crew, where they could read actual and predictive data fed back from the onboard systems]

'The story of this race is in the technology,' says Ian Burns, design coordinator for BMW ORACLE Racing. From the drag-resistant materials encasing its hulls to its unprecedented 223-foot wing sail, BMW ORACLE Racing's trimaran, named USA, is a one-of-a-kind technological juggernaut. No less impressive are the electronics used to guide the vessel and fine-tune its performance. Each crewmember is equipped with a PDA on his wrist that has customized data for his job: what the load balance is on a particular rope, for example, or the current aerodynamic performance of the wing sail. The helmsman's sunglasses display graphical and numeric data to help him fine-tune the boat's direction while he keeps two hands on the wheel and visually scans the sea, the boat, the crew, the sails, and the wing.

The America's Cup is a challenge-based competition in which the winning yacht club hosts the next event and, within certain guidelines, makes the rules. For the 33rd America's Cup, the competing teams could not agree on a set of rules, so the event defaulted to an unrestricted format for boat design and cost. 'All we knew were the length of the boat and the course configuration,' says Burns. The boats were allowed a maximum length of 90 feet, and the course would be 20 miles out windward and 20 miles back. 'Within those parameters,' says Burns, 'you could build as fast a thing as you can think of.'

Learning by Data

The no-holds-barred rules for this race created what Burns calls an 'open playground' for boat designers. The innovative and costly vessels that resulted were one-of-a-kind creations with unpredictable sailing characteristics that would require a steep learning curve and lots of data.

[Photo: 33rd America's Cup - BMW ORACLE Racing - Training in Valencia - collecting data via 250 sensors; managing and analysing it were handled on the yacht, on the tender, ashore in Valencia and in the Austin Data Centre, USA - BMW Oracle Racing © Photo Gilles Martin-Raget]

'One of the problems we faced at the outset was that we needed really high accuracy in our data because we didn't have two boats,' says Burns. 'Generally, most teams have two boats, and they sail them side by side. Change one thing on one boat, and it's fairly easy to see the effect of a change with your own eyes.' With only one boat, BMW ORACLE Racing's performance analysis had to be done numerically by comparing data sets.
To get the information needed, says Burns, the team had to increase the amount of data collected by nearly 40 times what they had done in the past. The USA holds 250 sensors to collect raw data: pressure sensors on the wing; angle sensors on the adjustable trailing edge of the wing sail to monitor the effectiveness of each adjustment, allowing the crew to ascertain the amount of lift it's generating; and fiber-optic strain sensors on the mast and wing to allow maximum thrust without overbending them.

[Photo: 33rd America's Cup - BMW ORACLE Racing - Day 1 - The difference between the wingsail and softsail is evident, even though the softsail has more area - BMW Oracle Racing: Guilain Grenier]

But collecting data was only the beginning. BMW ORACLE Racing also had to manage that data, analyze it, and present useful results. The team turned to Oracle Data Mining in Oracle Database 11g. Peter Stengard, a principal software engineer for Oracle Data Mining and an amateur sailor, became the liaison between the database technology team and BMW ORACLE Racing. 'Ian Burns contacted us and explained that they were interested in better understanding the performance-driving parameters of their new boat,' says Stengard. 'They were measuring an incredible number of parameters across the trimaran, collected 10 times per second, so there were vast amounts of data available for analysis. An hour of sailing generates 90 million data points.'

After each day of sailing the boat, Burns and his team would meet to review and share raw data with crewmembers or boat-building vendors using a Web application built with Oracle Application Express. 'Someone in the meeting would say, "Wouldn't it be great if we could look at some new combination of numbers?" and we could quickly build an Oracle Application Express application and share the information during the same meeting,' says Burns. Later, the data would be streamed to Oracle's Austin Data Center, where Stengard and his team would go to work on deeper analysis.

[Photo: BMW Oracle USA-17 powers through Alinghi - America's Cup 2010 Race 1 - BMW Oracle Racing © Photo Gilles Martin-Raget]

Because BMW ORACLE Racing was already collecting its data in an Oracle database, Stengard and his team didn't have to do any extract, transform, and load (ETL) processes or data conversion. 'We could just start tackling the analytics problem right away,' says Stengard. 'We used Oracle Data Mining, which is in Oracle Database. It gives us many advanced data mining algorithms to work with, so we have freedom in how we approach any specific task.' Using the algorithms in Oracle Data Mining, Stengard could help Burns and his team learn new things about how their boat was working in its environment. 'We would look, for example, at mast rotations--which rotation works best for certain wind conditions,' says Stengard. 'There were often complex relationships within the data that could be used to model the effect on the target--in this case something called velocity made good, or VMG. Finding these relationships is what the racing team was interested in.'

[Photo: BMW Oracle Racing Technology team - Richard Gladwell]

Stengard and his team could also look at data over time and with an attribute selection algorithm to determine which sensors provided the most-useful information for their analysis.
'We could identify sensors that didn't seem to be providing the predictive power they were looking for so they could change the sensor location or add sensors to another part of the boat,' Stengard says. Burns agrees that without the data mining, they couldn't have made the boat run as fast. 'The design of the boat was important, but once you've got it designed, the whole race is down to how the guys can use it,' he says. 'With Oracle database technology, we could compare our performance from the first day of sailing to the very last day of sailing, with incremental improvements the whole way through. With data mining we could check data against the things we saw, and we could find things that weren't otherwise easily observable and findable.'

[Photo: BMW Oracle Racing made 4,000 data measurements 10 times a second - BMW Oracle Racing: Guilain Grenier]

Flying by Data

The greatest challenge of this America's Cup, according to Burns, was managing the wing sail, which had been built on an unprecedented scale. 'It is truly a massive piece of architecture,' Burns says. 'It's 20 stories high; it barely fits under the Golden Gate Bridge. It's an amazing thing to see.' The wing sail is made of an aeronautical fabric stretched over a carbon fiber frame, giving it the three-dimensional shape of a regular airplane wing. Like an airplane wing, it has a fixed leading edge and an adjustable trailing edge, which allows the crew to change the shape of the sail during the course of a race.

[Photo: Oracle wing under maintenance - standing 70 metres high, it is the longest wing ever built for a plane or yacht - Jean Philippe Jobé]

'The crew of the USA was the best group of sailors in the world, but they were used to working with sails,' says Burns. 'Then we put them under a wing. Our chief designer, Mike Drummond, told them an airline pilot doesn't look out the window when he's flying the plane; he looks at his instruments, and you guys have to do the same thing.'

A second ship, known as the performance tender, accompanied the USA on the water. The tender served in part as a floating datacenter and was connected to the USA by wireless LAN.

[Photo: USA-17 about to round the windward mark, Race 1, 33rd America's Cup. Underperforming sensors on the boat were moved to provide better information - Richard Gladwell]

'The USA generates almost 4,000 variables 10 times a second,' says Burns. 'Sometimes the analysis requires a very complicated combination of 10, 20, or 30 variables fitted through a time-based algorithm to give us predictions on what will happen in the next few seconds, or minutes, or even hours in terms of weather analysis.' Like the deeper analysis that Stengard does back at the Austin Data Center, this real-time data management and near-real-time data analysis was done in Oracle Database 11g. 'We could download the data to servers on the tender ship, do some quick analysis, and feed it right back to the USA,' says Burns. 'We started to do better when the guys began using the instruments,' Burns says. 'Then we started to make small adjustments against the predictions and started to get improvements, and every day we were making gains.'

Those gains were incremental and data driven, and they accumulated over years--until the USA could sail at three times the wind speed. Ian Burns is still amazed by the spectacle. 'It's an awesome thing to watch,' he says. 'Even with all we have learned, I don't think we have met the performance limits of that beautiful wing.'
[Photo: USA-17 pursues Alinghi 5 - Race 1, 33rd America's Cup, Valencia. Her crew flew her off the instruments: 'a pilot doesn't fly a plane by looking out the window' - BMW Oracle Racing: Guilain Grenier]

Read more about Oracle Data Mining
Hear a podcast interview with Ian Burns
Download Oracle Database 11g Release 2

Story republished from www.oracle.com/technology/oramag/oracle/10-may/o30racing.html, by Jeff Erickson.


Oracle Fusion Human Capital Management Application uses Oracle Data Mining for Workforce Predictive Analytics

Oracle's new Fusion Human Capital Management (HCM) Application now embeds predictive analytic models automatically generated by Oracle Data Mining to enrich dashboards and managers' portals with predictions about the likelihood that an employee will voluntarily leave the organization and a prediction about the employee's likely future performance. Armed with this new information, based on historical patterns and relationships found by Oracle Data Mining, enterprises can more proactively manage their valuable employee assets and better compete. The integrated Oracle Fusion HCM Application requires the Oracle Data Mining Option to the Oracle Database. With custom predictive models generated using the customer's own data, Oracle Fusion HCM enables managers to better understand their employees, understand the key factors for each individual, and even perform "What if?" analysis to see the likely impact on an employee of adjusting a critical HR factor, e.g. bonus, vacation time, amount of travel, etc.

Excerpting from the Oracle Fusion HCM website and collateral: "Every day organizations struggle to answer essential questions about their workforce. How much money are we losing by not having the right talent in place and how is that impacting current projects? What skills will we need in the next 5 years that we don't have today? How will business be impacted by impending retirements and are we prepared? Fragmented systems and bolt-on analytics are only some of the barriers that HR faces today. The consequences include missed opportunities, lost productivity, attrition, and uncontrolled operational costs. To address these challenges, Oracle Fusion Human Capital Management (HCM) puts information at your fingertips, helps you predict future trends, and enables you to turn insight into action. You will eliminate unnecessary costs, increase workforce productivity and retention, and gain a strategic advantage over your competition. Oracle Fusion HCM has been designed from the ground up so that you can work naturally and intuitively with analytics woven right into the fabric of your business processes."

This excerpt from the Solution Brief http://www.oracle.com/us/products/applications/fusion/fusion-hcm-know-your-people-356192.pdf describes the Predictive Analysis features and benefits: "Predictive Analysis. Imagine if you could look ahead and be prepared for upcoming workforce trends.
Most organizations do not have the analytic capability to do predictive human capital analysis, yet the worker information needed to make educated forecasts already exists today. Aging populations, shifting demographics, rising and falling economies, and multi-generational issues can have a significant impact on workforce decisions--for employees, managers and HR professionals. Not being able to accurately predict how all the moving parts fit together, and where you really have potential problems, can make or break an organization. Oracle Fusion HCM gives you the ability to finally see into the future, analyzing worker performance potential and risk of attrition, and enabling what-if analysis on ways to improve your workforce. Additionally, modeling capabilities provide you with extra power to bring together information from sources unthinkable in the past. For example, imagine understanding which recruiting agencies are providing the highest-quality recruits by comparing first year performance ratings with sources of hire. Being able to see potential problems before they occur and take immediate action will increase morale, save money, and boost your competitive edge. Result: You are able to look ahead and be prepared for upcoming workforce trends."

There is a great demo of Oracle Fusion HCM Workforce Predictive Analytics that highlights Oracle Data Mining.  This is one of the latest examples of Applications "powered by Oracle Data Mining". When you change your paradigm and move the algorithms to the data, rather than the traditional approach of extracting the data and moving it to the algorithms for analysis, it CHANGES EVERYTHING. Keep watching for additional Applications powered by Oracle's in-database advanced analytics.


Recorded Oracle Data Mining Demo and Presentation

I recently delivered an Oracle Data Mining demo and presentation that was recorded for Bio-IT.  You may have to register with some simple identification information, but you will be able to see and hear the full Oracle Data Mining demo and presentation.  See link http://www.inetpresent.com/w/Oracle/134/reg/ and details below.

REGISTER BELOW TO VIEW A REPLAY OF THIS LIVE WEBCAST!
Enabling Better Data Relationships Utilizing Oracle 11g: New Oracle Data Miner GUI, and Applications "Powered by ODM"

IT departments of all sizes increasingly rely on self-managing systems to help overcome challenges with lower cost and risk. By using embedded Oracle technology as transparent building blocks in applications or devices, ISV and OEM solution developers can offer life sciences robust data management capabilities. Software developers viewing this webinar can learn about:
· How Oracle embeddable products make it easier to develop, manage, and deploy secure, reliable and scalable customer solutions.
· Ways these solutions are being used to provide long-term return on investment for both developers and users.
· How partnering with Oracle can help satisfy critical research needs in the life sciences community while expanding your market and building a leadership position for your organization.

Oracle Database technologies and applications are running in all the top 20 life sciences companies as well as the top 10 medical device companies. Oracle Data Mining (ODM) automatically discovers relationships hidden in data. Users viewing this webinar can learn how predictive models and insights discovered with ODM address life sciences, healthcare and business problems such as:
· Finding genes more correlated with a disease
· Predicting customer behavior
· Detecting medical fraud
· Analyzing market baskets
· Profiling targeted patients

Oracle Data Mining embeds 12 traditional and cutting edge algorithms (Clustering, Decision Trees, Naïve Bayes, Support Vector Machines, "text mining", etc.) in the kernel of the Oracle Database. This eliminates data movement, delivers scalability and maintains security. ODM can mine star schema data, models are first-class objects and, in Exadata configurations, ODM models are "scored" at the hardware layer for 2-5x performance gains. ODM results can be accessed by OBIEE or any dashboard or application to automate and deploy advanced analytical methodologies that can be easily integrated into predictive applications.

This webcast includes an overview presentation of ODM, a demonstration of the new SQL Developer 3.0/ODMr GUI and several example predictive applications "Powered by Oracle Data Mining".

Who should view this presentation? ISV and OEM manufacturers who build solutions for the Life Sciences Industry. Plus, end-users interested in learning more about how these embedded solutions can help them better utilize scientific data within their organizations.
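Because the models are first-class database objects, "deployment" to a dashboard or application is often just a SQL query. A minimal sketch, assuming a hypothetical churn_model and customers table (not from the webcast):

-- Score every customer with an in-database model; no data movement required.
SELECT cust_id,
       PREDICTION(churn_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_model USING *) AS churn_probability
  FROM customers
 WHERE PREDICTION_PROBABILITY(churn_model USING *) > 0.75;

Any BI tool or application that can issue SQL can surface these columns directly.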


How to Deploy an Oracle Data Miner 11gR2 Workflow Model

You've been using Oracle Data Miner, have explored your data, built and evaluated a few ODM models, and now you would like to automate and/or deploy your model(s).  There are several options, from the simple, using the SQL Developer 3.0 with Oracle Data Miner extension GUI, to the complex, using the Oracle Data Mining APIs.  Here are a few suggestions:

1.  Using the SQL Developer 3.0/Oracle Data Miner GUI:  First, any Oracle Data Miner workflow can be saved and shared with others for the simplest form of deployment.  Assuming that you have already built an ODM model, you can also right-mouse-click (rmc) on the Apply node and it offers options to deploy the code for that node as SQL scripts, etc.  See screen shot.

2.  Right mouse clicking on Data Transform nodes in a workflow will allow you to Create Lineage of all the upstream SQL steps, which is useful for deploying your analytical methodology.  I'm copying and pasting a bit from the ODM'r Help below.

Deploy: Save SQL or a SQL script for apply, create table, explore data, or model details. You can either copy to the Microsoft Windows Clipboard or save to a file. Deploy provides these choices: SQL to Clipboard, SQL to File, SQL script to Clipboard, SQL script to File. The SQL that is saved consists of SQL generated by the current node and all of its parents that are data providers. The lineage ends when it encounters nodes that represent persisted items such as tables or models. See Executing Scripts for information about how to run the generated SQL.

Executing Scripts: If you run generated scripts using either SQL*Plus or SQL Worksheet and you prompt users for input, add this command before the generated SQL: set define off. You can either execute this new line separately before you execute the generated SQL, or you can generate the new line along with the generated SQL.

3.  Using the Oracle Data Mining APIs:  As the ODM models are first-class objects of the Database, you don't need to install the ODMr repository on the production machines.  Instead, you can export the model(s) and associated transforms to another instance and run them there.

See the Oracle® Data Mining Application Developer's Guide 11g Release 2 (11.2), Part Number E12218-05, which further explains Scoring and Deployment.  See also the Oracle Data Mining Administrator's Guide for information about exporting and importing data mining models, which covers, for example: Using Oracle Data Pump; Using EXPORT_MODEL and IMPORT_MODEL; Database Privileges for Export/Import; Directory Object for EXPORT_MODEL and IMPORT_MODEL; Tables Created By EXPORT_MODEL and IMPORT_MODEL; Tablespace for IMPORT_MODEL; and Importing From PMML.  There is some discussion of deploying models in the Oracle Data Mining 11g Release 2 Mining Star Schemas: A Telco Churn Case Study white paper.

We are considering enhancing the Oracle Data Miner deployment capabilities in the next release and are considering a range of possibilities, starting from the simple idea of adding a Scheduler setting to the workflow object. If you have ideas/suggestions, please forward them along (charlie.berger@oracle.com).
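To make option 3 concrete, here is a sketch of moving a model between instances with the documented EXPORT_MODEL/IMPORT_MODEL procedures. The directory object DM_DUMP, the model name CHURN_SVM and the schema names are illustrative assumptions:

-- On the source instance: export one model to a dump file in a directory object.
BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL(
    filename     => 'churn_models.dmp',
    directory    => 'DM_DUMP',
    model_filter => 'name = ''CHURN_SVM''');
END;
/

-- On the target instance: import the model, remapping it to the production schema.
BEGIN
  DBMS_DATA_MINING.IMPORT_MODEL(
    filename     => 'churn_models.dmp',
    directory    => 'DM_DUMP',
    model_filter => 'name = ''CHURN_SVM''',
    operation    => 'IMPORT',
    schema_remap => 'DMUSER:PRODUSER');
END;
/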


OpenWorld 2011 Call for Papers: Deadline March 27

OpenWorld 2011 is now open for the public to submit session proposals, especially submissions that include Oracle Data Mining, the new SQL Developer 3.0/Oracle Data Miner GUI, Exadata, Oracle Business Intelligence EE (OBIEE) and novel and interesting use cases that highlight the Oracle "red stack" better together.

Oracle customers and partners are encouraged to submit proposals to present at this year's Oracle OpenWorld conference, which will be held October 2-6, 2011 at the Moscone Center in San Francisco. Details and submission guidelines are available on the Oracle OpenWorld Call for Papers web site. The deadline for submissions is Sunday, March 27, 2011 at 11:59 pm PDT. Please share the information provided below with your friends and contacts.

General Information
Conference Location: Moscone Convention Center, San Francisco, CA
Conference Dates: Sunday - Thursday, October 2 - 6, 2011
Conference Website: http://www.oracle.com/us/openworld
CFP Website: https://oracleus.wingateweb.com/portal/cfp/

Paper submission key dates:
Call for Papers Begins: Wednesday, March 9
Call for Papers Ends: Sunday, March 27 - 11:59 pm PDT
Notifications for Accepted and Declined Submissions Sent: End of May

Questions regarding the Call for Papers? Send an email to speaker-services_ww@oracle.com.


Oracle Data Mining a Star Schema: Telco Churn Case Study

There is a complete and detailed Telco Churn case study "How to" blog series just posted by Ari Mozes, ODM Dev. Manager.  In it, Ari provides detailed guidance on how to leverage various strengths of Oracle Data Mining, including the ability to:
mine star schemas and join tables and views together to obtain a complete 360 degree view of a customer
combine transactional data, e.g. call detail record (CDR) data, etc.
define complex data transformation, model build and model deploy analytical methodologies inside the Database

His blog is posted in a multi-part series.  Below are some opening excerpts from the first three blog entries.  This is an excellent resource for any novice to skilled data miner who wants to gain competitive advantage by mining their data inside the Oracle Database.  Many thanks Ari!

Mining a Star Schema: Telco Churn Case Study (1 of 3)

One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort.  Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns.  While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns.  Both of these aspects - demographics and usage patterns - can provide insight into behavior.

Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base.  One case study describes a telecommunications scenario involving understanding, and identification of, churn, where the underlying data is present in a star schema.  That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts.
...
Mining a Star Schema: Telco Churn Case Study (2 of 3)

This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data.  Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.

1) Handling missing values for call data records

The CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff).  For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer.  This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values.  Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem.  In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero.  In the other situations, it is not appropriate to assume zero, but rather to derive some representative value to replace the missing entries.  The referenced case study takes the latter approach.
Mining a Star Schema: Telco Churn Case Study (3 of 3)

Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.

The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups).  To reduce the data to a single segment, a filter can be applied:

create or replace view churn_data_high as
select * from churn_prep where value_band = 'HIGH';

It is simple to take a quick look at the predictive aspects of the data on a univariate basis.  While this does not capture the more complex multivariate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps.  Oracle Data Mining includes a predictive analytics package which enables quick analysis.

begin
  dbms_predictive_analytics.explain(
    'churn_data_high', 'churn_m6', 'expl_churn_tab');
end;
/
select * from expl_churn_tab where rank <= 5 order by rank;

ATTRIBUTE_NAME       ATTRIBUTE_SUBNAME  EXPLANATORY_VALUE  RANK
-------------------- -----------------  -----------------  ----
LOS_BAND                                .069167052         1
MINS_PER_TARIFF_MON  PEAK-5             .034881648         2
REV_PER_MON          REV-5              .034527798         3
DROPPED_CALLS                           .028110322         4
MINS_PER_TARIFF_MON  PEAK-4             .024698149         5

From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0).  The strongest univariate predictor of churn appears to be the customer's (binned) length of service.  The second strongest churn indicator appears to be the number of peak minutes used in the most recent month.  The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post.  By using the object relational approach, many related predictors are included within a single top-level column...

NOTE:  These are just EXCERPTS.  Click here to start reading the Oracle Data Mining a Star Schema: Telco Churn Case Study from the beginning.

Effective Hypertensive Treatment using Oracle Data Mining in Saudi Arabia

A new paper referencing the use of Oracle Data Mining to study the effectiveness of hypertensive treatment in Saudi Arabia has recently been published.  See link and abstract below.

Journal of Clinical Monitoring and Computing, DOI: 10.1007/s10877-010-9260-2, Springer 2010.

ABSTRACT. In the present investigation, the data sets of NCD (Non Communicable Diseases) risk factors, a standard report of Saudi Arabia 2005, in collaboration with WHO (World Health Organisation) were employed. The Oracle Data Miner (ODM) tool was used for the analysis and prediction of data. The data sets for different age groups in case of blood pressure treatment for hypertension for male using different modes had been studied. The age group was in between of 15 and 64 years. Data mining had been an appropriate and sufficiently sensitive method to analyze the outcomes of which mode of treatment is more effective to which age group. The five age group of NCD data had been put into two age groups of young and old denoted as 'Y' and 'O' respectively. Data mining showed that all the five modes of treatments were effective for older people due to hypertension at older ages.

Effective hypertensive treatment using data mining in Saudi Arabia .pdf

The Meaning of Probability

This entry is based on material from the book "How the Mind Works" by Steven Pinker.

Yogi Berra famously said that "it is hard to make predictions, especially about the future". However, as Pinker says, "in a world with any regularities at all, decisions informed by the past are better than decisions made at random".  If the above statement seems like a truism, then why do people who follow this advice so often appear to flout the elementary canons of probability theory?

Why do basketball fans believe in players having a "hot hand"? Their strings of hits and misses are indistinguishable from coin flips. Why do people feel that if a roulette wheel has stopped at black six times in a row, it's due to stop at red? The wheel has no memory and every spin is independent.

On the other hand, let's imagine you are on vacation. It has been raining all week; your vacation is fast approaching its end; everyone is unhappy; everyone blames you for the weather; everyone wants to pack and go home. This is when you decide to boldly predict sunny weather for the remaining two days of your stay.  Are you nuts? As far as basic probability theory is concerned, you are. This situation is no different from the "hot hand" or roulette examples given above. They all represent the same notorious class of problems called the "gambler's fallacy": expecting that a run of heads increases the chance of a tail. Basic probability theory tells you that the coin, basketball hit, roulette wheel, as well as the weather (at least as far as your knowledge of weather prediction is concerned), has no memory and no desire to be fair. Hence, if you bet money on a "hot player" or on red at the next spin of the roulette wheel, you are very likely to lose.

How about your very bold and seemingly mindless weather prediction? Well, the next day the sun appears.  All is forgotten; your kids are happy; your wife spends more time with you than with her book. Life is good. So, were you merely lucky, despite ignoring the basic tenets of probability theory, or did you actually make a decision informed by the past? How about the latter? As Pinker remarks, "rain clouds aren't removed from the sky at day's end and replaced with new ones the next morning.  A cloud cover has some average size, speed, and direction, and it should not be a surprise that a week of clouds should predict that the trailing edge was near and the sun was about to show up, just as the hundredth railroad car on a passing train portends the caboose with greater likelihood than the third car. Many events work like that. They have a characteristic life history and a changing probability of occurring over time (which statisticians call a hazard function)."

The above examples also illustrate the different meanings of probability. One is single-event probability. Pinker observes that "the probability that a penny will land heads is 0.5" would then mean that, on a scale of 0 to 1, your confidence that the next flip will be heads is halfway between certainty that it will happen and certainty that it won't. Another is relative frequency in the long run: "the probability that a penny will land heads is 0.5" would mean that in a hundred coin flips, fifty will be heads.

As Pinker notices, "numbers referring to the probability of a single event, which only makes sense as estimates of subjective confidence", are common today. For example, there is a 70% chance my flight will be leaving on time tomorrow, or the odds of the Red Sox winning tonight are three to two.

Says Pinker:

The interesting question is: what does the probability of a single event even mean? A colleague tells me that there is a ninety-five percent chance he will show up at a meeting tomorrow. He doesn't come. Was he lying?

You may be thinking: granted, a single-event probability is just subjective confidence, but isn't it rational to calibrate confidence by relative frequency? Ah, but the relative frequency of what? To count frequencies you have to decide on a class of events to count up, and a single event belongs to an infinite number of classes.

Richard von Mises, a probability theorist, gives an example.

In a sample of American women between the ages of 35 and 50, 4 out of 100 develop breast cancer within a year. Does Mrs. Smith, a 49-year-old American woman, therefore have a 4% chance of getting cancer in the next year? There is no answer. Suppose that in a sample of women between the ages of 45 and 90 - a class to which Mrs. Smith also belongs - 11 out of 100 develop breast cancer in a year. Are Mrs. Smith's chances 4%, or are they 11%? Suppose that her mother had breast cancer, and 22 out of 100 women between 45 and 90 whose mother had the disease will develop it. Are her chances 4%, 11%, or 22%? She also smokes, lives in California, had two children before the age of 25 and one after 40, is of Greek descent ... What group should we compare her with to figure out the "true" odds? You might think, the more specific the class, the better - but the more specific the class, the smaller its size and the less reliable the frequency. If there were only two people in the world very much like Mrs. Smith, and one developed breast cancer, would anyone say that Mrs. Smith's chances are 50%? In the limit, the only class that is truly comparable with Mrs. Smith in all her details is the class containing Mrs. Smith herself. But in a class of one, "relative frequency" makes no sense.

As Pinker observes:

These philosophical questions about the meaning of probability are not purely academic; they affect every decision we make. During the murder trial of O.J. Simpson in 1995, the lawyer Alan Dershowitz said on television that among men who batter their wives, only one-tenth of one percent go on to murder them. A statistician then pointed out that among men who batter their wives and whose wives are then murdered by someone, more than half are the murderers.

Another interesting element of the concept of probability is the belief in a stable world. As Pinker notices, "a probabilistic inference is a prediction today based on frequencies gathered yesterday. But that was then, this is now. How do you know that the world hasn't changed in the interim?" Again, the question is not of a purely academic or philosophical character. Think of the following situations, as Pinker describes them:

A person avoids buying a car after hearing that a neighbor's model broke down yesterday. Another person avoids letting his child play in the river with no previous fatalities after hearing that a neighbor's child was attacked there by a crocodile that morning.  The difference between the scenarios (aside from the drastic consequences) is that we judge that the car world is stable (in the US this was true, at least until last year), so the old statistics apply, but the river world has changed, so the old statistics are moot.

This just goes to prove that probability is a very tricky and complex concept. Every time one thinks one understands it, a new wrinkle appears. It is like the old joke about intelligence - as Pinker quotes it, "the average man's IQ is 107, the average trout's IQ is 4. So why can't a man catch a trout?"

Oracle Data Miner 11g Release 2 Update: Now Extension to SQL Developer

News:  The Oracle Data Miner 11g Release 2 new "work flow" GUI is now being packaged as an extension to SQL Developer and will be available to external customers as part of the upcoming SQL Developer 3.0 release's Early Adopter program.  SQL Developer users will be able to access Oracle Data Miner's data mining GUI from within the familiar SQL Developer environment.  This tight integration will provide a number of significant advantages for data analysts, developers and DBAs, including:

Everything - data access, SQL querying, data transformations and data mining functionality - in a complete, unified environment, inside the Oracle Database
Elimination of the data movement, loss of security, and information latency involved in extracting data to traditional external data analysis servers, e.g. SAS, SPSS
Ability to create and deploy complex predictive analytics methodologies within the Oracle SQL Developer environment
Ability to Check for Updates and get the latest version of the Oracle Data Miner 11g Release 2 GUI
Access to Oracle By Examples (OBE) posted on OTN

Stay tuned to Oracle SQL Developer on the Oracle Technology Network (OTN), the Oracle Data Mining on OTN web site, and this blog for updates and more information.

Sample Oracle Data Miner 11g Release 2 "work flow" GUI screen shots: Explore Relationships node

To sample or not to sample... Part 4

This post continues "To sample or not to sample..." Part 1, Part 2, and Part 3.  In Part 1, we looked at the general motivation for sampling. In Part 2, we looked at simple random sampling without replacement, the most common sampling technique. In Part 3, we looked at stratified sampling, which helps to ensure a representative sample. In this post, we focus on simple random sampling with replacement, which has the following characteristics:

records are selected at random, each with equal probability
each selected record is immediately available to be selected again in the same sample
a record can be selected multiple times
the population of records remains constant, i.e., none are excluded

One data mining technique that uses simple random sampling with replacement is bootstrap aggregating, also known as bagging. The technique involves combining the predictions of multiple models, usually from the same algorithm within either classification or regression, e.g., classification decision trees. Each model is built from a different sample of the original dataset using simple random sampling with replacement. These samples are called bootstrap samples. A model composed of multiple other models is often referred to as an ensemble model. For regression models, the predictions of each component model are averaged. For classification models, voting is used to select the prediction agreed upon by the majority of models. Bagging often increases model accuracy and stability by reducing variance and avoiding overfitting.  This, however, is a topic for another post; a small scoring sketch appears at the end of this one.

There are many possible ways to produce a random sample with replacement. One approach is to identify each row by its row number and then to use a hash function to select the records for our sample.  The following script samples 1200 records from the CUSTOMERS table. We first create a view appending the row_number column using the rownum pseudocolumn. Iterating once per record in our sample, we populate the table CUSTOMERS_MAP with a hash value, which is based on the iterator i and ranges from 0 to the number of records in CUSTOMERS. The final step is to create the sample view by joining the CUSTOMERS_MAP table to the CUSTOMERS_V view using the sample_number and row_number in the join condition.

CREATE VIEW CUSTOMERS_V AS
  SELECT rownum row_number, o.* FROM CUSTOMERS o;

CREATE TABLE CUSTOMERS_MAP (sample_number NUMBER);

DECLARE
  v_row_count   NUMBER;
  c_sample_size NUMBER := 1200;
BEGIN
  SELECT count(*) INTO v_row_count FROM CUSTOMERS;
  FOR i IN 1..c_sample_size LOOP
    INSERT INTO CUSTOMERS_MAP (sample_number)
    VALUES (ORA_HASH(i, v_row_count, 12345));
  END LOOP;
  COMMIT;
END;

CREATE VIEW CUSTOMERS_SAMPLE_V AS
SELECT v.* FROM CUSTOMERS_V v, CUSTOMERS_MAP m
WHERE v.row_number = m.sample_number;

To obtain a different sample, change the rows returned by modifying the seed value provided to ORA_HASH. Note that the sample view is dependent on the contents of CUSTOMERS_MAP.
Materializing the sample view ensures it will not change if CUSTOMERS_MAP is reused or if the order of rows in the original table is not maintained.

To conveniently produce multiple samples, we can extend the CUSTOMERS_MAP table as follows:

CREATE TABLE CUSTOMERS_MAP (sample_number_1 NUMBER, sample_number_2 NUMBER, sample_number_3 NUMBER);

The INSERT statement populates each of these sample numbers using a different seed value for each ORA_HASH invocation:

INSERT INTO CUSTOMERS_MAP (sample_number_1, sample_number_2, sample_number_3)
VALUES (ORA_HASH(i, v_row_count, 12345),
        ORA_HASH(i, v_row_count, 23456),
        ORA_HASH(i, v_row_count, 34567));

Lastly, the sample views can be created, each using a different column from the CUSTOMERS_MAP table. Here is an example for sample 2:

CREATE VIEW CUSTOMERS_SAMPLE_2_V AS
SELECT v.* FROM CUSTOMERS_V v, CUSTOMERS_MAP m
WHERE v.row_number = m.sample_number_2;

This extension avoids having to maintain multiple map tables or materialize each of the sample tables. Of course, as stated above, since the rows in the initial CUSTOMERS table are not guaranteed to maintain the same order, either the CUSTOMERS_V view or each of the sample views must be materialized to guarantee reproducible results.
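Materializing a sample view is a simple CREATE TABLE ... AS SELECT, for example:

CREATE TABLE CUSTOMERS_SAMPLE_2 AS
  SELECT * FROM CUSTOMERS_SAMPLE_2_V;

And to close the loop on bagging: once a model has been built on each bootstrap sample, the component predictions can be combined directly in SQL. The following is a minimal sketch, not part of the script above; the model names CHURN_DT_1 through CHURN_DT_3 and the scoring table NEW_CUSTOMERS are hypothetical, and it assumes binary classification models built with the same target.

-- ensemble score: average the probability of the positive class
-- across three models built on different bootstrap samples
SELECT cust_id,
       (PREDICTION_PROBABILITY(CHURN_DT_1, 1 USING *) +
        PREDICTION_PROBABILITY(CHURN_DT_2, 1 USING *) +
        PREDICTION_PROBABILITY(CHURN_DT_3, 1 USING *)) / 3 ensemble_prob
FROM new_customers;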

To sample or not to sample... Part 3

This post continues "To sample or not to sample..." Part 1 and Part 2. In Part 1, we looked at the general motivation for sampling. In Part 2, we looked at simple random sampling. In this post, we focus on stratified sampling.

Recall that one of the pitfalls of simple random sampling is that not all values for a given column may be represented. Stratified sampling overcomes this by ensuring each value for a specified column (usually the target) is represented.

The query below illustrates how to get a 60% stratified sample of the CUSTOMERS table with target column BUYER, using the original distribution of BUYER column values 0 and 1. This means that we are interested in roughly 60% of the 1s and 60% of the 0s. Note that we could have tried a simple random sample, and chances are we would achieve a similar distribution, but it is possible to obtain a sample that contains no 1s. Stratified sampling, however, ensures that we obtain a representative set of 1s and 0s.

WITH TARGET_COUNT AS
 (SELECT BUYER, count(*) CNT
  FROM CUSTOMERS
  WHERE BUYER IS NOT NULL
  GROUP BY BUYER)
SELECT *    -- or filter out partition_row_num
FROM (SELECT row_number() over (partition by BUYER order by ORA_HASH(CUST_ID)) partition_row_num, t.*
      FROM CUSTOMERS t
      WHERE BUYER IS NOT NULL)
WHERE partition_row_num = 1
OR (BUYER = 1
    AND ORA_HASH(partition_row_num, (SELECT CNT FROM TARGET_COUNT WHERE BUYER=1) - 1, 12345) <
        (SELECT CNT FROM TARGET_COUNT WHERE BUYER=1) * 60 / 100)
OR (BUYER = 0
    AND ORA_HASH(partition_row_num, (SELECT CNT FROM TARGET_COUNT WHERE BUYER=0) - 1, 12345) <
        (SELECT CNT FROM TARGET_COUNT WHERE BUYER=0) * 60 / 100);

In the WITH clause, we first count the number of each target value, excluding nulls. The next key step is to assign a row number to each record, partitioned by the target column. In the subquery, all records with target value 1 have row numbers from 1..N and those with 0 have row numbers 1..M, where N and M are the counts of records with 1 and 0 values, respectively. Finally, we include the first row to ensure at least one record is returned, and allow records where the partition_row_num hashes to a value less than 60% of N for 1s or M for 0s. This can be extended to support multi-class columns simply by adding additional "OR" clauses.

In some cases, we may need to alter the distribution of class values, say from 90% 0s and 10% 1s, to 50% of each, producing a balanced stratified sample. This is desirable in, for example, scenarios involving fraud detection, where incidences of fraud can be rare, and we need to ensure there are sufficient representative cases of each class. Since there are only 10% 1s, the largest sample we can have for balanced target classes is 20% (all of the 1s, and ~11% of the 0s). The following query produces a balanced sample.
WITH TARGET_COUNT AS
 (SELECT BUYER, count(*) CNT
  FROM CUSTOMERS
  WHERE BUYER IS NOT NULL
  GROUP BY BUYER)
SELECT *    -- or filter out partition_row_num
FROM (SELECT row_number() over (partition by BUYER order by ORA_HASH(CUST_ID)) partition_row_num, t.*
      FROM CUSTOMERS t
      WHERE BUYER IS NOT NULL)
WHERE partition_row_num = 1
OR (BUYER = 1
    AND ORA_HASH(partition_row_num, (SELECT CNT FROM TARGET_COUNT WHERE BUYER=1) - 1, 12345) <
        .20 * (SELECT SUM(CNT) FROM TARGET_COUNT) / (SELECT COUNT(*) FROM TARGET_COUNT))
OR (BUYER = 0
    AND ORA_HASH(partition_row_num, (SELECT CNT FROM TARGET_COUNT WHERE BUYER=0) - 1, 12345) <
        .20 * (SELECT SUM(CNT) FROM TARGET_COUNT) / (SELECT COUNT(*) FROM TARGET_COUNT));

The key difference occurs in the top-level WHERE clause. The selection criteria differ after the '<' sign, where we take the sample percentage times the number of records divided by the number of target values. As before, this can be extended to support multi-class columns simply by adding additional "OR" clauses.
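To sanity-check either query, it helps to compare the class distribution of the sample with that of the original table. A minimal sketch, assuming the balanced query above has been saved as a hypothetical view named CUSTOMERS_BALANCED_V:

-- class counts and proportions in the balanced sample;
-- the two classes should come out roughly 50/50
SELECT BUYER,
       COUNT(*) cnt,
       RATIO_TO_REPORT(COUNT(*)) OVER () pct
FROM customers_balanced_v
GROUP BY BUYER;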

To sample or not to sample... Part 2

In my previous post, To sample or not to sample..., we discussed how sampling can be used to scale down a large dataset for exploratory mining, as well as to produce train and test datasets. One type of sampling is called simple random sampling, which means that each record in the table has an equal probability of being selected. Simple random sampling can be with replacement or without replacement. Without replacement means that each record will be selected at most once.  With replacement means that selected records are returned to the population so they can be reselected. Simple random sampling without replacement is applicable to most situations where the data is reasonably distributed, e.g., a binary target containing a reasonable split between the target values such as 50-50 or 30-70. We consider two methods for simple random sampling without replacement: the SAMPLE SQL clause, and ORA_HASH.

SAMPLE SQL Clause

Oracle Database provides the SAMPLE clause that can be issued with a SELECT statement over a table. In the following query, we're randomly selecting records from the CUSTOMERS table with a 20% probability:

SELECT * FROM customers SAMPLE (20);

This means that each record has a 20% chance of being selected. Perhaps contrary to expectations, this will normally not produce a result with exactly 20% of the records from the CUSTOMERS table. This approximate sample size is quite adequate for most situations.

A variant of the SAMPLE clause is SAMPLE BLOCK, where each block of records has the same chance of being selected, 20% in our example. Since records are selected at the block level, this offers a performance improvement for large tables and should not adversely impact the randomness of the sample. SAMPLE and SAMPLE BLOCK allow the sample percent to range from .000001 to, but not including, 100. An optional second parameter is the seed value, used to help ensure repeatability between executions. The seed value can range from 0 to 4294967295.

SELECT * FROM customers SAMPLE BLOCK (20, 8621);

Repeatability is important if, e.g., you want to compare models or test results across model builds, where the only variation is model settings. While SAMPLE is built into the SQL syntax, it has limitations when applied to views or complex joins. SAMPLE relies on the existence of primary keys. If the underlying tables have the necessary primary keys, you may be able to use a view, but if the primary keys are absent, the SAMPLE clause may not work. To address this limitation, we can use a technique based on ORA_HASH.

ORA_HASH

The ORA_HASH technique can be used with both tables and views, whether or not primary keys are specified. The Oracle Data Miner user interface uses this technique to support sampling of both tables and views. The ORA_HASH function computes a hash value for a given expression. The following example produces a sample of 60% of the CUSTOMERS table:

WITH row_count AS (SELECT count(*) count FROM customers)
SELECT c.*
FROM customers c
WHERE ORA_HASH(CUST_ID, (SELECT count FROM row_count) - 1, 12345) <
      (SELECT count FROM row_count) * 60 / 100;

Note that the first argument to ORA_HASH is the expression on which to compute a hash value, in this case the unique identifier CUST_ID. The second argument determines the maximum value returned by the hash function. The third argument is an optional seed value, similar to that used in the SAMPLE clause. The WHERE clause specifies to select records where the hash value is less than 60% of the number of records.
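Because the hash is deterministic for a given seed, the same predicate with the comparison reversed yields the complementary set, which is a convenient way to produce disjoint train and test sets. A minimal sketch, not from the original post:

-- complementary ~40% (e.g., a test set): same hash and seed,
-- but keep the rows the 60% query excludes
WITH row_count AS (SELECT count(*) count FROM customers)
SELECT c.*
FROM customers c
WHERE ORA_HASH(CUST_ID, (SELECT count FROM row_count) - 1, 12345) >=
      (SELECT count FROM row_count) * 60 / 100;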
As with the SAMPLE clause, the number of records returned is likely not to be exactly 60% of the total records, since each record has a 60% probability of being selected.

In my next post, we'll look at stratified sampling.

To sample or not to sample...

Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence; that discipline focuses on how we collect data. In data mining, we focus on the use of data that has already been collected. In some cases, we may have all the data (all purchases made by all customers); in others, the data may have been collected using sampling (voters, their demographics and candidate choice).

Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1,000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size.

Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing.

Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given categorical attribute. This is particularly troublesome when the attribute with omitted values is the classification target attribute. A classification model that has not seen any examples for a particular target value can never predict that target value!
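A simple guard against that last pitfall is to check, before building, that every target value in the full dataset also appears in the sample. A minimal sketch, assuming a hypothetical sample table CUSTOMERS_SAMPLE and target column BUYER:

-- any rows returned are target values missing from the sample
SELECT BUYER FROM customers
MINUS
SELECT BUYER FROM customers_sample;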
For predictor attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets.

In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.

Deploying Data Mining Models using Model Export and Import, Part 2

In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability, and a tip for easily moving models from one machine to another using only Oracle Database, rather than an external file transport mechanism such as FTP.

In the first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.

Figure 1: Target instance importing multiple remote models for local scoring

In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building its own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctors' offices, which can score patient data locally.

Figure 2: Multiple target instances importing the same model from a source instance for local scoring

Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example:

create database link SOURCE1_LINK
  connect to <schema> identified by <password> using 'SOURCE1';

Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension, because export_model appends "01" to this name. Next, connect to the local database, invoke DBMS_FILE_TRANSFER.GET_FILE, and import the model. Note that "01" is eliminated in the target system file name.

connect <source_schema>/<password>@SOURCE1;
BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',
                                 'MY_SOURCE_DIR_OBJECT',
                                 'name =''MY_MINING_MODEL''');
END;

connect <target_schema>/<password>;
BEGIN
  DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT',
                               'EXPORT_FILE_NAME' || '01.dmp',
                               'SOURCE1_LINK',
                               'MY_TARGET_DIR_OBJECT',
                               'EXPORT_FILE_NAME' || '.dmp');
  DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',
                                 'MY_TARGET_DIR_OBJECT');
END;

To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target.
For example:

utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');
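Since we already have a database link, the source-side file can be removed from the target instance as well; PL/SQL procedure calls work over database links. A minimal sketch, assuming the SOURCE1_LINK and directory objects from this post:

BEGIN
  -- remove the transferred copy at the target
  utl_file.fremove('MY_TARGET_DIR_OBJECT', 'EXPORT_FILE_NAME' || '.dmp');
  -- remove the exported file at the source, via the database link
  utl_file.fremove@SOURCE1_LINK('MY_SOURCE_DIR_OBJECT', 'EXPORT_FILE_NAME' || '01.dmp');
END;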

New R Interface to Oracle Data Mining Available for Download

The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
Scripting of "production" data mining methodologies
Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code, as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrate the use of Oracle Data Mining, for example, how to create data mining models, pass arguments, retrieve results, etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

R-ODM is particularly intended for data analysts and statisticians familiar with R but not necessarily familiar with the Oracle database environment or PL/SQL. It is a convenient environment in which to rapidly experiment with and prototype data mining models and applications. Data mining models prototyped in the R environment can easily be deployed in their final form in the database environment, just like any other standard Oracle Data Mining model.

What is R?

R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme.

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive. Besides this core group, many R users have contributed application code, as represented in the nearly 1,500 publicly available packages in the CRAN archive (which has shown exponential growth since 2001; R News Volume 8/2, October 2008). Today the R community is a vibrant and growing group of tens of thousands of users worldwide. R is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S").

Resources:
R website / CRAN
R-ODM

Deploying Data Mining Models using Model Export and Import

In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining models easily into a production system can greatly speed model deployment and reduce the overall cost. Since Oracle Data Mining provides models as first-class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package.

Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database. For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account.

grant create any directory to dmuser;

Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a Linux machine, we have the directory /scratch/oracle.

CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle';

If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query:

select model_name from user_mining_models;

There are several options when exporting models. We can export a single model, multiple models, or all models in a schema using the following procedure calls:

BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp', 'dmdir', 'name =''MY_DT_MODEL''');
END;

BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp', 'dmdir',
                   'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')');
END;

BEGIN
  DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp', 'dmdir');
END;

A .dmp file can be imported into another schema or database using the following procedure call, for example:

BEGIN
  DBMS_DATA_MINING.IMPORT_MODEL ('MY_MODELS.dmp', 'dmdir');
END;

As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.
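Once imported, the model is ready to score data directly in SQL. A minimal sketch, assuming the imported classification model MY_DT_MODEL and a hypothetical table NEW_CUSTOMERS whose columns match the model's attributes:

-- score new records with the imported model;
-- ADP/EDP transformations are applied automatically
SELECT cust_id,
       PREDICTION(MY_DT_MODEL USING *) predicted_class,
       PREDICTION_PROBABILITY(MY_DT_MODEL USING *) probability
FROM new_customers;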

More details on America's Cup use of Oracle Data Mining

Updated (Sept. 10, 2010). Full article: http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834

Related articles/blogs: Oracle Data Mining Races with America's Cup; SAIL-WORLD article - America's Cup: Oracle Data Mining supports crew and BMW ORACLE Racing

BMW Oracle Racing's America's Cup: A Victory for Database Technology

BMW Oracle Racing's victory in the 33rd America's Cup yacht race in February showcased the crew's extraordinary sailing expertise. But to hear them talk, the real stars weren't actually human. "The story of this race is in the technology," says Ian Burns, design coordinator for BMW Oracle Racing.

Gathering and Mining Sailing Data

From the drag-resistant hull to its 23-story wing sail, the BMW Oracle USA trimaran is a technological marvel. But to learn to sail it well, the crew needed to review enormous amounts of reliable data every time they took the boat for a test run. Burns and his team collected performance data from 250 sensors throughout the trimaran at the rate of 10 times per second. An hour of sailing alone generates 90 million data points.

BMW Oracle Racing turned to Oracle Data Mining in Oracle Database 11g to extract maximum value from the data. Burns and his team reviewed and shared raw data with crew members daily using a Web application built in Oracle Application Express (Oracle APEX). "Someone would say, 'Wouldn't it be great if we could look at some new combination of numbers?' We could quickly build an Oracle Application Express application and share the information during the same meeting," says Burns.

Analyzing Wind and Other Environmental Conditions

Burns then streamed the data to the Oracle Austin Data Center, where a dedicated team tackled deeper analysis. Because the data was collected in an Oracle Database, the Data Center team could dive straight into the analytics problems without having to do any extract, transform, and load processes or data conversion. And the many advanced data mining algorithms in Oracle Data Mining allowed the analytics team to build vital performance analytics. For example, the technology team could remove masking elements such as environmental conditions to give accurate data on the best mast rotation for certain wind conditions.

Without the data mining, Burns says, the boat wouldn't have run as fast. "The design of the boat was important, but once you've got it designed, the whole race is down to how the guys can use it," he says. "With Oracle database technology we could compare the incremental improvements in our performance from the first day of sailing to the very last day. With data mining we could check data against the things we saw, and we could find things that weren't otherwise easily observable and findable."

OpenWorld 2010 Call for Presentations is Now Open

The OpenWorld team has officially opened the "Create or View your Submissions" tab on the OpenWorld site, so now is the time to start submitting your ideas for presentations for this year's OpenWorld conference. Ideally, we are looking for papers from you that cover the following areas/topics: data warehousing, data services, real-time analytics, predictive analytics/data mining, Exadata/database machine, OLAP, data quality and data integration.

If you are stuck for ideas and need some inspiration (for product areas, abstract titles, etc.), take a look at the list of presentations from last year's conference: http://www.oracle.com/ocom/groups/public/documents/webcontent/034315.pdf

Remember, there is no better place than Oracle OpenWorld for you to present your data warehouse ideas, experiences, and accomplishments to the world's biggest gathering of Oracle customers, developers, partners, peers from around the world, and the most influential members of the media. Tens of thousands of the world's most active developers and most demanding users will be on hand to hear what you have to say.

To start submitting your ideas, follow this link (OpenWorld 2010 Call for Papers) and click on the tab "Create or View your Submissions". If this is the first time you have submitted a paper for OpenWorld, you will need to complete a short registration process. If you have any questions, drop me an email at charlie.berger@oracle.com.  Good luck!

Generating cluster names from a document clustering model, Part 3

My previous post, Generating cluster names from a document clustering model, Part 2, provided a deeper look at the SQL constructs used to retrieve model details from a k-Means model. In this post, we'll look at using the LEAD function to construct the cluster names, and at the cursor ClusterLeafIds definition for selecting the leaf clusters.

The following SQL builds upon the id, term, and centroid_mean subquery to concatenate the top five terms for each cluster. In this example, we've replaced the c.cluster_id variable (from the original script) with the specific cluster id 4.

SELECT 'SESSION09_PRE92765_CL' model_name, cluster_name, 4 cluster_id
FROM (
  SELECT id, term || '-' ||
         LEAD(term, 1) OVER (ORDER BY id) || '-' ||
         LEAD(term, 2) OVER (ORDER BY id) || '-' ||
         LEAD(term, 3) OVER (ORDER BY id) || '-' ||
         LEAD(term, 4) OVER (ORDER BY id) cluster_name
  FROM (
    SELECT id, text term, centroid_mean
    FROM (SELECT rownum id, a.*
          FROM (SELECT cd.attribute_subname term,
                       cd.mean              centroid_mean
                FROM (SELECT *
                      FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL', 4, null, 1, 0, 0))) a,
                     TABLE(a.centroid) cd
                ORDER BY cd.mean DESC) a
          WHERE rownum < 6) x,
         DM4J$VSESSION09_710479489 y
    WHERE x.term = y.attribute_id
    ORDER BY centroid_mean
  )
)
WHERE id = 1;

which produces the result:

SESSION09_PRE92765_CL  FINANCIALMANAGEMENT-PEOPLESOFT-PROJECT-FINANCIAL-SUPPLY  4

from the subquery result:

ID  TERM                 CENTROID_MEAN
5   SUPPLY               0.14341
4   FINANCIAL            0.16519
3   PROJECT              0.16965
2   PEOPLESOFT           0.18200
1   FINANCIALMANAGEMENT  0.18574

How does the LEAD function work? LEAD is one of the analytic functions, along with its companion LAG. As noted in the documentation, these are used when the relative position of rows can be known reliably. The LEAD function provides access to a row at a given offset after the current position, whereas LAG provides access to a row at a given offset prior to the current position. In the example above, we specified LEAD(term, 1) OVER (ORDER BY id), where term refers to the value expression and 1 refers to the offset from the current row. The OVER clause allows us to specify that we are ordering the rows by the id column. We concatenate the first term and the 4 leading terms. Note, we could have just as easily used the LAG function by ordering on CENTROID_MEAN:

SELECT id, term || '-' ||
       LAG(term, 1) OVER (ORDER BY centroid_mean) || '-' ||
       LAG(term, 2) OVER (ORDER BY centroid_mean) || '-' ||
       LAG(term, 3) OVER (ORDER BY centroid_mean) || '-' ||
       LAG(term, 4) OVER (ORDER BY centroid_mean) cluster_name

The last part of the original script we'll discuss is the ClusterLeafIds cursor, which is used to select only the leaf clusters for assigning cluster names. Since Oracle Data Mining clustering algorithms are hierarchical, dbms_data_mining.get_model_details_km provides details for all clusters in the model.
CURSOR ClusterLeafIds IS
  -- Obtain leaf clusters
  SELECT CLUSTER_ID, RECORD_COUNT
  FROM (
    SELECT DISTINCT clus.ID AS CLUSTER_ID,
           clus.RECORD_COUNT RECORD_COUNT,
           clus.DISPERSION DISPERSION,
           clus.PARENT PARENT_CLUSTER_ID,
           clus.TREE_LEVEL TREE_LEVEL,
           CASE WHEN chl.id IS NULL THEN 'YES' ELSE 'NO' END IS_LEAF
    FROM (SELECT *
          FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL'))) clus,
         TABLE(clus.child) chl
  )
  WHERE is_leaf = 'YES'
  ORDER BY cluster_id;

The inner subquery produces the following results. Of course, record_count, dispersion, parent_cluster_id, and tree_level are not essential to the problem at hand, but are included as an example of the content available from get_model_details_km.

CLUSTER_ID  RECORD_COUNT  DISPERSION  PARENT_CLUSTER_ID  TREE_LEVEL  IS_LEAF
28          70            0.717292    18                 6           YES
32          62            0.650029    17                 6           YES
31          33            0.598149    16                 6           YES
6           231           0.712922    3                  3           NO
12          76            0.668168    6                  4           NO
1           1115          0.815162    1                  null        NO

The distinct clause removes duplicates caused by non-leaf nodes, which will have two children. Note the result without distinct:

SELECT clus.ID AS CLUSTER_ID,
       clus.RECORD_COUNT RECORD_COUNT,
       chl.id
FROM (SELECT *
      FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL'))) clus,
     TABLE(clus.child) chl;

CLUSTER_ID  RECORD_COUNT  /*Child*/ ID
1           1115          2
1           1115          3
2           699           4
2           699           5
3           416           6
...
24          75            43
25          80            null
26          43            null
27          49            null

Finally, we select those cluster ids where is_leaf is 'YES' to produce the list of clusters that we ultimately want to generate cluster names for.
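To see the cursor in use, here is a minimal sketch of the surrounding PL/SQL; the loop body simply prints each leaf, whereas in the original script each c.cluster_id would drive the cluster-naming query shown earlier:

DECLARE
  CURSOR ClusterLeafIds IS
    -- condensed version of the cursor defined above
    SELECT CLUSTER_ID, RECORD_COUNT
    FROM (SELECT DISTINCT clus.ID AS CLUSTER_ID,
                 clus.RECORD_COUNT RECORD_COUNT,
                 CASE WHEN chl.id IS NULL THEN 'YES' ELSE 'NO' END IS_LEAF
          FROM (SELECT *
                FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL'))) clus,
               TABLE(clus.child) chl)
    WHERE is_leaf = 'YES'
    ORDER BY cluster_id;
BEGIN
  FOR c IN ClusterLeafIds LOOP
    -- c.cluster_id is the value substituted into the naming query
    dbms_output.put_line('Leaf cluster ' || c.cluster_id ||
                         ' (' || c.record_count || ' records)');
  END LOOP;
END;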

Oracle Recognized as a Leader in Data Mining

Redwood Shores, CA - February 24, 2010

News Facts

According to a February 2010 report from independent analyst firm Forrester Research, Oracle is a leader in predictive analytics and data mining (PA/DM). "The Forrester Wave™: Predictive Analytics And Data Mining Solutions, Q1 2010," written by Senior Analyst James G. Kobielus, states that "Oracle provides a PA/DM solution portfolio that is built into its own widely adopted DBMS, DW, data integration, and BI platforms, with a wide range of prepackaged predictive applications, and it provides a powerful assortment of algorithms for mining complex structured and unstructured information types."

An option to Oracle Database 11g Enterprise Edition, Oracle Data Mining enables customers to integrate actionable predictive information into business intelligence and other applications. Using the data mining functionality in Oracle Database 11g, customers can easily find patterns and insights otherwise hidden in their data warehouses. The Sun Oracle Database Machine delivers increased Oracle Data Mining performance by performing scoring of data mining models in Oracle Exadata Storage Servers.

Supporting Quote

"We're pleased that Forrester's Wave recognizes Oracle's position as a leader in predictive analytics and data mining, and in particular, the strength of Oracle Database 11g," said Ray Roccaforte, vice president of Data Warehousing and Analytics, Oracle. "By moving predictive analytics to their Oracle Database, customers save time and money by eliminating data movement and duplication into multiple repositories and servers. And now with the Sun Oracle Database Machine, customers can offload data mining scoring to intelligent Oracle Exadata Storage Servers for even faster predictive analysis."

Supporting Resources

Read The Forrester Wave: Predictive Analytics And Data Mining Solutions, Q1 2010
About Oracle Database 11g
About Oracle Data Mining
Oracle Data Mining Data Sheet
About Sun Oracle Database Machine
Follow Oracle Database on Twitter
Blog: Oracle Database Insider

About Oracle

Oracle (NASDAQ: ORCL) is the world's most complete, open, and integrated business software and hardware systems company. For more information about Oracle, visit oracle.com.

Trademarks

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Contact Info

Greg Lunsford
Oracle
+1.650.506.6523
greg.lunsford@oracle.com

Kristin Reeves
Blanc & Otus for Oracle
+1.415.856.5145
kreeves@bando.com

Oracle Data Mining Races with America's Cup

Oracle Data Mining was used by the performance analysis team of the BMW ORACLE Racing team in their preparation to win the America's Cup race off the coast of Spain. The America's Cup had been away from U.S. shores for 15 years, the longest drought since 1851.  With the challenge of squeezing out every micro-joule of energy from the wind, and with the goal of maximizing "velocity made good", the BMW Oracle Racing team turned to Oracle Data Mining.

"Imagine standing under an avalanche of data - 2500 variables, 10 times per second - and a sailing team demanding answers to design and sailing variations immediately. This was the challenge facing the BMW ORACLE Racing Performance Analysis Team every sailing day as they refined and improved their giant 90 foot wide, 115 foot long trimaran sporting the largest hard-sail wing ever made. Using ORACLE DATA MINING accessing an ORACLE DATABASE and presenting results real time using ORACLE APPLICATION EXPRESS, the performance team managed to provide the information required to optimise the giant multihull to the point that it not only beat the reigning America's Cup champions Alinghi in their giant catamaran but resoundingly crushed them in a power display of high speed sailing. After two races - and two massive winning margins - the America's Cup was heading back to America - a triumph for the team, ORACLE and American technology."
--Ian Burns, Performance Director, BMW ORACLE Racing Team

Visit http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834 for pictures, videos and full information.

Generating cluster names from a document clustering model, Part 2

My previous post, Generating cluster names from a document clustering model, included a SQL script that involved advanced SQL constructs. In this post, we'll look at those constructs used to retrieve model details from a k-Means model. These model details provide the top terms for naming each cluster. Let's get started...

The following SQL allows us to retrieve the text terms in each centroid in decreasing order of their centroid value, i.e., the importance of each term in describing the given cluster. In the script, we used this as a basis to skim off the top 5 terms for constructing the cluster name. Recall that the query below was within a FOR loop and was executed for each of the relevant cluster ids (c.cluster_id).

SELECT cd.attribute_subname term,
       cd.mean              centroid_mean
FROM (SELECT *
      FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL', c.cluster_id, null, 1, 0, 0))) a,
     TABLE(a.centroid) cd
ORDER BY cd.mean DESC;

The table function dbms_data_mining.get_model_details_km returns a set of rows that provide k-Means clustering model details. This function allows us to specify several parameters:

model_name - the name of the clustering model
cluster_id - the id of the cluster we want details for -- an invalid cluster_id returns details for all clusters
attribute - the name of the attribute we want details for -- we specified null since we don't care about a specific attribute
centroid - we specify 1 since we want details about the centroids
histogram - we specify 0 since we don't want histogram details
rules - we specify 0 since we don't want rules

The return type of get_model_details_km is DM_CLUSTERS. This contains rows of type DM_CLUSTER, which have the following columns available:

id              NUMBER
cluster_id      VARCHAR2(4000)
record_count    NUMBER
parent          NUMBER
tree_level      NUMBER
dispersion      NUMBER
split_predicate DM_PREDICATES
child           DM_CHILDREN
centroid        DM_CENTROIDS
histogram       DM_HISTOGRAMS
rule            DM_RULE

We first convert the result of get_model_details_km to a table using the TABLE operator, and then extract the centroid column (a.centroid), which has type DM_CENTROIDS. Its rows, which are of type DM_CENTROID, have the following columns available:

attribute_name    VARCHAR2(4000)
attribute_subname VARCHAR2(4000)
mean              NUMBER
mode_value        VARCHAR2(4000)
variance          NUMBER

Since we provided a single column in the build (training) dataset, we're not interested in the attribute_name column. However, for text, we use the attribute_subname column to obtain the text term itself. We also use the centroid mean column to order the text terms from most important to least.

At this stage, our results are fairly cryptic, since the algorithm deals with numeric term ids, not the actual text:

TERM  CENTROID_MEAN
1447  0.18574048897975001
1471  0.182003440515976
1453  0.16965578421072799
1460  0.16519875743986001
1418  0.14341226478354499
1500  0.141826045289632
1429  0.139553071123386

To obtain the corresponding text terms, we join this result with the mapping data table obtained from the Text step of the Oracle Data Miner Activity. In our example, this was the auto-generated table name DM4J$VSESSION09_710479489.
SELECT id, text term, centroid_mean
FROM (SELECT rownum id, a.*
      FROM (SELECT cd.attribute_subname  term,
                   cd.mean               centroid_mean
            FROM (SELECT *
                  FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL', 4, null, 1, 0, 0)) ) a,
                 TABLE(a.centroid) cd
            order by cd.mean desc) a
      WHERE rownum < 6) x,
     DM4J$VSESSION09_710479489 y
WHERE x.term = y.attribute_id
ORDER BY centroid_mean

A couple of other features to note from this query include selecting the top five terms using the statement WHERE rownum < 6, and the use of rownum to generate an id. This id column will be used by the LEAD functions to construct the cluster name. The result now looks like:

ID   TERM                 CENTROID_MEAN
5    SUPPLY               0.14341
4    FINANCIAL            0.16519
3    PROJECT              0.16965
2    PEOPLESOFT           0.18200
1    FINANCIALMANAGEMENT  0.18574

Next, we'll look at using the LEAD function to construct the cluster names and the cursor ClusterLeafIds definition for selecting the leaf clusters.
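For readers who don't want to wait for that post, here is a minimal sketch of the LEAD step, assuming the five-row result above has been saved as a view named top_terms (a hypothetical name) with columns id and term. LEAD must be computed before filtering, so the window function sits in an inline view and the outer query keeps only the row with id = 1:

SELECT cluster_name
FROM (SELECT id,
             term || '-' ||
             LEAD(term, 1) OVER (ORDER BY id) || '-' ||
             LEAD(term, 2) OVER (ORDER BY id) || '-' ||
             LEAD(term, 3) OVER (ORDER BY id) || '-' ||
             LEAD(term, 4) OVER (ORDER BY id) AS cluster_name
      FROM top_terms)  -- top_terms: hypothetical view holding the 5 rows shown above
WHERE id = 1;
-- Expected result for the data above:
-- FINANCIALMANAGEMENT-PEOPLESOFT-PROJECT-FINANCIAL-SUPPLY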


Oracle BIWA SIG Announces Survey Results

The Oracle Business Intelligence, Warehousing, and Analytics (BIWA) SIG announced their 2010 Annual Membership Survey results today. BIWA SIG focuses on providing members with the latest information about BIWA trends, and opportunities to network with the best of the industry professionals, other Oracle User Groups, and other like-minded Oracle users. See also the BIWA SIG Blog.

From the Oracle Data Mining perspective, survey respondents (73%) cited data mining and predictive analytics as a major technology interest. More broadly, respondents were asked to consider their interest in the three BIWA SIG areas: Business Intelligence, Warehousing, and Analytics, currently and in three years. Consistent with the recent TDWI survey, respondents see an increasing primary interest in analytics, going from 22% to 26% and overtaking interest in warehousing, which went from 28% to 21%. Business Intelligence increased slightly, going from 53% to 55%.

Some highlights of the survey include (from today's email announcement):

The BIWA Board of Directors conducted this 2010 Annual BIWA Membership Survey to understand better the needs and interests of our membership. It was a huge success both in terms of number of responses and valuable input from our membership. With nearly 200 respondents, we had roughly a 10% response rate, which for surveys in general is excellent.

The top 6 ideas considered for the 3 "best idea" gift certificates included (first three awarded):
TechCast Tracks
Periodic Newsletter to Increase Website Traffic
Personal Invitation for TechCasts
Members vote on which TechCasts to run, like Slideshare.com
Match members with similar interests/concerns, like Match.com
Run BIWA-specific Contests

Here are a few highlights from the survey. A more complete discussion of results is located here.

What BIWA features do you use the most?
60% use the OracleBIWA.org website repeatedly throughout the year
44% have viewed four or more live TechCasts
34% have viewed recorded TechCasts

What types of TechCasts interest you the most?
95% "Best Practices" and "Tips and Tricks"
90% "Case Studies"

How much would you be willing to pay for BIWA membership?
30% would pay $50 per year
45% would pay $50 or more per year
41% would not pay for membership

Some of the top interests for our respondents included:
Applications: Hyperion (55%), EBS (43%)
Products: Business Intelligence (95%), EPM (43%)
Technology: Business Intelligence (95%), Warehousing (75%), Data Mining (73%), OLAP (65%)

The BIWA Board of Directors continues to review the results of the survey and incorporate ideas and suggestions to improve BIWA SIG. The BIWA Board of Directors will be actively seeking volunteers and sub-committee members to help implement these. If you are interested, contact Shyam Varan Nath (shyamvaran@gmail.com), BIWA SIG President.


Get Ready for the New Oracle Data Miner 11gR2 GUI!

The new Oracle Data Miner 11g Release 2 Graphical User Interface (GUI) is now available internally for Oracle Sales Consultants to give hosted demos to interested customers. The new GUI provides more graphics, the ability to define, save and share analytical "work flows" to solve business problems, and more automation and simplicity. For example, instead of building a single classification data mining model as with the Oracle Data Miner "Classic" GUI, the new 11gR2 GUI automatically builds four (4) predictive models using all four ODM classification algorithms: decision tree, logistic regression (GLM), support vector machine (SVM) and Naive Bayes. The Oracle Data Miner 11gR2 GUI also adds a wide range of graphical model viewers, such as the decision tree, support vector machine and clustering model viewers shown below.

The new Oracle Data Miner 11gR2 work flow GUI is available now internally only for Sales Consultants to give controlled customer demos using our hosted server and demo datasets. Sales Consultants will be able to download the new 11gR2 GUI and establish a connection to the hosted ODM 11gR2 demo server. Customers who are interested in participating in the beta release program should contact their Oracle Sales Rep, stay tuned to this ODM Blog, watch the ODM page(s) on OTN, or visit ODM on Facebook or follow my CharlieDataMine Tweets.

Oracle Data Miner 11gR2 Work Flow screen shot:
Oracle Data Miner 11gR2 Data Explore screen shot:
Oracle Data Miner 11gR2 Decision Tree model viewer screen shot:
Oracle Data Miner 11gR2 Cluster Tree model viewer screen shot:


Generating cluster names from a document clustering model

Text mining is a hot topic, especially for document clustering. Say you have a potentially large set of documents that you'd like to sort into some number of related groups. Sometimes it is enough to know which documents are in the same group (or cluster) and be able to assign new documents to the existing set of groups. However, you may also want a description of the clusters to help understand what types of documents are in those clusters. Automatically generating cluster names would be much easier than examining cluster centroids or reading a sample of documents in each cluster. Oracle Data Mining supports this use case, and below is a script that generates cluster names from a clustering model.

To use this script, you first need a clustering model and a text mapping table. These are easily produced using the Oracle Data Miner graphical user interface to automatically transform the data and then build the model. To get started, provide a data table with two columns: a numeric id column and a VARCHAR2 column containing the document text. Here are a few key screen captures to guide you. I'm using a dataset from Oracle Open World that includes all the session text (title and abstract concatenated). By the way, this session document clustering was part of the process for producing the Session Recommendation Engine for Oracle Open World 2008 and 2009.

In Oracle Data Miner, start a build activity for clustering using k-Means. Then, select the dataset and the unique identifier, and click Next. (Click images to enlarge.)

Check the SESSION_TEXT attributes as "input" and change the "mining type" to "text."

Click advanced settings at the end of the wizard to reveal settings you can tailor. Since we have a single TEXT column, click on the tabs for "Outlier Treatment," "Missing Values," and "Normalize" and disable each step by clicking the box in the upper left-hand corner. Whereas these steps are often necessary for k-Means, our single text column and text transformation eliminate the need for them.

Clicking the "Text" tab, you may specify various text-specific settings. For example, you may have a custom stopword list or lexer that you want to use, as shown below.

Clicking the "Feature Extraction" sub-tab allows you to specify the maximum number of terms to represent each document and the maximum number of terms to represent all documents.

Click the "Build" tab to specify the number of clusters (groups) you want to have. For text, we recommend the "cosine" distance function. Depending on your needs, you may want to set the split criterion to "size" to have clusters of more equal size. For a better model, set maximum iterations to 20.

Oracle Data Miner now generates an activity that performs the text transformation and model building. To obtain the model name from the Build step, copy the text next to "Model Name." To obtain the mapping table, click the "Output Data" link under the Text step. Click the "Mapping Data" link and copy the name of the table at the top of the window.

Now, you're nearly ready to invoke the following script to generate the cluster names. Create a table like CLUSTER_NAME_MAP below to store the results. Then, replace the model name used below ('SESSION09_PRE92765_CL') with your model name, and the mapping table name used below (DM4J$VSESSION09_710479489) with your mapping table name.
create table cluster_name_map (model_name   VARCHAR2(40),
                               cluster_name VARCHAR2(1999),
                               cluster_id   NUMBER,
                               record_count NUMBER);

Run this script on your model and table. Look below to see some sample output from the Open World session data. (Note that some columns are included in the script below, even though not required, to highlight data available in the model.)

DECLARE
  CURSOR ClusterLeafIds IS
    -- Obtain leaf clusters
    SELECT CLUSTER_ID, RECORD_COUNT
    FROM (
      SELECT distinct clus.ID AS CLUSTER_ID,
             clus.RECORD_COUNT RECORD_COUNT,
             clus.DISPERSION DISPERSION,
             clus.PARENT PARENT_CLUSTER_ID,
             clus.TREE_LEVEL TREE_LEVEL,
             CASE WHEN chl.id IS NULL THEN 'YES'
                  ELSE 'NO' END IS_LEAF
      FROM (SELECT *
            FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL'))) clus,
           TABLE(clus.child) chl
    )
    WHERE is_leaf = 'YES'
    ORDER BY cluster_id;
BEGIN
  FOR c IN ClusterLeafIds LOOP
    INSERT INTO cluster_name_map (model_name, cluster_name,
                                  cluster_id, record_count)
    SELECT 'SESSION09_PRE92765_CL' model_name, cluster_name,
           c.cluster_id cluster_id, c.record_count record_count
    FROM (
      SELECT id, term || '-' ||
             LEAD(term, 1) OVER (ORDER BY id) || '-' ||
             LEAD(term, 2) OVER (ORDER BY id) || '-' ||
             LEAD(term, 3) OVER (ORDER BY id) || '-' ||
             LEAD(term, 4) OVER (ORDER BY id) cluster_name
      FROM (
        SELECT id, text term, centroid_mean
        FROM (
          SELECT rownum id, a.*
          FROM (
            SELECT cd.attribute_subname  term,
                   cd.mean               centroid_mean
            FROM (
              SELECT *
              FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL', c.cluster_id, null, 1, 0, 0, null)) ) a,
              TABLE(a.centroid) cd
            order by cd.mean desc) a
          WHERE rownum < 6) x,
          DM4J$VSESSION09_710479489 y
        WHERE x.term = y.attribute_id
        ORDER BY centroid_mean
      )
    )
    WHERE id = 1;
  END LOOP;
END;
/

Each cluster name is the concatenation of the top 5 terms (words with the highest ranking centroid values) that represent the cluster. In the image below, the second column is the cluster id, and the third column is the count of documents assigned to that cluster. Cluster names can also be assigned to the model clusters directly in the model. Assigning cluster names and the advanced SQL in the script will be covered in future blog posts.
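After the script completes, a quick sanity check is simply to query the results table. A minimal example, assuming you substituted your own model name as described above:

SELECT cluster_id, record_count, cluster_name
FROM cluster_name_map
WHERE model_name = 'SESSION09_PRE92765_CL'  -- replace with your model name
ORDER BY cluster_id;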


Readable rules from a Decision Tree model

One of the many algorithms supported by Oracle Data Mining (ODM) is the decision tree algorithm. This algorithm is popular, in large part, due to the transparency of its internals. ODM provides model details for its algorithms, and decision tree is no exception. The dbms_data_mining.get_model_details_xml function is used to retrieve an XML representation of the tree (PMML compliant), which is a complete description needed for scoring. Even though the XML is complete, it is not easy to read - it is not merely a simple table of rules.

So how can we produce something that is easy to understand? Oracle has been busy adding support for XML to its query processing engine, and this functionality can be used to parse the XML document and translate it to relational form. In addition, since trees are hierarchical in nature, Oracle's hierarchical processing (connect_by functionality) can be leveraged to roll up information along a path in the tree.

Given a decision tree model named DT_SH_CLAS_SAMPLE (as created by the provided ODM sample code), the Oracle SQL engine can be used to translate the XML into readable rules. The distribution of target class values in each node can be generated with:

SELECT * FROM
  XMLTable('for $s in /PMML/TreeModel//ScoreDistribution
            return
              <scores id="{$s/../@id}"
                      tvalue="{$s/@value}"
                      tcount="{$s/@recordCount}"
              />'
    passing dbms_data_mining.get_model_details_xml('DT_SH_CLAS_SAMPLE')
    COLUMNS
      node_id      NUMBER PATH '/scores/@id',
      target_value VARCHAR2(4000) PATH '/scores/@tvalue',
      target_count NUMBER PATH '/scores/@tcount')
ORDER BY node_id, target_value;

This code uses XMLTable to parse the XML and convert the results to relational form, which are then simply returned without much further processing.
The only thing that needs to be changed to apply the above query to a different model is the name of the model that is passed to the get_model_details_xml function.

Generating the readable rules requires quite a bit more code, but the query can likewise be reused for new models with the same replacement:

WITH X as
(SELECT * FROM XMLTable(
   'for $n in /PMML/TreeModel//Node
    let $rf :=
      if (count($n/CompoundPredicate) > 0) then
        $n/CompoundPredicate/*[1]/@field
      else
        if (count($n/SimplePredicate) > 0) then
          $n/SimplePredicate/@field
        else
          $n/SimpleSetPredicate/@field
    let $ro :=
      if (count($n/CompoundPredicate) > 0) then
        if ($n/CompoundPredicate/*[1] instance of
            element(SimplePredicate)) then
          $n/CompoundPredicate/*[1]/@operator
        else if ($n/CompoundPredicate/*[1] instance of
            element(SimpleSetPredicate)) then
          ("in")
        else ()
      else
        if (count($n/SimplePredicate) > 0) then
          $n/SimplePredicate/@operator
        else if (count($n/SimpleSetPredicate) > 0) then
          ("in")
        else ()
    let $rv :=
      if (count($n/CompoundPredicate) > 0) then
        if ($n/CompoundPredicate/*[1] instance of
            element(SimplePredicate)) then
          $n/CompoundPredicate/*[1]/@value
        else
          $n/CompoundPredicate/*[1]/Array/text()
      else
        if (count($n/SimplePredicate) > 0) then
          $n/SimplePredicate/@value
        else
          $n/SimpleSetPredicate/Array/text()
    let $sf :=
      if (count($n/CompoundPredicate) > 0) then
        $n/CompoundPredicate/*[2]/@field
      else ()
    let $so :=
      if (count($n/CompoundPredicate) > 0) then
        if ($n/CompoundPredicate/*[2] instance of
            element(SimplePredicate)) then
          $n/CompoundPredicate/*[2]/@operator
        else if ($n/CompoundPredicate/*[2] instance of
            element(SimpleSetPredicate)) then
          ("in")
        else ()
      else ()
    let $sv :=
      if (count($n/CompoundPredicate) > 0) then
        if ($n/CompoundPredicate/*[2] instance of
            element(SimplePredicate)) then
          $n/CompoundPredicate/*[2]/@value
        else
          $n/CompoundPredicate/*[2]/Array/text()
      else ()
    return
      <pred id="{$n/../@id}"
            score="{$n/@score}"
            rec="{$n/@recordCount}"
            cid="{$n/@id}"
            rf="{$rf}"
            ro="{$ro}"
            rv="{$rv}"
            sf="{$sf}"
            so="{$so}"
            sv="{$sv}"
      />'
   passing dbms_data_mining.get_model_details_xml('DT_SH_CLAS_SAMPLE')
   COLUMNS
     parent_node_id   NUMBER PATH '/pred/@id',
     child_node_id    NUMBER PATH '/pred/@cid',
     rec              NUMBER PATH '/pred/@rec',
     score            VARCHAR2(4000) PATH '/pred/@score',
     rule_field       VARCHAR2(4000) PATH '/pred/@rf',
     rule_op          VARCHAR2(20) PATH '/pred/@ro',
     rule_value       VARCHAR2(4000) PATH '/pred/@rv',
     surr_field       VARCHAR2(4000) PATH '/pred/@sf',
     surr_op          VARCHAR2(20) PATH '/pred/@so',
     surr_value       VARCHAR2(4000) PATH '/pred/@sv'))
select pid parent_node, nid node, rec record_count,
       score prediction, rule_pred local_rule, surr_pred local_surrogate,
       rtrim(replace(full_rule,'$O$D$M$'),' AND') full_simple_rule
from
(select row_number() over (partition by nid order by rn desc) rn,
        pid, nid, rec, score, rule_pred, surr_pred, full_rule
 from
 (select rn, pid, nid, rec, score, rule_pred, surr_pred,
         sys_connect_by_path(pred, '$O$D$M$') full_rule
  from
  (select row_number() over (partition by nid order by rid) rn,
          pid, nid, rec, score, rule_pred, surr_pred,
          nvl2(pred, pred || ' AND ', null) pred
   from
   (select rid, pid, nid, rec, score, rule_pred, surr_pred,
           decode(rn, 1, pred, null) pred
    from
    (select rid, nid, rec, score, pid, rule_pred, surr_pred,
            nvl2(root_op, '(' || root_field || ' ' || root_op || ' ' || root_value || ')', null) pred,
            row_number() over (partition by nid, root_field, root_op order by rid desc) rn
     from
     (SELECT connect_by_root(parent_node_id) rid,
             child_node_id nid,
             rec, score,
             connect_by_root(rule_field) root_field,
             connect_by_root(rule_op) root_op,
             connect_by_root(rule_value) root_value,
             nvl2(rule_op, '(' || rule_field || ' ' || rule_op || ' ' || rule_value || ')', null) rule_pred,
             nvl2(surr_op, '(' || surr_field || ' ' || surr_op || ' ' || surr_value || ')', null) surr_pred,
             parent_node_id pid
      FROM
      (SELECT parent_node_id, child_node_id, rec, score, rule_field, surr_field, rule_op, surr_op,
              replace(replace(rule_value,'&quot; &quot;', ''', '''),'&quot;', '''') rule_value,
              replace(replace(surr_value,'&quot; &quot;', ''', '''),'&quot;', '''') surr_value
       FROM
       (SELECT parent_node_id, child_node_id, rec, score, rule_field, surr_field,
               decode(rule_op,'lessOrEqual','<=','greaterThan','>',rule_op) rule_op,
               decode(rule_op,'in','('||rule_value||')',rule_value) rule_value,
               decode(surr_op,'lessOrEqual','<=','greaterThan','>',surr_op) surr_op,
               decode(surr_op,'in','('||surr_value||')',surr_value) surr_value
        FROM X)
      )
      CONNECT BY PRIOR child_node_id = parent_node_id
     )
    )
   )
  )
  CONNECT BY PRIOR rn = rn - 1
         AND PRIOR nid = nid
  START WITH rn = 1))
where rn = 1;

If this query is being run from sqlplus, make sure to:

set define off

to avoid replacing the & characters in the query.

For each node in the tree, this query will provide the id of the parent node, the number of records in the node, the top predicted class for the node, the rule followed to get to the node from its parent, the surrogate rule that would be followed if the main rule cannot be used (e.g., the attribute is null), and the entire, simplified rule from the root to the node. The simplified rule ignores surrogates and combines multiple rule pieces for the same attribute into a single piece to increase readability. Note that these queries provide information for branches and leaves of the tree, which is useful since ODM can score a record as stopping in a branch if there is not enough information to continue further down the tree.
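Since the XML retrieved by get_model_details_xml is the complete description needed for scoring, one way to sanity-check the extracted rules is to compare them against actual predictions. A minimal sketch, assuming the MINING_DATA_APPLY_V view created by the ODM sample code is available (any table containing the model's input columns would do):

select cust_id,
       prediction(DT_SH_CLAS_SAMPLE using *) predicted_class,
       prediction_probability(DT_SH_CLAS_SAMPLE using *) probability
from mining_data_apply_v  -- apply view from the ODM sample schemas; substitute your own data
where rownum < 11;

Each record's predicted class should match the score of the branch or leaf node whose full_simple_rule the record satisfies.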


Real-time scoring with nested predictive models for missing value imputation

Let's suppose that you work at a bank that has been on an acquisition spree, acquiring several smaller banks. That's great, but now you need to reach out to these new customers with proactive marketing programs to keep them as customers of your new, larger mega-bank. Suppose that you have already used Oracle Data Mining's in-database data mining functions to build an attrition model, attrition_model, and now you want to reach out to the customers who are most likely to leave. The problem is that not all of the acquired banks were consistent about the information they gathered and stored, and your predictive model relies heavily on several key attributes, including, most importantly, annual income. So what do you do? Easy! You build another model, estim_income, and nest that model inside the ODM attrition_model when scoring.

See this example, where we select the 10 customers who are most likely to attrite based solely on: age, gender, annual_income, and zipcode. In addition, since annual_income is often missing, we perform null/missing value imputation for the annual_income attribute using all of the customer demographics.

SELECT * FROM (
  SELECT cust_name, cust_contact_info,
         rank() over (ORDER BY
           PREDICTION_PROBABILITY(attrition_model, 'attrite'
             USING age, gender, zipcode,
                   NVL(annual_income,
                       PREDICTION(estim_income USING *)) as annual_income) DESC) as cust_rank
  FROM customers)
WHERE cust_rank < 11;

As the first model scores, the second model, estim_income, performs Oracle's null/missing value imputation for the annual_income attribute using all of the customer demographics. Voilà! A complex multi-model problem made easy with Oracle Data Mining and its 12 in-database data mining functions combined with the power of SQL and the Oracle Database.
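The estim_income model itself is just a regression model trained on the customers whose annual_income is known. A minimal sketch of how it might be built; the choice of GLM, the settings table name, and the CUSTOMERS_WITH_INCOME view are illustrative assumptions, not from the original example:

-- Illustrative settings: GLM regression with automatic data preparation
create table estim_income_set (setting_name varchar2(30), setting_value varchar2(4000));
insert into estim_income_set values ('ALGO_NAME', 'ALGO_GENERALIZED_LINEAR_MODEL');
insert into estim_income_set values ('PREP_AUTO', 'ON');
commit;

begin
  dbms_data_mining.create_model(
    model_name          => 'estim_income',
    mining_function     => 'REGRESSION',
    data_table_name     => 'CUSTOMERS_WITH_INCOME',  -- hypothetical view: customers where annual_income is not null
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'ANNUAL_INCOME',
    settings_table_name => 'ESTIM_INCOME_SET');
end;
/

Once built, the model can be referenced by PREDICTION(estim_income USING *) exactly as in the nested query above.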


Fraud and Anomaly Detection Made Simple

Here is a quick and simple application for fraud and anomaly detection. To replicate this on your own computer, download and install Oracle Database 11g Release 1 or 2. (See http://www.oracle.com/technology/products/bi/odm/odm_education.html for more information.) This small application uses the Automatic Data Preparation (ADP) feature that we added in Oracle Data Mining 11g. Click here to download the CLAIMS data table. [Download the .7z file and save it somewhere, unzip to a .csv file and then use the SQL Developer data import wizard to import the claims.csv file into a table in the Oracle Database.]

First, we instantiate the ODM settings table to override the defaults. The default algorithm for the Classification data mining function is Naive Bayes, but since this is a different problem - looking for anomalous records amongst a larger data population - we want to change that to SUPPORT_VECTOR_MACHINES. Also, as the 1-Class SVM does not rely on a Target field, we have to change that parameter to null. See http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm for detailed documentation on ODM's anomaly detection.

drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;

Then, we run the dbms_data_mining.create_model function and let the in-database Oracle Data Mining algorithm run through the data, find patterns and relationships within the CLAIMS data, and infer a CLAIMS data mining model from the data.

begin
  dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
                                'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/

After that, we can use the CLAIMS data mining model to "score" all customer auto insurance policies, sort them by prediction_probability, and select the top 5 most unusual claims.

-- Top 5 most suspicious fraud policy holder claims
select * from
  (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
          rank() over (order by prob_fraud desc) rnk
   from (select POLICYNUMBER,
                prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
         from CLAIMS
         where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;

Leave these results inside the database and you can create powerful dashboards using Oracle Business Intelligence EE (or any reporting or dashboard tool that can query the Oracle Database) that multiply ODM's probability of the record being anomalous times (x) the dollar amount of the claim, and then use stoplight color coding (red, orange, yellow) to flag only the more suspicious claims. Very automated, very easy, and all inside the Oracle Database!
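As a sketch of that dashboard idea, the anomaly probability can be weighted by the claim's dollar value and bucketed into stoplight colors directly in SQL. The CLAIMAMOUNT column and the dollar thresholds below are illustrative assumptions; substitute whatever your CLAIMS table actually provides:

select POLICYNUMBER,
       round(prob_fraud * CLAIMAMOUNT, 2) dollars_at_risk,
       case when prob_fraud * CLAIMAMOUNT > 10000 then 'RED'     -- thresholds illustrative
            when prob_fraud * CLAIMAMOUNT > 5000  then 'ORANGE'
            else 'YELLOW'
       end stoplight
from (select POLICYNUMBER, CLAIMAMOUNT,  -- CLAIMAMOUNT: hypothetical claim dollar-amount column
             prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
      from CLAIMS)
order by dollars_at_risk desc;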

