Thursday Jun 17, 2010
Friday Jun 11, 2010
By Charlie Berger, Advanced Analytics-Oracle on Jun 11, 2010
Thursday Jun 03, 2010
By Mark Hornick-Oracle on Jun 03, 2010
Thursday May 27, 2010
By Charlie Berger, Advanced Analytics-Oracle on May 27, 2010
Wednesday May 19, 2010
New Communications Industry Data Model with "Factory Installed" Predictive Analytics using Oracle Data Mining
By Charlie Berger, Advanced Analytics-Oracle on May 19, 2010
Monday May 17, 2010
By Mark Hornick-Oracle on May 17, 2010
Tuesday Apr 13, 2010
By Mark Hornick-Oracle on Apr 13, 2010
Tuesday Apr 06, 2010
By Charlie Berger, Advanced Analytics-Oracle on Apr 06, 2010
Updated (Sept. 10, 2010) Full Article http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834
Gathering and Mining Sailing Data
From the drag-resistant hull to its 23-story wing sail, the BMW Oracle USA trimaran is a technological marvel. But to learn to sail it well, the crew needed to review enormous amounts of reliable data every time they took the boat for a test run. Burns and his team collected performance data from 250 sensors throughout the trimaran at the rate of 10 times per second. An hour of sailing alone generates 90 million data points.
BMW Oracle Racing turned to Oracle Data Mining in Oracle Database 11g to extract maximum value from the data. Burns and his team reviewed and shared raw data with crew members daily using a Web application built in Oracle Application Express (Oracle APEX).
"Someone would say, 'Wouldn't it be great if we could look at some new combination of numbers?' We could quickly build an Oracle Application Express application and share the information during the same meeting," says Burns.
Analyzing Wind and Other Environmental Conditions
Burns then streamed the data to the Oracle Austin Data Center, where a dedicated team tackled deeper analysis. Because the data was collected in an Oracle Database, the Data Center team could dive straight into the analytics problems without having to do any extract, transform, and load processes or data conversion. And the many advanced data mining algorithms in Oracle Data Mining allowed the analytics team to build vital performance analytics. For example, the technology team could remove masking elements such as environmental conditions to give accurate data on the best mast rotation for certain wind conditions.
Without the data mining, Burns says the boat wouldn't have run as fast. "The design of the boat was important, but once you've got it designed, the whole race is down to how the guys can use it," he says. "With Oracle database technology we could compare the incremental improvements in our performance from the first day of sailing to the very last day. With data mining we could check data against the things we saw, and we could find things that weren't otherwise easily observable and findable."
Wednesday Mar 31, 2010
By Mark Hornick-Oracle on Mar 31, 2010
Tuesday Mar 23, 2010
By Mark Hornick-Oracle on Mar 23, 2010
Monday Mar 08, 2010
Friday Feb 26, 2010
By Mark Hornick-Oracle on Feb 26, 2010
By Mark Hornick-Oracle on Feb 26, 2010
Wednesday Feb 24, 2010
Thursday Feb 18, 2010
By Charlie Berger, Advanced Analytics-Oracle on Feb 18, 2010
The America's Cup has been away from U.S. shores for 15 years, the longest drought since 1851. With the challenge of squeezing out every micro-joule of energy from the wind and with the goal of maximizing "velocity made good", the BMW Oracle Racing Team turned to Oracle Data Mining.
"Imagine standing under an avalanche of data - 2500 variables, 10 times per second and a sailing team demanding answers to design and sailing variations immediately. This was the challenge facing the BMW ORACLE Racing Performance Analysis Team every sailing day as they refined and improved their giant 90 foot wide, 115 foot long trimaran sporting the largest hard-sail wing ever made. Using ORACLE DATA MINING accessing an ORACLE DATABASE and presenting results real time using ORACLE APPLICATION EXPRESS the performance team managed to provide the information required to optimise the giant multihull to the point that it not only beat the reigning America's Cup champions Alinghi in their giant Catamaran but resoundingly crushed them in a power display of high speed sailing. After two races - and two massive winning margins - the America's Cup was heading back to America - a triumph for the team, ORACLE and American technology."
--Ian Burns, Performance Director, BMW ORACLE Racing Team
Visit http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834 for pictures, videos, and full information.
Wednesday Feb 17, 2010
By Mark Hornick-Oracle on Feb 17, 2010
Thursday Feb 11, 2010
Wednesday Feb 10, 2010
By Charlie Berger, Advanced Analytics-Oracle on Feb 10, 2010
Sunday Feb 07, 2010
By Charlie Berger, Advanced Analytics-Oracle on Feb 07, 2010
Friday Jan 29, 2010
By Mark Hornick-Oracle on Jan 29, 2010
Text mining is a hot topic, especially for document clustering. Say you have a potentially large set of documents that you'd like to sort into some number of related groups. Sometimes it is enough to know which documents are in the same group (or cluster) and be able to assign new documents to the existing set of groups. However, you may also want a description of the clusters to help understand what types of documents are in those clusters. Automatically generating cluster names would be much easier than examining cluster centroids or reading a sample of documents in each cluster.
Oracle Data Mining supports this use case, and the script below generates cluster names from a clustering model.
To use this script, you first need a clustering model and a text mapping table. These are easily produced using the Oracle Data Miner graphical user interface to automatically transform the data and then build the model. To get started, provide a data table with two columns: a numeric id column and a VARCHAR2 column containing the document text.
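As a minimal sketch of such an input table, the statements below create one and load a single row. The table name SESSION_DOCS, the column name ID, and the sample text are hypothetical; SESSION_TEXT matches the attribute name used later in this walkthrough.

```sql
-- Hypothetical input table for text clustering: one row per document.
CREATE TABLE session_docs (
  id           NUMBER PRIMARY KEY,   -- numeric unique identifier for each document
  session_text VARCHAR2(4000)        -- document text (title and abstract concatenated)
);

-- Illustrative row only; the text is made up.
INSERT INTO session_docs (id, session_text)
VALUES (1, 'Example session title. Example abstract describing the session.');

COMMIT;
```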
Here are a few key screen captures to guide you. I'm using a dataset from Oracle Open World that includes all the session text (title and abstract concatenated). By the way, this session document clustering was part of the process for producing the Session Recommendation Engine for Oracle Open World 2008 and 2009.
In Oracle Data Miner, start a build activity for clustering using k-Means. Then, select the dataset and the unique identifier, and click Next. (Click images to enlarge.)
Check the SESSION_TEXT attribute as "input" and change the "mining type" to "text."
Click advanced settings at the end of the wizard to reveal settings you can tailor. Since we have a single TEXT column, click the tabs for "Outlier Treatment," "Missing Values," and "Normalize" and disable each step by clicking the box in the upper left-hand corner. Although these steps are often necessary for k-Means, our single text column and text transformation eliminate the need for them.
Clicking the "Text" tab, you may specify various text-specific settings. For example, you may have a custom stopword list or lexer that you want to use, as shown below.
Clicking the "Feature Extraction" sub-tab allows you to specify the maximum number of terms to represent each document and the maximum number of terms to represent all documents.
Click the "Build" tab to specify the number of clusters (groups) you want. For text, we recommend the "cosine" distance function. Depending on your needs, you may want to set the split criterion to "size" to obtain clusters of more equal size. For a better model, set maximum iterations to 20.
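For readers who build models through the PL/SQL API rather than the wizard, the build choices above correspond roughly to k-Means entries in a settings table. This is only a sketch: the table name SESSION_KM_SET and the cluster count of 10 are assumptions, and the text transformation that Oracle Data Miner performs is not shown.

```sql
-- Hypothetical settings table mirroring the wizard's "Build" tab choices.
create table SESSION_KM_SET (setting_name varchar2(30), setting_value varchar2(4000));

insert into SESSION_KM_SET values ('ALGO_NAME', 'ALGO_KMEANS');           -- k-Means clustering
insert into SESSION_KM_SET values ('CLUS_NUM_CLUSTERS', '10');            -- number of clusters (arbitrary here)
insert into SESSION_KM_SET values ('KMNS_DISTANCE', 'KMNS_COSINE');       -- cosine distance, recommended for text
insert into SESSION_KM_SET values ('KMNS_SPLIT_CRITERION', 'KMNS_SIZE');  -- favor clusters of more equal size
insert into SESSION_KM_SET values ('KMNS_ITERATIONS', '20');              -- iteration limit for the build
commit;
```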
Oracle Data Miner now generates an activity that performs
the text transformation and model building.
To obtain the model name from the Build step, copy the text next to "Model Name." To obtain the mapping table, click the "Output Data" link under the Text step. Click the "Mapping Data" link and copy the name of the table at the top of the window.
Now, you're nearly ready to invoke the following script to generate the cluster names.
Create a table like CLUSTER_NAME_MAP below to store the
results. Then, replace the model name used below ('SESSION09_PRE92765_CL')
with your model name, and the mapping table name used below (DM4J$VSESSION09_710479489)
with your mapping table name.
create table cluster_name_map (model_name VARCHAR(40),
Run this script on your model and table. Look below to see some sample output from the Open World session data. (Note that some columns are included in the script below, even though not required, to highlight data available in the model.)
CURSOR ClusterLeafIds IS
--Obtain leaf clusters
SELECT CLUSTER_ID, RECORD_COUNT
SELECT distinct clus.ID AS CLUSTER_ID,
CASE WHEN chl.id IS NULL THEN 'YES'
ELSE 'NO' END IS_LEAF
FROM (SELECT *
FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL'))) clus,
ORDER BY cluster_id;
FOR c IN ClusterLeafIds LOOP
INSERT INTO cluster_name_map (model_name, cluster_name,
SELECT 'SESSION09_PRE92765_CL' model_name, cluster_name,
c.cluster_id cluster_id, c.record_count record_count
SELECT id, term || '-' ||
LEAD(term, 1) OVER (ORDER BY id) || '-' ||
LEAD(term, 2) OVER (ORDER BY id) || '-' ||
LEAD(term, 3) OVER (ORDER BY id) || '-' ||
LEAD(term, 4) OVER (ORDER BY id) cluster_name
SELECT id, text term, centroid_mean
SELECT rownum id, a.*
SELECT cd.attribute_subname term,
FROM TABLE(dbms_data_mining.get_model_details_km('SESSION09_PRE92765_CL', c.cluster_id, null, 1, 0, 0, null)) ) a,
order by cd.mean desc) a
WHERE rownum < 6) x,
ORDER BY centroid_mean
Each cluster name is the concatenation of the top 5 terms (words with the highest-ranking centroid values) that represent the cluster. In the image below, the second column is the cluster id, and the third column is the count of documents assigned to that cluster.
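The name-building step can be tried on its own. The sketch below uses five made-up terms in place of a real cluster's top centroid terms and applies the same LEAD concatenation as the script; no model or mapping table is needed.

```sql
-- Five made-up terms standing in for a cluster's top centroid terms,
-- already numbered 1..5 in descending order of centroid mean.
WITH top_terms AS (
  SELECT 1 AS id, 'database' AS term FROM dual UNION ALL
  SELECT 2, 'performance' FROM dual UNION ALL
  SELECT 3, 'tuning'      FROM dual UNION ALL
  SELECT 4, 'sql'         FROM dual UNION ALL
  SELECT 5, 'optimizer'   FROM dual
)
SELECT cluster_name
FROM (
  SELECT id,
         term || '-' ||
         LEAD(term, 1) OVER (ORDER BY id) || '-' ||
         LEAD(term, 2) OVER (ORDER BY id) || '-' ||
         LEAD(term, 3) OVER (ORDER BY id) || '-' ||
         LEAD(term, 4) OVER (ORDER BY id) AS cluster_name
  FROM top_terms
)
WHERE id = 1;  -- only the first row has all four following terms available
```

With these sample terms the query should return database-performance-tuning-sql-optimizer. Rows other than the first would yield shorter names with trailing hyphens, because LEAD returns NULL past the last row, which is why only the first row is kept.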
Cluster names can also be assigned to the model clusters directly in the model.
Assigning cluster names and the advanced SQL in the script will be covered in future blog posts.
Thursday Jan 28, 2010
Saturday Jan 23, 2010
By Charlie Berger, Advanced Analytics-Oracle on Jan 23, 2010
Monday Jan 18, 2010
By Charlie Berger, Advanced Analytics-Oracle on Jan 18, 2010
Here is a quick and simple application for fraud and anomaly detection. To replicate this on your own computer, download and install Oracle Database 11g Release 1 or 2. (See http://www.oracle.com/technology/products/bi/odm/odm_education.html for more information.) This small application uses the Automatic Data Preparation (ADP) feature that we added in Oracle Data Mining 11g. Click here to download the CLAIMS data table. [Download the .7z file and save it somewhere, extract the .csv file, and then use the SQL Developer data import wizard to import the claims.csv file into a table in the Oracle Database.]
First, we create the ODM settings table to override the defaults. The default algorithm for the Classification mining function is Naive Bayes, but since this is a different problem, looking for anomalous records within a larger data population, we change it to Support Vector Machines (ALGO_SUPPORT_VECTOR_MACHINES). Also, because the 1-Class SVM does not rely on a target field, we pass null for the target parameter when creating the model. See http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm for detailed documentation on ODM's anomaly detection.
drop table CLAIMS_SET;
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
Then, we run the dbms_data_mining.create_model procedure and let the in-database Oracle Data Mining algorithm run through the data, find patterns and relationships within the CLAIMS data, and build a CLAIMS data mining model from it.
'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
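Only the tail of the call appears above. A complete invocation might look like the following sketch. It assumes the model is named CLAIMSMODEL, the name the scoring query in this post references, and that the CLASSIFICATION mining function is used with a null target, as described earlier.

```sql
-- Sketch of the full call, using named parameters for readability.
-- Assumptions: model name CLAIMSMODEL; CLASSIFICATION function with a NULL
-- target, which together with the SVM setting requests a one-class model.
BEGIN
  dbms_data_mining.create_model(
    model_name          => 'CLAIMSMODEL',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CLAIMS',
    case_id_column_name => 'POLICYNUMBER',
    target_column_name  => NULL,
    settings_table_name => 'CLAIMS_SET');
END;
/
```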
After that, we can use the CLAIMS data mining model to "score" all customer auto insurance policies, sort them by our prediction_probability and select the top 5 most unusual claims.
-- Top 5 most suspicious fraud policy holder claims
select * from
(select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
rank() over (order by prob_fraud desc) rnk from
(select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
from CLAIMS
where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;
Leave these results inside the database and you can create powerful dashboards using Oracle Business Intelligence EE (or any reporting or dashboard tool that can query the Oracle Database) that multiply ODM's probability of the record being anomalous by the dollar amount of the claim, and then use stoplight color coding (red, orange, yellow) to flag only the more suspicious claims. Very automated, very easy, and all inside the Oracle Database!
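A dashboard query along those lines could be sketched as follows. CLAIM_AMOUNT is a hypothetical numeric column holding the dollar amount of each claim (substitute whatever your CLAIMS table provides), and the probability thresholds are illustrative, not recommendations.

```sql
-- Weight each claim's anomaly probability by its dollar amount and
-- attach a stoplight color. CLAIM_AMOUNT and the thresholds are assumptions.
select POLICYNUMBER,
       round(prob_fraud * 100, 2)          percent_fraud,
       round(prob_fraud * CLAIM_AMOUNT, 2) weighted_exposure,
       case
         when prob_fraud >= 0.9 then 'RED'
         when prob_fraud >= 0.7 then 'ORANGE'
         else 'YELLOW'
       end alert_color
from (select POLICYNUMBER, CLAIM_AMOUNT,
             prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
      from CLAIMS)
where prob_fraud >= 0.5          -- show only the more suspicious claims
order by weighted_exposure desc;
```

Sorting by the weighted value rather than the raw probability puts large, unusual claims ahead of small ones with similar scores.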
By Charlie Berger, Advanced Analytics-Oracle on Jan 18, 2010
Thursday Jan 07, 2010
Everything about Oracle Data Mining, a component of the Oracle Advanced Analytics Option - News, Technical Information, Opinions, Tips & Tricks. All in One Place