By Charlie Berger, Advanced Analytics-Oracle on Jan 18, 2010
Here is a quick and simple application for fraud and anomaly detection. To replicate this on your own computer, download and install the Oracle Database 11g Release 1 or 2. (See http://www.oracle.com/technology/products/bi/odm/odm_education.html for more information). This small application uses the Automatic Data Preparation (ADP) feature that we added in Oracle Data Mining 11g. Click here to download the CLAIMS data table. [Download the .7z file and save it somwhere, unzip to a .csv file and then use SQL Developer data import wizard to import the claims.csv file into a table in the Oracle Database.]
First, we instantiate the ODM settings table to override the defaults. The default value for Classification data mining function is to use our Naive Bayes algorithm, but since this is a different problem, looking for anomalous records amongst a larger data population, we want to change that to SUPPORT_VECTOR_MACHINES. Also, as the 1-Class SVM does not rely on a Target field, we have to change that parameter to "null". See http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/anomalies.htm for detailed Documentation on ODM's anomaly detection.
drop table CLAIMS_SET;
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
Then, we run the dbms_data_mining.create_model function and let the in-database Oracle Data Mining algorithm run through the data, find patterns and relationships within the CLAIMS data, and infer a CLAIMS data mining model from the data.
'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
After that, we can use the CLAIMS data mining model to "score" all customer auto insurance policies, sort them by our prediction_probability and select the top 5 most unusual claims.
-- Top 5 most suspicious fraud policy holder claims
select * from
(select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
rank() over (order by prob_fraud desc) rnk from
(select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;
Leave these results inside the database and you can create powerful dashboards using Oracle Business Intelligence EE (or any reporting or dashboard tool that can query the Oracle Database) that multiple ODM's probability of the record being anomalous times (x) the dollar amount of the claim, and then use stoplight color coding (red, orange, yellow) to flag only the more suspicious claims. Very automated, very easy, and all inside the Oracle Database!