News and Views: Drive Smart Decisions with Cloud Analytics, Machine Learning and More

Retrieve Valuable Machine Learning Model Info - Part I

Philippe Lions
Senior Director

Crafting machine learning algorithms is the first step toward automatically creating data models you can use to answer business questions. But being able to retrieve information from those machine learning models will help you save time and speed up your processes.

Oracle Database offers powerful machine learning algorithms that enable data analysts to discover hidden patterns and insights. With Oracle Machine Learning (OML), you can build and apply predictive models inside the database that help predict customer behavior, identify cross-selling opportunities, build customer profiles, and detect anomalies. These algorithms are implemented as SQL functions that can be invoked directly inside the database without moving the data, which leverages the inherent strengths of the database.

OML offers a broad range of in-database algorithms to solve different kinds of business problems. For classification problems, there are algorithms like Naïve Bayes, Decision Tree, and Support Vector Machine. For clustering, there are Enhanced k-Means, O-Cluster, and so on.

For every machine learning model that is created in the database, different kinds of information about the model are stored in several specific tables and views within the database.

In this first of two blog posts, let's create a simple OML model, explore the various mining model views that are created with the model in the database, and see how these views can be used to get valuable information about the model itself.

Sample Problem 

Let's assume we are given a demographic data set about a set of customers and we would like to predict the customer response to an affinity card program using a classification function based on the Decision Tree algorithm. This data set contains 3,000 records with customer attributes like cust_gender, cust_marital_status, education, occupation, household_size, and so on. The grain of this data set is the customer's ID, so every record has a unique cust_id.

Create a Machine Learning Model

Let's create a classification model using the Decision Tree algorithm to address this problem. The database offers several options to enhance the overall accuracy of a model, such as how to handle missing values for predictors, outlier treatment for predictors, a method for binning high-cardinality data, and so on. For the purpose of this blog, let us use the bare minimum steps to build a Decision Tree model.

The default mining algorithm used to build a classification model is Naïve Bayes. In order to build a Decision Tree classification model, we need to first create and populate a settings table and use this settings table as input to the model-building procedure.

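Here is sample syntax to create and populate a model settings table:

    -- Settings table: one row per setting name/value pair
    CREATE TABLE dt_sh_sample_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000));

    -- Package constants resolve only inside PL/SQL, so wrap the INSERT in a block
    BEGIN
      INSERT INTO dt_sh_sample_settings VALUES
        (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
    END;
    /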

Here is sample syntax to create a classification model using the Decision Tree algorithm:

    -- Build the model; the settings table selects Decision Tree over the Naïve Bayes default
    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'DT_TEST',
        mining_function     => dbms_data_mining.classification,
        data_table_name     => 'mining_data_build_v',
        case_id_column_name => 'cust_id',
        target_column_name  => 'affinity_card',
        settings_table_name => 'dt_sh_sample_settings');
    END;
    /

Query Model Related Information

Every time a machine learning model is built, details about the model are stored in a set of data mining dictionary views. The following section describes the most important of these views and how they can be used to understand the model. The information from these views can be easily visualized using Oracle Analytics Cloud.

a)       ALL_MINING_MODELS – This view describes all the mining models accessible to the current user. It includes the model owner, model_name, the mining_function used in the model (Classification, Regression, Clustering, etc.), the algorithm used to build the model (Naïve Bayes, Decision Tree, etc.), the model creation date, the model size, and so on. This view has one record for every accessible OML model.
There are two related views:
             i) DBA_MINING_MODELS, which describes all the mining models in the database.
             ii) USER_MINING_MODELS, which describes the models owned by the current user.

Information in this view can be visualized within Oracle Analytics Cloud by creating a data set sourced from this view.

Important Note: When you navigate to the schema in the data set creation screen, these data mining views are not listed as objects. However, you can enter a custom SQL statement similar to 'Select * from ALL_MINING_MODELS' to retrieve the model details.
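
As a minimal sketch of the custom SQL mentioned in the note, the query below narrows ALL_MINING_MODELS to the DT_TEST model and selects the columns described above. It assumes the model is visible to the current user, and the column list is just one reasonable selection.

```sql
-- Summary row for the DT_TEST model.
-- Model names are stored in uppercase unless they were created with quoted identifiers.
SELECT owner,
       model_name,
       mining_function,
       algorithm,
       creation_date,
       model_size
FROM   all_mining_models
WHERE  model_name = 'DT_TEST';
```

Filtering on model_name keeps the resulting data set small when many models exist in the database.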

 

Here's an example of the output of the ALL_MINING_MODELS view related to the model called DT_TEST.

[Figure: output of the ALL_MINING_MODELS view for the DT_TEST model]

 

b)       ALL_MINING_MODEL_ATTRIBUTES – This is an important view that contains information about the columns in the training data that were used to build the model. For each attribute, the view provides the attribute name, the data type, and whether the attribute is the target. When the model is applied to a data set for scoring, columns with the same names and data types should be available in the scoring data set. As with ALL_MINING_MODELS, there are two related views: DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES.
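
The attribute details described above could be retrieved with a query along these lines. This is a sketch that assumes the DT_TEST model built earlier; the chosen columns are a subset of what the view exposes.

```sql
-- One row per attribute used to build DT_TEST.
-- The TARGET column flags the attribute the model predicts (affinity_card in this example).
SELECT attribute_name,
       attribute_type,
       data_type,
       usage_type,
       target
FROM   all_mining_model_attributes
WHERE  model_name = 'DT_TEST'
ORDER  BY attribute_name;
```

Comparing this list with the columns of a scoring data set is a quick way to check that the names and data types line up before applying the model.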

Here's a sample output of this view for a model called DT_TEST as visualized within Oracle Analytics Cloud. 
 

[Figure: ALL_MINING_MODEL_ATTRIBUTES output for the DT_TEST model, visualized in Oracle Analytics Cloud]

 

c)     ALL_MINING_MODEL_SETTINGS – This view describes the settings of the mining model: each setting name, its value, and whether the value was supplied by the user or is a default. Here's an example of the model settings for a model called DT_TEST visualized using Oracle Analytics Cloud.

 

[Figure: ALL_MINING_MODEL_SETTINGS output for the DT_TEST model, visualized in Oracle Analytics Cloud]
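
A query like the following sketch lists the settings for the DT_TEST model. It assumes the model was built with the dt_sh_sample_settings table shown earlier, so the algorithm setting should appear as user input while the remaining settings appear as defaults.

```sql
-- Settings in effect for DT_TEST.
-- SETTING_TYPE distinguishes values supplied in the settings table (INPUT)
-- from values the database chose on its own (DEFAULT).
SELECT setting_name,
       setting_value,
       setting_type
FROM   all_mining_model_settings
WHERE  model_name = 'DT_TEST'
ORDER  BY setting_name;
```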

 

d)       ALL_MINING_MODEL_VIEWS – This is another important view that provides a slightly different kind of information. Every time a machine learning model is created in the database, several underlying model views specific to that model are created along with it. The number and kind of these model views differ from one algorithm to another: a model built using the Naïve Bayes algorithm has one set of related model views, a model built using Decision Tree has another set, and so on. For example, consider a Decision Tree model called DT_TEST. The following graphic shows the model views created:

[Figure: model views created for the DT_TEST Decision Tree model]


Note that the view names have a fixed format. Each name begins with DM$V, followed by a letter that is specific to the model view, followed by the model name: DM$V<Letter><Model Name>.
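
To see the model views for a given model without relying on the naming pattern, ALL_MINING_MODEL_VIEWS can be queried directly. The sketch below assumes the DT_TEST model from this post and a database release that provides model detail views.

```sql
-- Model views generated for DT_TEST, with a short description of each view's type.
SELECT view_name,
       view_type
FROM   all_mining_model_views
WHERE  model_name = 'DT_TEST'
ORDER  BY view_name;
```

Each view_name returned can then be queried like any other view to inspect that part of the model.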

We will refer to the details on these specific model views for various mining algorithms in an upcoming Part II blog entry on this topic. 

Conclusion

For every machine learning model that is created in Oracle Database, a lot of metadata about the model is available in the dictionary views. If you are interested in knowing more about this topic, refer to the Oracle documentation, or stay tuned for the second part of this blog entry, coming soon.

To learn how you can benefit from Oracle Analytics, visit Oracle.com/analytics, and follow us on Twitter @OracleAnalytics.
