X

EMEA A&C CCOE Partner Technology Cloud Engineering

More value for your APEX application with Oracle Machine Learning in Autonomous Databases (Part I)

Evgeny Pleskach
Cloud Adoption & Implementation Consultant

In the past few years Oracle Application Express has been established as a low code platform  with many organisations actively using this today, to build incredible apps and solve real problems.

Oracle Application Express is a great and valuable development tool available as part of the Oracle Databases for many years now. For over a year now, Oracle APEX is also available pre-installed and pre-configured with Oracle Autonomous Database, delivering a new quality of business application development and deployments.

At the same time, Autonomous Databases are now equipped with Oracle Machine Learning tools, with customers and partners around the world taking advantage of Oracle’s in-database capabilities to solve complex and important data-driven problems. Oracle Machine Learning accelerates the creation of machine learning models for data scientists by eliminating the need to move data to dedicated machine learning systems and free Apache Zeppelin Notebooks to make data scientist's job more effective and productive.

APEX on Autonomous Database with Machine LearningBut there you have a great new opportunity if you bring both tools together. Based on Autonomous Database, APEX developers and solution providers can combine with the Machine Learning functionality into their APEX application and create new value for their customers.

In this blog post, I would like to make simple “CRM”-style APEX application based on customer table powered with classification Machine Learning model. The actual machine learning logic takes place in the database and the APEX application will pick up the predictions. As an example, we will apply machine learning on a sales scenario. We will predict which customers are most likely to buy a certain product. The product called "Y Box Games". Our fictional company has been successful in selling this product so far. Although some of our customers already own this product, we believe we could do much better. The problem is that it is very difficult to understand what type of customer is interested in it. We are going to use machine learning to find right customers.

In first part of my blog post, I am going to make historical sale data analysis based on Classification Machine Learning model. I assume that you have already provisioned Oracle Autonomous Database (ATP or ADW), created Machine Learning users and initiated the APEX environment where we will be creating a new APEX workspace for our APEX application. Further instructions on all of the above can be found on the Documentation https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/index.html and remember you can always use the Oracle Autonomous Database in Cloud Free Tier.

Now, on your Autonomous Database, from “Service Console” you have to login as a Machine Learning user. (Note: Oracle ML User Administration is also under Tools" or you can use it from “Service Console”. )  On the Service Console under "Development", access the Oracle Machine Learning Notebooks environment.
 

From ML user menu, select Notebooks and then create a new Notebook. Name the new notebook "Predict Y Box Games".

Now, we have to build a Machine learning model and letting it learn from existing customers.

Review the following table with customer information that we will use as input data for the model:

SELECT * FROM SH.SUPPLEMENTARY_DEMOGRAPHICS

See the column "Y_BOX_GAMES", which tells us whether the customer already owns Y Box Games. In addition, we see many other attributes of which we believe they may have some influence on whether a customer owns Y Box Games. For example, the level of education might be an influencing factor for Y Box Games ownership, and so may Occupation, Household Size, etc

The magic of machine learning is that it will find out exactly what the relationships are between these variables and our target variable, Y Box Games.

At this step, split the input data into two sets: 60% for training and 40% for testing.

 

CREATE TABLE N1_TRAIN_DATA AS SELECT * FROM SH.SUPPLEMENTARY_DEMOGRAPHICS SAMPLE (60) SEED(1);

CREATE TABLE N1_TEST_DATA AS SELECT * FROM SH.SUPPLEMENTARY_DEMOGRAPHICS MINUS SELECT * 
FROM N1_TRAIN_DATA;

The model will contain the definition of the relationship between the driving attributes and the target attribute (Y Box Games). Creating those relationships done during the training phase. Defining a model requires several parameters. We first store those parameters in a table. This table can have any name. In our case, the only parameter is the type of algorithm, in this case, a decision tree model.

Enter the following SQL to create the parameters table..

CREATE TABLE N1_BUILD_SETTINGS (SETTING_NAME VARCHAR2(30), SETTING_VALUE VARCHAR2(4000));

INSERT INTO N1_BUILD_SETTINGS (SETTING_NAME, SETTING_VALUE) VALUES ('ALGO_NAME', 'ALGO_DECISION_TREE');

Now we are ready to create and train the model. Run the following PL/SQL to do this.

%script 
CALL DBMS_DATA_MINING.CREATE_MODEL('N1_CLASS_MODEL', 'CLASSIFICATION', 
  'N1_TRAIN_DATA', 'CUST_ID', 'Y_BOX_GAMES', 'N1_BUILD_SETTINGS');

The parameters mean the following :

  1. The name that our model will have. This is stored in the database as a special type of object.

  2. Whether it's a classification or a regression algorithm. In this case, it's a classification algorithm, because we're predicting a class (ownership yes/no) rather than a continuous value.

  3. The name of the table to use for training. We created this earlier when we split the dataset.

  4. The name of the column that uniquely identifies each row in the training data. In this case the customer ID.

  5. The name of the target column that we want to find the driving factors for, Y Box Games.

  6. Lastly, the name of the hyperparameters table. In this case, it only contains a parameter with the type of model (decision tree).

The result:

https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/create_model.png

Validate the results and the quality of the model

We would like to know in what percentage of the cases, the model makes a correct prediction of Y Box Games ownership. This is where the test set, that we created earlier, comes in handy. Since the test set contains real customers, we know whether they actually own Y Box Games. We will verify the performance by letting our model predict Y Box Games for those same records. This will allow us to verify if the predicted value of Y Box Games is the same as the actual value.

First, create a new placeholder column in the test set that will hold the predicted value.

ALTER TABLE N1_TEST_DATA ADD Y_BOX_GAMES_PRED NUMBER(1);

Next, actually, make the prediction.

UPDATE N1_TEST_DATA SET Y_BOX_GAMES_PRED = PREDICTION(N1_CLASS_MODEL USING *);https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/run_prediction_on_testset.png

You see that this uses special SQL syntax. The above means that we want to predict the value using the model 'N1_CLASS_MODEL' and all of the driving columns in the dataset will be used.

Let's see the result:

SELECT CUST_ID, Y_BOX_GAMES, Y_BOX_GAMES_PRED FROM N1_TEST_DATA;https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/review_predicted_values.png

Let's see in what percentage of cases our prediction is correct.

SELECT TO_CHAR(((SELECT COUNT(*) FROM N1_TEST_DATA WHERE Y_BOX_GAMES = Y_BOX_GAMES_PRED) 
                / (SELECT COUNT(*) FROM N1_TEST_DATA)) * 100, '999.99') CORRECT_PRED_PERCENTAGE FROM DUAL;
https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/check_percentage_correct.png

The result has an accuracy of about 90%.

We can look into this number in more detail with a confusion matrixThis can easily be created by grouping on the two Y Box Games columns.

SELECT Y_BOX_GAMES, Y_BOX_GAMES_PRED, COUNT(*) 
FROM N1_TEST_DATA GROUP BY Y_BOX_GAMES, Y_BOX_GAMES_PRED ORDER BY 1, 2; 

https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/create_confusion_matrix.png

We see, from top to bottom: 1. The true negatives, 2. The false positives, 3. The false negatives and 4. The true positives.

Run the prediction

So far, we have built and validated the model. You can run the prediction in two ways:

  1. In batch. Make a prediction for ALL customers at a certain point in time.
  2. Case by case. Make a prediction per customer when it is needed. For example when opening, the customer details screen.

In our case, we are going to do this in batch.

We select all the customers that don't have Y Box Games yet, then predict whether they are likely to be interested in a purchase.

CREATE TABLE CUST_PREDICTION AS 
  SELECT CUST_ID, PREDICTION(N1_CLASS_MODEL USING *) PREDICTION, 
  PREDICTION_PROBABILITY(N1_CLASS_MODEL USING *) PRED_PROBABILITY 
  FROM SH.SUPPLEMENTARY_DEMOGRAPHICS WHERE Y_BOX_GAMES = 0;

https://github.com/oracle/cloudtestdrive/raw/master/DataScience/lab400/images/create_prediction.png

Note that we could go a step further and schedule this prediction, but this is not part of this guide at the moment.


On the next part of this blog post we will take a closer look on how to integrate the results into an APEX application as we discussed, combining the benefits of our existing analysis for our solution users. 

 

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.