Building Oracle Machine Learning Models with OML4Py and Deploying in Oracle Analytics Cloud

May 28, 2024 | 8 minute read
Ravi Bhuma
Principal Solutions Architect, Oracle Analytics


Oracle Analytics Cloud (OAC) offers various approaches to integrate machine learning models, ensuring flexibility and efficiency in your data analysis processes. The three primary approaches are:

  1. Oracle Analytics Cloud Predictive Models: Train, test, and apply models in OAC.
  2. Oracle Machine Learning Models: Train and test models in the database and apply them in OAC.
  3. OCI Machine Learning Models: Use prebuilt models and apply them in OAC.

Each approach offers unique benefits and tools. This article guides you through the second approach (Oracle Machine Learning Models), specifically how to use Oracle Machine Learning for Python (OML4Py) to train and test models in the database and deploy them in OAC.
 

OML4Py

 

What's the Objective?

This blog describes developing and deploying a machine learning model for anomaly detection. Anomaly detection is essential in various fields such as finance, manufacturing, and IT for identifying rare events or observations that deviate significantly from the norm. The primary challenge in anomaly detection is the lack of labeled training data, which makes supervised learning approaches less effective.

The Example Case 

The blog uses a dataset of insurance claims to identify fraudulent claims. Fraudulent claims are anomalies in this dataset because they're rare compared to the majority of legitimate claims. Since there's no labeled data indicating which claims are fraudulent, nor enough of it to train a supervised model, an unsupervised learning approach is necessary.


Choosing the Right Algorithm

The blog illustrates using the Support Vector Machine (SVM) algorithm configured for anomaly detection. SVM is effective in high-dimensional spaces, robust to outliers, and flexible with different kernel functions. It’s well-suited for scenarios where the data has a clear margin of separation between normal and anomalous observations. Additionally, SVMs are scalable for large datasets and seamlessly integrate with Oracle Machine Learning, optimizing performance in Oracle Autonomous Data Warehouse (ADW).

Getting Started

To run Python commands in a notebook, you use the Python interpreter, which starts when you run your first %python paragraph. To use OML4Py, import the oml module. With the default interpreter bindings, OML Notebooks then automatically establishes a database connection for the notebook. To verify that the Python interpreter has a database connection through the oml module, run the command:

%python
oml.isconnected()

This command returns True when your notebook is connected.
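If the command returns False, for example when you run the OML4Py client outside OML Notebooks, you can open a connection explicitly. This is a minimal configuration sketch; the user name, password, and connect string below are placeholders, not values from this article:

```python
import oml

# Placeholder credentials and connect string: replace with your own
# database account details before running.
oml.connect(user='oml_user', password='<password>', dsn='myadb_low')
print(oml.isconnected())  # True once the connection is established
```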

 

Part 1: Train the Model

Step 1: Set Up the Environment

Ensure you have the necessary libraries and an active connection to your Oracle ADW instance.

%python
import pandas as pd
import oml
  • Explanation:
    • pandas is used for data manipulation.
    • oml is the Oracle Machine Learning library for Python, allowing seamless interaction with Oracle ADW.

Step 2: Load the Dataset

Load the dataset from an existing table INSURANCECLAIM in Oracle ADW.

Note: The 'INSURANCECLAIM' file contains the sample data used to create the INSURANCECLAIM table in Oracle ADW.

%python
dataset = oml.sync(table='INSURANCECLAIM')
print(dataset.head())
  • Explanation:
    • oml.sync(table='INSURANCECLAIM'): Syncs the INSURANCECLAIM table from Oracle ADW to a Python DataFrame.
    • print(dataset.head()): Displays the first few rows of the dataset to verify successful loading.


Step 3: Split the Dataset

Split the dataset into training and testing sets.

%python

train, test = dataset.split(ratio=(0.8, 0.2), hash_cols='POLICYNUMBER', seed=32)
  • Explanation:
    • dataset.split(ratio=(0.8, 0.2), hash_cols='POLICYNUMBER', seed=32): Splits the dataset into training (80%) and testing (20%) sets using the POLICYNUMBER column for consistent splitting and a seed for reproducibility.
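To see why hash_cols matters, here's a minimal pandas sketch (not OML4Py code) of hash-based splitting: every row with the same POLICYNUMBER hashes to the same bucket, so the split is deterministic and repeatable. The hash_split helper is hypothetical, written only for illustration:

```python
import hashlib

import pandas as pd

def hash_split(df, key_col, train_frac=0.8):
    # Hash the key column so rows with the same key always land in the
    # same partition; md5 keeps the assignment deterministic across
    # Python sessions (the built-in hash() is salted per process).
    def in_train(value):
        digest = hashlib.md5(str(value).encode()).hexdigest()
        return int(digest, 16) % 100 < train_frac * 100

    mask = df[key_col].map(in_train)
    return df[mask], df[~mask]

claims = pd.DataFrame({'POLICYNUMBER': range(1000)})
train_df, test_df = hash_split(claims, 'POLICYNUMBER')
print(len(train_df), len(test_df))  # roughly an 80/20 split
```

Because the assignment depends only on the key's hash, rerunning the split on the same data always reproduces the same partitions.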

 

Step 4: Save Training and Testing Data to Separate Tables

Save the training and testing datasets to separate tables.

%python

# Pull training and testing datasets to Pandas DataFrames
train_df = train.pull()
test_df = test.pull()

# Save the training dataset to a new table in the database
train_df.to_sql('INSURANCE_CLAIM_TRAIN', con=oml.connection(), if_exists='replace', index=False)
# INSURANCE_CLAIM_TRAIN: Contains the training data used to train the model.

# Save the testing dataset to a new table in the database
test_df.to_sql('INSURANCE_CLAIM_TEST', con=oml.connection(), if_exists='replace', index=False)
# INSURANCE_CLAIM_TEST: Contains the testing data used to evaluate the model.

# Verify the saved training and testing datasets
print("Training Data:")
print(train_df.head())

print("Testing Data:")
print(test_df.head())

 

Step 5: Set Anomaly Detection Parameters

Define the settings for the SVM model. These settings specify how the model behaves but don't include the model name.

%python

odm_settings = {'svms_outlier_rate': 0.01}
print(odm_settings)
  • Explanation:
    • odm_settings = {'svms_outlier_rate': 0.01}: Specifies the desired outlier rate, so that approximately 1% of the data is treated as anomalous.
    • print(odm_settings): Displays the settings dictionary to verify correctness.
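The settings dictionary can carry other SVM parameters as well. The sketch below adds a kernel setting; svms_kernel_function is a documented SVM setting in Oracle Machine Learning, but the setting names and values are an assumption here, so verify them against your database version before relying on them:

```python
# svms_outlier_rate comes from this article; the kernel setting is an
# assumption based on the DBMS_DATA_MINING SVM settings and may need
# adjusting for your database version.
odm_settings = {
    'svms_outlier_rate': 0.01,  # treat ~1% of the data as outliers
    'svms_kernel_function': 'dbms_data_mining.svms_gaussian',  # nonlinear kernel
}
print(odm_settings)
```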

 

Step 6: Instantiate the Anomaly Detection Model

Create an instance of the SVM model using the settings defined in odm_settings.

%python

model = oml.svm("anomaly_detection", **odm_settings)
  • Explanation:
    • oml.svm("anomaly_detection", **odm_settings): Initializes the SVM anomaly detection model with the specified settings by unpacking the odm_settings dictionary and passing its items as keyword arguments directly to the function.

 

Step 7: Train the Model

Train the model using the training dataset and assign a name to the trained model.

%python
model.fit(train, None, model_name='InsuranceClaimDetection')
print(model)
  • Explanation:
    • model.fit(train, None, model_name='InsuranceClaimDetection'): Trains the SVM model on the training dataset and stores it in the database under the name 'InsuranceClaimDetection'. Passing the name through the model_name keyword makes the intent explicit.
    • print(model): Displays details of the trained model, including settings, computed settings, and attributes.
    • InsuranceClaimDetection: The trained model itself. This is what you register in OAC and use to apply the model and make predictions.

 

Part 2: Test the Model

Step 1: Make Predictions on the Test Dataset

Use the trained model to predict anomalies in the test dataset.

%python

predictions = model.predict(
    test,
    supplemental_cols=test[:, [
        'WEEKOFMONTH', 'DAYOFWEEK', 'MAKE', 'ACCIDENTAREA',
        'DAYOFWEEKCLAIMED', 'MONTHCLAIMED', 'WEEKOFMONTHCLAIMED', 'SEX',
        'MARITALSTATUS', 'FAULT', 'VEHICLECATEGORY', 'VEHICLEPRICE',
        'POLICYNUMBER', 'REPNUMBER', 'DEDUCTIBLE', 'DRIVERRATING',
        'DAYSPOLICYACCIDENT', 'DAYSPOLICYCLAIM', 'PASTNUMBEROFCLAIMS',
        'AGEOFPOLICYHOLDER', 'AGEOFVEHICLE', 'POLICEREPORTFILED',
        'WITNESSPRESENT', 'AGENTTYPE', 'NUMBEROFSUPPLIMENTS',
        'ADDRESSCHANGECLAIM'
    ]],
    proba=True
)
  • Explanation:
    • model.predict(test, supplemental_cols=..., proba=True): Predicts anomalies on the test dataset, including additional columns in the output, and returns the probability of each prediction.
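Once predictions come back, a common next step is to isolate the rows flagged as anomalous. The sketch below uses plain pandas on a made-up miniature of the predict output (with real output you'd start from predictions.pull()); the column names follow OML4Py's usual convention, where PREDICTION is 1 for normal rows and 0 for anomalies:

```python
import pandas as pd

# Hypothetical miniature of the predict() output: PREDICTION is 1 for
# normal rows and 0 for anomalies, and PROBABILITY is the model's
# confidence in that label.
pred_df = pd.DataFrame({
    'POLICYNUMBER': [101, 102, 103, 104, 105],
    'PREDICTION':   [1, 1, 0, 1, 0],
    'PROBABILITY':  [0.97, 0.95, 0.88, 0.93, 0.71],
})

# Keep only suspected fraud, highest-confidence flags first
anomalies = (pred_df[pred_df['PREDICTION'] == 0]
             .sort_values('PROBABILITY', ascending=False))
print(anomalies[['POLICYNUMBER', 'PROBABILITY']])
```

Ranking flagged claims by probability lets investigators start with the cases the model is most confident about.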


Step 2: Apply the Model to the Entire Dataset and Save Predictions

Use the model to make predictions on the entire dataset and save the predictions to a new table in the database.

%python

# Apply the ML Model to the entire dataset
prediction_dataset = model.predict(
    dataset,
    supplemental_cols=dataset[:, [
        'WEEKOFMONTH', 'DAYOFWEEK', 'MAKE', 'ACCIDENTAREA',
        'DAYOFWEEKCLAIMED', 'MONTHCLAIMED', 'WEEKOFMONTHCLAIMED', 'SEX',
        'MARITALSTATUS', 'FAULT', 'VEHICLECATEGORY', 'VEHICLEPRICE',
        'POLICYNUMBER', 'REPNUMBER', 'DEDUCTIBLE', 'DRIVERRATING',
        'DAYSPOLICYACCIDENT', 'DAYSPOLICYCLAIM', 'PASTNUMBEROFCLAIMS',
        'AGEOFPOLICYHOLDER', 'AGEOFVEHICLE', 'POLICEREPORTFILED',
        'WITNESSPRESENT', 'AGENTTYPE', 'NUMBEROFSUPPLIMENTS',
        'ADDRESSCHANGECLAIM'
    ]],
    proba=True
)

 

%python

# Save predictions in the database
oml_prediction_dataset = prediction_dataset.pull()

# Create a new table in the database to save the predictions
oml_current_prediction_obj = oml.create(oml_prediction_dataset, table='INSURANCE_CLAIMS_PREDICTIONS')


# INSURANCE_CLAIMS_PREDICTIONS: Contains the predictions made by the model on the entire dataset or new data. This table stores the results of applying the model.

# Verify the saved predictions
print(oml_current_prediction_obj.head())

Part 3: Apply the Model in OAC

Register Oracle Machine Learning Models in Oracle Analytics

You must register Oracle machine learning models in Oracle Analytics before you can use them to predict data. You can register and use models that reside in your Oracle Database or Oracle Autonomous Data Warehouse data sources. For detailed steps, refer to the Oracle documentation.

Apply the Model in OAC

  1. Register the Model in OAC:
    • In OAC, navigate to the machine learning model management section.
    • Register the model InsuranceClaimDetection. This makes the model available for use in OAC.
  2. Use the Model for Predictions:
    • In OAC, create a data flow.
    • Add a step to use the registered model InsuranceClaimDetection.
    • Apply the model to a new dataset to get predictions.
  3. Analyze Predictions:
    • The output dataset, which contains predictions, can be stored back in the database or used directly in OAC for further analysis and visualization.

By following these steps, you ensure that the model is appropriately used within OAC to make predictions and analyze results.

Conclusion

This blog post demonstrates how to use Oracle Machine Learning for Python (OML4Py) to develop a model for anomaly detection using the Support Vector Machine (SVM) algorithm. The blog describes how to train and test the model in the database and how to apply it in Oracle Analytics Cloud. By following these steps, you can seamlessly integrate predictive models into your applications, adding machine learning (ML) and artificial intelligence (AI) capabilities without extensive ML or AI expertise.

Call to Action

Now that you have a detailed understanding of how to build and deploy SVM-based anomaly detection models using OML4Py in Oracle Analytics Cloud, it's time to apply this knowledge to your own datasets. Here are a few steps you can take:

  1. Experiment with Different Datasets: Try using different datasets to see how SVMs perform in various scenarios. Adjust the parameters and observe the results.
  2. Optimize Parameters: Fine-tune the svms_outlier_rate and other model parameters to improve the accuracy and reliability of your anomaly detection.
  3. Explore Other Algorithms: While SVMs are powerful, explore other anomaly detection algorithms, such as Isolation Forest, DBSCAN, or autoencoders, to find the best fit for your data.
  4. Share Your Findings:  Connect with other data analysts in the Oracle Analytics Community and share your experiences and insights.
