You may know that machine learning models can be built using Oracle Autonomous Database, but did you know you can also build them using the output of a Fusion Applications-based Analytics Publisher (formerly BI Publisher) report? These reports are one of the approved ways of extracting data from Fusion and pull data from various Fusion modules such as Finance, Order Management, and Supply Chain, making them effective data sources for custom machine learning models for various use cases, including tracking delays in payments, identifying invoices that require follow-up, and flagging products that could potentially go out of stock. This blog demonstrates the steps needed to use Analytics Publisher (AP) reports to build machine learning models.
Note – As a prerequisite, please refer to the previous blog for full details on how to access an Analytics Publisher report's output and save it as a table in Autonomous Database.
Details about the example AP Report and Data
The report is built using data retrieved from the Oracle Transactional Business Intelligence (OTBI) subject area Receivables – Payment Schedules Real Time, as shown below. The query includes additional formulas along with an extra column, "Followup_Required", which indicates whether further follow-up is required for the respective transaction; valid values are "Yes" and "No."
The data used for this report is from a demo pod using Vision Corp data (sample fictitious data provided with Fusion Applications, based on a fictitious company called Vision Corp). The actual report output consists of 10 columns, described in the following table:
| Column Name | Description |
| --- | --- |
| Transaction_ID | Unique identifier of the transaction |
| Aging_Bucket | Defines the overdue bucket as 'Overdue 1-30 Days', 'Overdue 31-60 Days', 'Overdue 61-90 Days', or 'Overdue 90+ Days' |
| Days_since_due | Total number of days since the amount became due |
| Amount_Remaining | Remaining amount to be paid |
| Expected_Uncollected_amount | Expected amount that is likely not to be collected, based on a formula matching the business requirements |
| AR_Overdue_1_30_Amt, AR_Overdue_31_60_Amt, AR_Overdue_61_90_Amt, AR_Overdue_90plus_Amt | Amount due that falls under the respective bucket. As each transaction is unique, the amount is non-zero for exactly one of these four buckets and 0 for the remaining buckets |
| Followup_Required | Filled using a formula based on the business requirement to determine whether further follow-up on the transaction is needed. The goal of the machine learning model we are building is to predict this column's value. |
In our example, we'll use logistic regression to predict "Followup_Required". Because "Followup_Required" (our target column) is set to "Yes" or "No" mainly based on the amount values in the various aging buckets and the type of bucket they fall in, we'll drop the "Transaction_ID", "Days_since_due", "Expected_Uncollected_amount", and "Amount_Remaining" columns. The rest of the columns will be used as features for the prediction.
The following is the sample query used to build the report in Analytics Publisher, in case you want to try it in your environment. Please note that this query is based on the OTBI subject area Receivables – Payment Schedules Real Time. The column "Followup_Required" is then added on top of this query, based on the amount due and the "Aging_Bucket" (the bucket under which the amount is currently due).
SELECT "Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Transaction ID" Transaction_ID, CASE WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 1 AND 30 THEN 'Overdue 1-30 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 31 AND 60 THEN 'Overdue 31-60 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 61 AND 90 THEN 'Overdue 61-90 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE)>90 THEN 'Overdue 90+ Days' ELSE 'Other (Error)' END Aging_Bucket, TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) Days_since_due, "Receivables - Payment Schedules Real Time"."Payment Schedules"."Line Entered Amount Remaining" Amount_Remaining, CASE WHEN TIMESTAMPDIFF(SQL_TSI_DAY, "Payment Schedules Details"."Due Date", CURRENT_DATE) BETWEEN 1 AND 30 THEN .01 * "Payment Schedules"."Line Entered Amount Remaining" WHEN TIMESTAMPDIFF(SQL_TSI_DAY, "Payment Schedules Details"."Due Date", CURRENT_DATE) BETWEEN 31 AND 60 THEN .03 * "Payment Schedules"."Line Entered Amount Remaining" WHEN TIMESTAMPDIFF(SQL_TSI_DAY, "Payment Schedules Details"."Due Date", CURRENT_DATE) BETWEEN 61 AND 90 THEN .10 * "Payment Schedules"."Line Entered Amount Remaining" WHEN TIMESTAMPDIFF(SQL_TSI_DAY, "Payment Schedules Details"."Due Date", CURRENT_DATE) > 90 THEN .40 * "Payment Schedules"."Line Entered Amount Remaining" ELSE 0 END Expected_Uncollected_amout, case (CASE WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 1 AND 30 THEN 'Overdue 1-30 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 31 AND 60 THEN 'Overdue 31-60 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 61 AND 90 THEN 'Overdue 61-90 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE)>90 THEN 'Overdue 90+ Days' ELSE 'Other (Error)' END) when 'Overdue 1-30 Days' then "Receivables - Payment Schedules Real Time"."Payment Schedules"."Line Entered Amount Remaining" else 0 end AR_Overdue_30_Amount, case (CASE WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 1 AND 30 THEN 'Overdue 1-30 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 31 AND 60 THEN 'Overdue 31-60 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 61 AND 90 THEN 'Overdue 61-90 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE)>90 THEN 'Overdue 90+ Days' ELSE 'Other (Error)' END) when 'Overdue 31-60 Days' then "Receivables - Payment Schedules Real Time"."Payment Schedules"."Line Entered Amount Remaining" else 0 end AR_Overdue_3160_Amount, case (CASE WHEN 
TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 1 AND 30 THEN 'Overdue 1-30 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 31 AND 60 THEN 'Overdue 31-60 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 61 AND 90 THEN 'Overdue 61-90 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE)>90 THEN 'Overdue 90+ Days' ELSE 'Other (Error)' END) when 'Overdue 61-90 Days' then "Receivables - Payment Schedules Real Time"."Payment Schedules"."Line Entered Amount Remaining" else 0 end AR_Overdue_6190_Amount, case (CASE WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 1 AND 30 THEN 'Overdue 1-30 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 31 AND 60 THEN 'Overdue 31-60 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE) BETWEEN 61 AND 90 THEN 'Overdue 61-90 Days' WHEN TIMESTAMPDIFF(SQL_TSI_DAY,"Receivables - Payment Schedules Real Time"."Payment Schedules Details"."Due Date",CURRENT_DATE)>90 THEN 'Overdue 90+ Days' ELSE 'Other (Error)' END) when 'Overdue 90+ Days' then "Receivables - Payment Schedules Real Time"."Payment Schedules"."Line Entered Amount Remaining" else 0 end AR_Overdue_90_Amount FROM "Receivables - Payment Schedules Real Time" WHERE ((("Payment Schedules"."Line Entered Amount Remaining" > 0) AND (TIMESTAMPDIFF(SQL_TSI_DAY, "Payment Schedules Details"."Due Date", CURRENT_DATE) > 0) AND ("Payment Schedules Details"."Transaction Type" = 'Invoice')))
Tools Used
The example uses OML Notebooks in Autonomous Database, along with Python-language paragraphs and Oracle Machine Learning for Python (OML4Py). A Conda environment was created that includes a third-party library called Zeep, which is used for handling SOAP-based web services in Python. For more details on the Zeep library, see its GitHub page.
Pull and Prepare Data
As a first step, download and activate your Conda environment containing the Zeep library, as discussed in full detail in the previous blog. For additional details, refer to Announcing custom third-party Python and R packages for use on Autonomous Database and the documentation. For the sample used here, the Conda environment was created with the name "mypyenv".
1. Import the libraries needed to access the Analytics Publisher report.
%conda
download mypyenv --overwrite
activate mypyenv

%python
import requests
from zeep.transports import Transport
from zeep import Client
from zeep import helpers
2. Define a report request like the following, replacing <server_name>, user_name, and passwd with your values. The chunking requirement and attribute format-related details are discussed here. Please also read the Best Practices document for BI Publisher web services in Fusion Applications.
%python
WSDL = "https://<server_name>/xmlpserver/services/ExternalReportWSSService?wsdl"
user_name = "???"
passwd = "***"
bip_report_request = {
    'reportAbsolutePath': '/Custom/ADB_OML/RP_AR_Inv.xdo',
    'sizeOfDataChunkDownload': '10000',
    'byPassCache': 'False',
    'flattenXML': 'False',
    'attributeFormat': 'csv'
}
3. Invoke the runReport method to perform the following steps:
- Obtain a valid session through requests.Session().
- Authorize the session with the user_name and passwd.
- Get a transport object by passing the authorized session to it.
- Initiate the bip_client object by providing the WSDL and the transport object.
- Invoke the runReport method of the Analytics Publisher web service ExternalReportWSSService, passing the bip_report_request dictionary declared above and any report-specific parameter values (none are required in this case).
%python
bip_session = requests.Session()
bip_session.auth = (user_name, passwd)
bip_transport = Transport(session=bip_session)
bip_client = Client(wsdl=WSDL, transport=bip_transport)
bip_rep_result = bip_client.service.runReport(reportRequest=bip_report_request, appParams="")
4. The result obtained from the web service invocation is converted into a Python dictionary with helpers.serialize_object. The Python dictionary variable bip_rep_result_dict now holds the various key/value pairs returned by the web service call. Among them, the 'reportBytes' key contains the report output as encoded bytes. Decode it into the string variable rows_str, which then contains the report output as a human-readable CSV string. As the last step of this paragraph, close the bip_session.
%python
bip_rep_result_dict = helpers.serialize_object(bip_rep_result, target_cls=dict)
rows_byte = bip_rep_result_dict['reportBytes']
rows_str = rows_byte.decode()
bip_session.close()
z.show(rows_str)
5. Print the report output retrieved from Analytics Publisher:
TRANSACTION_ID,AGING_BUCKET,DAYS_SINCE_DUE,AMOUNT_REMAINING,EXPECTED_UNCOLLECTED_AMOUNT,AR_OVERDUE_1_30_AMT,AR_OVERDUE_31_60_AMT,AR_OVERDUE_61_90_AMT,AR_OVERDUE_90PLUS_AMT,FOLLOWUP_REQUIRED
1285059,"Overdue 1-30 Days",26,34698,347,34698,0,0,0,No
1284978,"Overdue 31-60 Days",44,46982,1409,0,46982,0,0,Yes
1284963,"Overdue 61-90 Days",71,86942,8694,0,0,86942,0,Yes
1284954,"Overdue 90+ Days",632,1304130,521652,0,0,0,1304130,Yes
1286115,"Overdue 1-30 Days",15,607719,6077,607719,0,0,0,Yes
1284561,"Overdue 1-30 Days",25,10430,104,10430,0,0,0,No
1286108,"Overdue 90+ Days",554,786969,314788,0,0,0,786969,Yes
1284551,"Overdue 31-60 Days",58,10061,302,0,10061,0,0,Yes
1286101,"Overdue 61-90 Days",63,74140,7414,0,0,74140,0,Yes
1284540,"Overdue 90+ Days",179,4182,1673,0,0,0,4182,Yes
1286092,"Overdue 1-30 Days",22,46982,470,46982,0,0,0,No
1286208,"Overdue 31-60 Days",35,148354,4451,0,148354,0,0,Yes
1284528,"Overdue 61-90 Days",80,4253,425,0,0,4253,0,Yes
1286202,"Overdue 90+ Days",99,154790,61916,0,0,0,154790,Yes
1284517,"Overdue 90+ Days",120,4756,1902,0,0,0,4756,Yes
1284505,"Overdue 1-30 Days",12,2800,28,2800,0,0,0,No
1284494,"Overdue 31-60 Days",45,6385,192,0,6385,0,0,No
1286037,"Overdue 61-90 Days",68,285386,28539,0,0,285386,0,Yes
1284481,"Overdue 90+ Days",96,4852,1941,0,0,0,4852,Yes
1286030,"Overdue 1-30 Days",17,103975,1040,103975,0,0,0,Yes
1284664,"Overdue 1-30 Days",22,761,8,761,0,0,0,No
1286177,"Overdue 1-30 Days",29,30958,310,30958,0,0,0,No
6. Convert the report output into a pandas DataFrame and display a few rows of the data. There are 150 rows in the full report:
%python
import pandas as pd
from io import StringIO

inv_df = pd.read_csv(StringIO(rows_str))
z.show(inv_df.head())
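As noted earlier, "Transaction_ID", "Days_since_due", "Expected_Uncollected_amount", and "Amount_Remaining" will not be used as features. A minimal sketch of one way to drop them from the pandas DataFrame before the data is persisted to the database (the uppercase column names follow the report's CSV header above):

%python
# Drop the columns that are not used as features, per the feature selection above.
# (Illustrative sketch; column names follow the report's CSV header.)
inv_df = inv_df.drop(columns=['TRANSACTION_ID', 'DAYS_SINCE_DUE',
                              'EXPECTED_UNCOLLECTED_AMOUNT', 'AMOUNT_REMAINING'])
z.show(inv_df.head())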
Categorical Data
The OML in-database algorithms automatically handle the encoding of categorical variables. In our sample data, two columns ("AGING_BUCKET" and "FOLLOWUP_REQUIRED") are strings, and we do not have to handle them specially in our code to use them with the in-database algorithms. You can verify which columns are categorical with a quick check, as shown below.
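As a quick illustrative check (not part of the original flow), listing the pandas dtypes shows which columns are strings:

%python
# List each column's pandas dtype; AGING_BUCKET and FOLLOWUP_REQUIRED appear as
# 'object' (string) columns, which the in-database algorithms encode automatically.
print(inv_df.dtypes)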
Train and Predict
7. Next, build the model and predict the results using the in-database algorithm with the following steps:
- Create a table from the pandas DataFrame "inv_df": first check whether the table exists; if it does, drop it, and then call the oml.create method, passing the pandas DataFrame "inv_df".
- Load the table data into the OML DataFrame "data_prep" using the split function, which splits the data into training and test sets with an 80:20 ratio.
- Declare X_train to hold all columns from the training portion except "FOLLOWUP_REQUIRED" (predictors only). For y_train, we use only the "FOLLOWUP_REQUIRED" (target) column from the training portion. We declare test_data to hold the entire test portion of the data.
- Declare the setting variable as a dictionary with the key "GLMS_SOLVER" (Generalized Linear Model solver) and the value "dbms_data_mining.GLMS_SOLVER_QR". More information on the Generalized Linear Model is available in the documentation.
- While invoking the oml.glm function, we use the "classification" option to indicate that we want logistic regression. We then train the model by calling glm_mod.fit.
- After training the model, we invoke the score function to obtain predictions using the test data set. While calling glm_mod.score, for the predictors we use test_data.drop('FOLLOWUP_REQUIRED') (all columns except the target column), and for the target we use just "FOLLOWUP_REQUIRED".
%python
import oml

# Drop the table if it already exists, then persist the DataFrame as table AR_INV.
try:
    oml.drop('AR_INV')
except:
    pass
oml_ar_inv = oml.create(inv_df, table = 'AR_INV')

# Split into training (80%) and test (20%) portions.
data_prep = oml.sync(table = 'AR_INV').split(ratio=(.80, .20), seed = 1)
X_train = data_prep[0].drop('FOLLOWUP_REQUIRED')
y_train = data_prep[0]['FOLLOWUP_REQUIRED']
test_data = data_prep[1]

# Build and train a GLM classification (logistic regression) model.
setting = {'GLMS_SOLVER': 'dbms_data_mining.GLMS_SOLVER_QR'}
glm_mod = oml.glm("classification", **setting)
glm_mod = glm_mod.fit(X_train, y_train)

# Score the model on the test set.
z.show('Score: ' + str(glm_mod.score(test_data.drop('FOLLOWUP_REQUIRED'),
                                     test_data[:, ['FOLLOWUP_REQUIRED']])))

# Display the predictions along with the selected supplemental columns.
suppl_cols = ['AGING_BUCKET', 'AR_OVERDUE_1_30_AMT', 'AR_OVERDUE_31_60_AMT',
              'AR_OVERDUE_61_90_AMT', 'AR_OVERDUE_90PLUS_AMT', 'FOLLOWUP_REQUIRED']
z.show(glm_mod.predict(test_data.drop('FOLLOWUP_REQUIRED'),
                       supplemental_cols = test_data[:, suppl_cols]))
8. For larger data sets, the training data needs to cover as many scenarios as possible, because the model's performance can change with more training data or with a different test data set. Depending on the test results, the model may need tuning. As the last step of this OML Notebook paragraph, we display the test data along with the predicted data. The variable suppl_cols contains all the columns we want in the output, along with the original values of "FOLLOWUP_REQUIRED" in the test set and the model's predicted values for "FOLLOWUP_REQUIRED". One simple sanity check for split sensitivity is sketched below.
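A minimal sketch (not from the original post) of re-splitting with a different seed to see whether the score is stable; it reuses the setting dictionary and the AR_INV table from above:

%python
# Re-split the data with a different seed and retrain, to check that the
# model's score is stable across training/test splits. (Illustrative sketch.)
data_prep2 = oml.sync(table = 'AR_INV').split(ratio=(.80, .20), seed = 2)
glm_mod2 = oml.glm("classification", **setting)
glm_mod2 = glm_mod2.fit(data_prep2[0].drop('FOLLOWUP_REQUIRED'),
                        data_prep2[0]['FOLLOWUP_REQUIRED'])
z.show('Score (seed 2): ' + str(glm_mod2.score(
    data_prep2[1].drop('FOLLOWUP_REQUIRED'),
    data_prep2[1][:, ['FOLLOWUP_REQUIRED']])))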
Analyze the Results
9. The next step is to analyze the results through a confusion matrix, the score, a classification report, etc.; a minimal sketch is shown below. Then, as the last paragraph of this OML Notebook, deactivate the Conda environment.
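A minimal sketch (not from the original post) of building a confusion matrix: it pulls the predictions into a local pandas DataFrame and cross-tabulates actual versus predicted values. 'PREDICTION' is assumed here to be the column name OML4Py assigns to the predicted values:

%python
# Pull predictions to the client as a pandas DataFrame and build a confusion
# matrix with pandas. (Illustrative sketch; 'PREDICTION' is assumed to be the
# column name OML4Py assigns to the predicted values.)
pred_df = glm_mod.predict(test_data.drop('FOLLOWUP_REQUIRED'),
                          supplemental_cols = test_data[:, ['FOLLOWUP_REQUIRED']]).pull()
conf_matrix = pd.crosstab(pred_df['FOLLOWUP_REQUIRED'], pred_df['PREDICTION'],
                          rownames = ['Actual'], colnames = ['Predicted'])
z.show(conf_matrix)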
%conda
deactivate

Conda environment 'conda' deactivated
Conclusion
Oracle Machine Learning and Autonomous Database provide a robust environment for machine learning with the full power of the database combined with Python (through OML4Py). External data brought into Autonomous Database can easily be converted into database tables, which can be accessed through SQL, R, and Python with the help of OML Notebooks.