MySQL is the most popular open-source database and developers use it for building applications in various domains. Often developers need to apply machine learning to data stored in the database and in order to do so, they need to extract the data out of the database and then use machine learning libraries. MySQL HeatWave is a fully managed service for the MySQL database from Oracle and has built in support for machine learning (HeatWave ML). HeatWave ML fully automates the process to train a model, generate inferences and invoke explanations, all without extracting data or model out of the database. The user can use familiar SQL interfaces to invoke all the machine learning capabilities. HeatWave ML leverages Oracle AutoML which automates model generation by replacing complex and time-consuming tasks such as data preprocessing, algorithm selection, feature selection and hyperparameter optimization that a data scientist is otherwise expected to perform.
ML_TRAIN produces a trained model as well as an explainer model (which can provide insight into model’s behavior), and stores both into the Model Catalog. The time taken for training a model depends on the size of user data (minutes to hours), number of HeatWave nodes as well as some user defined options in the ML_TRAIN Interface.
mysql> CALL sys.ML_TRAIN('mlcorpus.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @model);
Query OK, 0 rows affected (2 min 31.36 sec)
The above shows training of a classification model on the census data set and it took approximately 1 min and 31 seconds.
ML_PREDICT_ROW is a stored function that runs in-line inference on a single row of data using the above trained model. The user provides input row of data as JSON for which to do prediction. Prior to calling ML_PREDICT_ROW, ML_MODEL_LOAD must be called, and the loaded model's training columns must match the ML_PREDICT_ROW input columns.
mysql> CALL sys.ML_MODEL_LOAD(@model, NULL);
Query OK, 0 rows affected (1.12 sec)
mysql> SELECT sys.ML_PREDICT_ROW('{"index": 1,"age": 38,"workclass": "Private","fnlwgt": 89814,"education": "HS-grad","education-num": 9,"marital-status": "Married-civ-spouse","occupation": "Farming-fishing","relationship": "Husband","race": "White","sex": "Male","capital-gain": 0,"capital-loss": 0,"hours-per-week": 50,"native-country": "United-States"}', @model);
+---------------------------------------------------------------------------+
{"age": 38, "sex": "Male", "race": "White", "index": 1, "fnlwgt": 89814, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"}
1 row in set (2.36 sec)
The above shows that the model predicted that the salary will be less than 50k in 2.36 seconds
ML_EXPLAIN_ROW is a stored function that provides the user with an interface to create in-line explanations from a single row of input data. Explanations help the user perform knowledge discovery by explaining which features matter most to the model, and which features contribute the most to individual predictions. The explanations are conveyed through weights which show how much each feature influenced the final prediction- higher values mean a stronger influence.
mysql> SELECT sys.ML_EXPLAIN_ROW('{"index": 1,"age": 38,"workclass": "Private","fnlwgt": 89814,"education": "HS-grad","education-num": 9,"marital-status": "Married-civ-spouse","occupation": "Farming-fishing","relationship": "Husband","race": "White","sex": "Male","capital-gain": 0,"capital-loss": 0,"hours-per-week": 50,"native-country": "United-States"}', @model);
+---------------------------------------------------------------------------------+
| {"age": 38, "sex": "Male", "race": "White", "index": 1, "fnlwgt": 89814, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States", "age_attribution": -0.2234, "sex_attribution": -0.0241, "race_attribution": -0.0011, "index_attribution": 0.0, "fnlwgt_attribution": 0.003, "education_attribution": 0.0, "workclass_attribution": -0.0126, "occupation_attribution": 0.1111, "capital-gain_attribution": 0.0, "capital-loss_attribution": 0.0, "relationship_attribution": -0.0928, "education-num_attribution": -0.1305, "hours-per-week_attribution": -0.1806, "marital-status_attribution": -0.0676, "native-country_attribution": 0.0001} |
1 row in set (4.41 sec)
The above explanation indicates that the occupation, with a weight of 0.1111 is an important attribute.
ML_SCORE computes the model quality by generating predictions on given test data and comparing it vs the ground truth labels. The ML_SCORE API requires a string argument that specifies the scoring metric to be used. HeatWave ML supports the standard scikit-learn scoring metrics as described here for classification and regression –
https://dev.mysql.com/doc/heatwave/en/hwml-ml-score.html
mysql> CALL sys.ML_SCORE('mlcorpus_v4.census_test', 'revenue', @model, 'balanced_accuracy', @score);
Query OK, 0 rows affected (5.34 sec)
mysql> SELECT @score;
+--------------------+
| @score |
+--------------------+
| 0.7961280941963196 |
+--------------------+
1 row in set (0.00 sec)
The model above had a balanced accuracy of 0.79.
In summary, MySQL HeatWave ML enables a database user to do in-database machine learning training, inference and explanation using simple SQL interfaces. It takes care of all the complicated steps involved in machine learning that a typical data scientist is expected to do, thus making it easy for a new user to quickly adopt the platform.
Salil Pradhan is a Product Manager in MySQL HeatWave team. His interests include distributed data processing, machine learning, cloud computing, middleware technologies as well as application areas such as Marketing Automation and Supply Chain Management.