Matthew Rowe

Data Scientist Senior Manager

Helping your model know what it doesn't know is crucial in data science. That's why we've released the open source MACEst (Model Agnostic Confidence Estimator) on GitHub, along with a research paper on arXiv describing the theory and application of MACEst for calibrating confidence estimates in classification tasks. MACEst is a Python library that calibrates machine learning models’ confidence estimates to be more representative of models’ true confidence, thereby correcting for over- and under-confident predictions. It works seamlessly with scikit-learn machine learning models, making it an easy fit for machine learning and data science engineers.

Machine learning models help with classification and regression tasks for many industries and use cases. Primarily, their job is to make predictions, such as class labels or estimated continuous values, based on unseen data. Accompanying such predictions are confidence scores that express the estimated probability that the returned value is correct. A high probability means high confidence in the result, while a low probability means low confidence. Even so, these algorithms are often too confident for their own good.

Most machine learning algorithms will still produce a prediction even when the input lies in a part of the feature space that the algorithm has no information about. This could be because the feature vector is unlike anything seen during training, or because the feature vector falls in a part of the feature space where there is a large amount of uncertainty, such as an overlapping border between two classes. **Aleatoric uncertainty** is due to the intrinsic randomness inherent in any system, while **epistemic uncertainty** arises due to a lack of knowledge (that is, data) to inform the model’s predictions, as seen in this image:

In such fields as finance, infrastructure, or healthcare, a single bad prediction can have dire consequences. It is important in these situations that a model be able to understand how likely any prediction it makes is to be correct before acting upon it. It is often even more important in these situations that any model *knows what it doesn't know* so that it will not blindly make bad predictions.

To tackle the challenge of calibrating confidence estimates, we have developed MACEst (Model Agnostic Confidence Estimator) and released it as open source. It is a Python library that creates a confidence estimator which can be used alongside any model (regression or classification) to produce calibrated confidence estimates for its point predictions. In the regression case, MACEst produces a confidence interval about the point prediction, e.g. "the point prediction is 10 and I am 90% confident that the prediction lies between 8 and 12." In the classification case, MACEst produces a confidence score for the point prediction, e.g. "the point prediction is class 0 and I am 90% sure that the prediction is correct."

MACEst produces well-calibrated confidence estimates: For example, 90% confidence means that you will on average be correct 90% of the time. It is also aware of model limitations, such as when a model is being asked to predict a point that it does not have the necessary knowledge (data) to predict confidently. In these cases, MACEst can incorporate the (epistemic) uncertainty and return a very low confidence prediction (in regression, this means a large prediction interval).
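One way to check whether confidence estimates are well calibrated in the sense described above is to group predictions by reported confidence and compare each group's average confidence with its observed accuracy. The following is a minimal sketch using only NumPy and simulated data; the function name `expected_calibration_error`, the bin count, and the simulated confidences are illustrative assumptions and are not part of MACEst.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Interior bin edges; np.digitize maps each confidence to a bin 0..n_bins-1.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(confidences, inner_edges)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Simulated data (an assumption for illustration, not real model output).
rng = np.random.default_rng(0)
true_p = rng.uniform(0.5, 1.0, size=20_000)        # true chance each prediction is right
correct = rng.uniform(size=true_p.size) < true_p   # simulated outcomes

calibrated_conf = true_p                               # reports the true probability
overconfident_conf = np.clip(true_p + 0.15, 0.0, 1.0)  # inflates every confidence

ece_calibrated = expected_calibration_error(calibrated_conf, correct)
ece_overconfident = expected_calibration_error(overconfident_conf, correct)
print(f"calibrated ECE:    {ece_calibrated:.3f}")
print(f"overconfident ECE: {ece_overconfident:.3f}")
```

A gap close to zero means that, for example, predictions reported at roughly 90% confidence are correct roughly 90% of the time; the inflated confidences produce a visibly larger gap.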

To demonstrate MACEst calibrating confidence estimates, the plot below shows the confidence estimates of MACEst and various existing confidence calibration techniques in a multiclass (10-class) classification setting as a function of noise. Gaussian noise was added to the features of a held-out test set according to a so-called noise level (*n* standard deviations from the features’ mean values). The goal was to move the test set away, in a controlled manner, from the training set that the learned model had seen. This simulates real-world model deployment scenarios where the model is not retrained and redeployed despite data changes.

As the noise level increases, the existing calibration methods asymptotically approach a confidence of 0.5. MACEst, however, asymptotically approaches 0.1. Given the large differences between the training and test sets due to the elevated noise level, MACEst adjusts the confidence estimates accordingly and tends towards the 1/10 chance of randomly guessing the correct class (out of the 10 classes). Existing methods do not utilize such differences, and thus apply the same static transformation function (for example, sigmoid) that was learnt originally for the random forest model’s confidence estimates. As a result, the original random forest model’s confidence estimates are scaled up to be incorrectly overconfident.
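The controlled shift of the test set described above can be sketched roughly as follows. This is one possible reading of the noise level, assuming zero-mean Gaussian noise whose per-feature standard deviation is the noise level times the training feature standard deviation; the exact procedure used for the plot may differ. The helper names `add_feature_noise` and `mean_nearest_train_distance` and the synthetic data are hypothetical.

```python
import numpy as np

def add_feature_noise(X_test, X_train, noise_level, rng):
    """Perturb test features with Gaussian noise scaled per feature.

    Assumption: noise std = noise_level * std of each training feature.
    """
    feature_std = X_train.std(axis=0)
    noise = rng.normal(size=X_test.shape) * noise_level * feature_std
    return X_test + noise

def mean_nearest_train_distance(X_query, X_train):
    """Average Euclidean distance from each query point to its nearest training point."""
    diffs = X_query[:, None, :] - X_train[None, :, :]  # shape (n_query, n_train, d)
    dists = np.linalg.norm(diffs, axis=2)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 5))  # synthetic stand-in for training features
X_test = rng.normal(size=(100, 5))   # synthetic stand-in for held-out features

noise_levels = [0, 2, 5, 10]
drift = [
    mean_nearest_train_distance(add_feature_noise(X_test, X_train, n, rng), X_train)
    for n in noise_levels
]
for n, d in zip(noise_levels, drift):
    print(f"noise level {n:>2}: mean distance to nearest training point = {d:.2f}")
```

As the noise level grows, the perturbed test points sit further from anything in the training set, which is the kind of difference a distance-aware confidence estimator can react to and a static transformation cannot.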

You can install MACEst either by cloning the GitHub repository and using that local version, or by running `pip install macest` to install MACEst directly from PyPI. After installing, you will be able to use MACEst alongside your existing scikit-learn models to recalibrate their confidence estimates. You can find useful examples of how to use MACEst for classification and regression in the accompanying Jupyter notebooks. To import and use MACEst, we recommend Python version >= 3.6.8.

Under the hood, MACEst builds a graph space with feature vectors as nodes and determines how close nodes are to one another based on their Euclidean distance (you can change the distance metric to fit your use case and feature vector types). Using this space, the k nearest neighbours to a to-be-predicted point are gathered and then used to compute:

1. The local aleatoric uncertainty, as the number of incorrect predictions weighted by the distance to the to-be-predicted point, and

2. The epistemic uncertainty, as the average distance to those neighbours.
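The two quantities above can be illustrated with a simplified sketch. This is not MACEst's implementation: it uses a brute-force neighbour search instead of a graph index, and the weighting `1 / (1 + distance)` is an assumption chosen for illustration. The function name `knn_uncertainties`, the synthetic two-class data, and the threshold "model" are all hypothetical.

```python
import numpy as np

def knn_uncertainties(x_query, X_ref, model_was_wrong, k=25):
    """Return (aleatoric, epistemic) proxies for one query point.

    aleatoric: distance-weighted share of neighbours the model got wrong
    epistemic: average Euclidean distance to the k nearest neighbours
    """
    dists = np.linalg.norm(X_ref - x_query, axis=1)
    nn_idx = np.argsort(dists)[:k]
    nn_dists = dists[nn_idx]
    weights = 1.0 / (1.0 + nn_dists)  # assumed weighting: closer neighbours count more
    aleatoric = np.sum(weights * model_was_wrong[nn_idx]) / np.sum(weights)
    epistemic = nn_dists.mean()
    return aleatoric, epistemic

# Synthetic two-class data: overlapping Gaussian blobs centred at x = -1 and x = +1.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(1000, 2))
X1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(1000, 2))
X_ref = np.vstack([X0, X1])
y_ref = np.concatenate([np.zeros(1000, dtype=int), np.ones(1000, dtype=int)])

# Stand-in "model": predicts class 1 whenever the first feature is positive.
model_pred = (X_ref[:, 0] > 0).astype(int)
model_was_wrong = (model_pred != y_ref).astype(float)

border = knn_uncertainties(np.array([0.0, 0.0]), X_ref, model_was_wrong)   # class overlap
core = knn_uncertainties(np.array([3.0, 0.0]), X_ref, model_was_wrong)     # clearly class 1
far = knn_uncertainties(np.array([10.0, 10.0]), X_ref, model_was_wrong)    # unlike any data

for name, (alea, epi) in [("border", border), ("core", core), ("far", far)]:
    print(f"{name:>6}: aleatoric = {alea:.2f}, epistemic = {epi:.2f}")
```

In this sketch the point on the class border has a high aleatoric value because many nearby predictions were wrong, while the point far from all data has a high epistemic value because even its nearest neighbours are distant.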

MACEst optimizes parameters to weigh the contribution of the aleatoric and epistemic uncertainty factors when scaling any new estimated confidence values. This has a transformative effect: confidence estimates are taken from the original scikit-learn model, and MACEst then transforms those estimates based on the learned graph space. You can find worked examples in these Jupyter notebooks.

We hope that MACEst will help the machine learning and data science communities provide better calibrated confidence estimates and aid the use of confidence when deploying models into production. If you want to get involved with MACEst, please file any feature requests or issues on the Issues page of the GitHub repository. We accept code contributions to MACEst under the Oracle Contributor Agreement (see details here).

Matthew Rowe is a Data Scientist Senior Manager working in Oracle Cloud Infrastructure. His background is in Software Engineering, Machine Learning and Mathematics.