Predictive models are at the center of data-informed decision-making strategies at successful enterprise companies today, but sometimes their performance doesn’t meet expectations. Why? These models are trained on historical data with the expectation that they’ll continue to make accurate predictions once they’re exposed to entirely new data. Of course, change is the only constant, so this assumption doesn’t always hold.
The problem is, once a model has been deployed in a real production environment, it’s not always possible to immediately tell when its predictions have gone off course. We often think of model deployment as the last stage of the data science workflow, but it should be the beginning of a careful testing process that aims to answer two basic questions about a model’s performance:
Does the model behave as expected?
Is the model that seemed most accurate during the building and training process actually the best one for the job? Or does a different version perform better on new, unseen data?
This process is called model testing. It’s a set of procedures that pits competing models against each other post-deployment to determine which one does the best job of providing outputs that are not only statistically accurate, but also easy to translate into metrics that help stakeholders meet key performance indicators, or KPIs. In other words, model testing is all about selecting a model that strikes the right balance between mathematical accuracy and practical usefulness.
A/B testing and multi-armed bandit tests are two of the most effective approaches a data scientist can use to ensure a model will be successful across these two criteria post-deployment. Both methods are considered tried-and-true in the realms of digital marketing and website optimization, for example, but they deserve much wider adoption in data modeling.
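To make the A/B approach concrete, here is a minimal sketch of how two deployed model variants might be compared on a conversion-style metric. The traffic counts, conversion figures, and the choice of a two-proportion z-test are illustrative assumptions, not details from the white paper:

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Test whether variant B's conversion rate differs from variant A's."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical results: model A converted 120 of 2,000 users it served,
# model B converted 150 of 2,000
z, p = two_proportion_z_test(120, 2000, 150, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The key design choice is that both models receive comparable, randomly assigned traffic slices, so any difference in the business metric can be attributed to the models themselves rather than to which users they happened to see.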
In website optimization, these tests compare specific variants of a website to find the version that generates the best response from visitors. That might be more usage of a search feature, clicks on a specific call to action, or a lower bounce rate. No matter what the business goal is, it’s critical to evaluate the different variants in terms of their impact on that specific goal. Model testing is no different. The goal is to deploy multiple models that attempt to address the same business problem, then compare their performance to select the one that best answers it without compromising accuracy. This minimizes the risk of committing upfront to a single model that may turn out to be suboptimal once it’s deployed in a real-life setting.
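A multi-armed bandit test follows the same logic but adapts traffic allocation as evidence accumulates: it mostly routes requests to the model that currently looks best, while still exploring the alternatives. The sketch below uses an epsilon-greedy strategy, with each candidate model reduced to a hypothetical true conversion rate purely for illustration:

```python
import random

def epsilon_greedy_bandit(n_arms, reward_fn, rounds=10000, epsilon=0.1, seed=42):
    """Allocate traffic across candidate models: exploit the current best
    performer most of the time, explore the others a fraction of the time."""
    rng = random.Random(seed)
    pulls = [0] * n_arms
    rewards = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)  # explore a random arm
        else:
            # exploit the arm with the best observed average reward
            arm = max(range(n_arms), key=lambda i: rewards[i] / pulls[i])
        pulls[arm] += 1
        rewards[arm] += reward_fn(arm, rng)
    return pulls, rewards

# Hypothetical: three deployed models with true conversion rates of
# 5%, 15%, and 4%; reward is 1 when a served prediction converts.
true_rates = [0.05, 0.15, 0.04]
pulls, rewards = epsilon_greedy_bandit(
    len(true_rates),
    lambda arm, rng: 1.0 if rng.random() < true_rates[arm] else 0.0,
)
print(pulls)  # the strongest model should attract most of the traffic
```

Compared with a fixed 50/50 A/B split, this approach reduces the business cost of the experiment itself, since weaker models receive progressively less live traffic while the test is still running.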
Our latest white paper provides an introduction to implementing both A/B and multi-armed bandit testing methods for predictive model testing. It’s an important read for anyone interested in helping their data science team ensure the success of their hard work.