How to Make Sure Your Machine Learning Model Holds Up In Court

The scene is a familiar one: a quiet room, somber faces deep in thought. All eyes are cast on a technical expert tasked with presenting critical evidence. Questions are asked and answered. Everyone wonders what decision will be made.

This isn’t the courtroom you might expect; it’s a conference room much like any other. Here, company managers are the jury and you’re the expert, discussing the results of a new machine learning model. As available data grows rapidly, this scene is becoming more common. Companies in every industry are beginning to use machine learning for demand planning, machine maintenance, and other applications.

Machine learning is being used to support legal claims, too. I recently worked on a team to identify leakage in a company’s value chain. My role as an expert witness was to apply predictive analytics to transaction data and find the likely culprits. After finishing the case, I identified some considerations for bringing machine learning into any setting.

Help People Understand the Model

Your audience is far more likely to accept results from something they understand. That is a challenge, since machine learning models are complex by nature, and it is most serious when the audience has a wide range of technical backgrounds. In our case, the analysis relied on a technique known as t-SNE. Suppose you were on a jury and heard me read from the authors' original paper: “t-SNE, short for t-Distributed Stochastic Neighbor Embedding, is a technique for dimensionality reduction that is particularly well-suited for the visualization of high-dimensional datasets.” ...Right. Reading through a wordy synopsis and hoping for head nods doesn’t work here, and it doesn’t work in a business setting either.

Instead, we can use a relatable example to help the audience connect at the right level. The math behind t-SNE is notoriously dense. Few people without PhDs know much about Kullback-Leibler (K-L) divergence, Shannon entropy, and so on. And that’s okay. We can start by explaining that t-SNE’s basic purpose is to show similarities and differences among data points. Suppose this time I throw a few dozen Lego blocks onto a table and start sorting them by size, color, and shape. We could talk about how the sorted piles look and how the sorting mimics the algorithm. We would very likely reach a common understanding at the right level. From there, it’s a short step to agreeing that the model’s core processes make sense.
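
To make the Lego analogy a little more concrete, the sketch below shows the first step t-SNE performs: turning distances between points into similarity scores, so that near-identical blocks score high and very different blocks score low. The block names and the numeric encoding of size, color, and shape are hypothetical. The sketch also uses one fixed bandwidth, `sigma`, whereas t-SNE tunes a separate bandwidth for each point to match a chosen perplexity. It is an illustration of the idea, not the analysis from the case.

```python
import math

# Hypothetical Lego blocks encoded as (size in studs, color code, shape code).
# The encoding is an illustrative assumption, not data from the case.
BLOCKS = {
    "small_red_square": (2.0, 0.0, 0.0),
    "small_red_square_2": (2.0, 0.0, 0.1),
    "large_red_square": (8.0, 0.0, 0.0),
    "small_blue_round": (2.0, 5.0, 3.0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conditional_similarities(points, i, sigma=1.0):
    """Gaussian similarities p(j|i) of every other point j to point i.

    Mirrors the first step of t-SNE, except that t-SNE picks sigma per point
    to match a target perplexity; here sigma is fixed for simplicity.
    """
    weights = {}
    for j, p in points.items():
        if j != i:
            weights[j] = math.exp(-squared_distance(points[i], p) / (2 * sigma ** 2))
    total = sum(weights.values())
    # Normalize so the similarities to point i sum to 1.
    return {j: w / total for j, w in weights.items()}


if __name__ == "__main__":
    sims = conditional_similarities(BLOCKS, "small_red_square", sigma=2.0)
    for name, p in sorted(sims.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

t-SNE then places the points on a flat map so that similarities computed there, using a heavier-tailed Student-t kernel, match these scores as closely as possible by minimizing the K-L divergence between the two sets of scores. In practice you would call a library implementation, such as scikit-learn's `TSNE`, rather than write it by hand.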

Know Which Factors Are Important

Machine learning excels at finding correlations across hundreds or even thousands of factors. That’s great for finding new relationships among inputs. It’s not so great for knowing which factors are most significant. Even with tools like SHAP and TensorBoard, explaining significance can still be a challenge. Fortunately, we can make design choices that limit the difficulty.
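
SHAP and TensorBoard are not the only options. A simpler, model-agnostic technique is permutation importance: shuffle one factor's column, leave everything else alone, and measure how much the model's error grows. It is a different method from SHAP, shown here only as an easy-to-explain alternative. The sketch assumes a hypothetical `toy_model` with three factors and synthetic data; in a real project, `predict` would be your trained model's prediction function.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def mean_squared_error(predict: Callable[[Row], float],
                       rows: Sequence[Row], targets: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict: Callable[[Row], float],
                           rows: Sequence[Row], targets: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in error when each factor's column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for f in range(len(rows[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            # Replace factor f with its shuffled values; other factors are untouched.
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
            total_increase += mean_squared_error(predict, shuffled, targets) - baseline
        importances.append(total_increase / n_repeats)
    return importances


def toy_model(row: Row) -> float:
    """Hypothetical trained model: factor 0 matters most, factor 2 not at all."""
    return 3.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]


def make_toy_data(n: int = 200, seed: int = 42) -> Tuple[List[Row], List[float]]:
    """Synthetic rows of three factors, with targets generated by toy_model."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    return rows, [toy_model(r) for r in rows]


if __name__ == "__main__":
    rows, targets = make_toy_data()
    for f, score in enumerate(permutation_importance(toy_model, rows, targets)):
        print(f"factor {f}: error increase {score:.4f}")
```

Factors whose shuffling barely changes the error are candidates to drop, which also helps keep the factor count small. One known caveat: when two factors are strongly correlated, shuffling either one alone can understate how much they matter together.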

One such choice is to use simple factors that come straight from the process being modeled. In our case, we used basic transaction data and ratios instead of transformed numbers that weren’t intuitive. Another choice is to limit the number of factors going into the model; we kept ours to fewer than 20. More factors and more data complexity can often improve accuracy, but they can also make the model harder to explain. If we can't shed light on which factors influence the findings, a more accurate model may be less useful in practice.
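
As an illustration of building a model from basic transaction data and ratios, the sketch below derives a handful of inputs directly from a transaction record and enforces a cap on the number of factors. The field names (`quantity_shipped`, `quantity_invoiced`, `discount_amount`, `gross_amount`) and the choice of ratios are hypothetical; only the fewer-than-20 limit comes from our case.

```python
from typing import Dict, Optional

# The model in our case used fewer than 20 factors.
MAX_FACTORS = 20


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Ratio that returns None instead of raising on a zero denominator."""
    if denominator == 0:
        return None
    return numerator / denominator


def build_features(txn: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Build interpretable factors from one (hypothetical) transaction record."""
    features = {
        # Raw values straight from the transaction record.
        "quantity_shipped": txn["quantity_shipped"],
        "gross_amount": txn["gross_amount"],
        # Simple ratios a reviewer can interpret without a math background.
        "invoiced_to_shipped": safe_ratio(txn["quantity_invoiced"], txn["quantity_shipped"]),
        "discount_rate": safe_ratio(txn["discount_amount"], txn["gross_amount"]),
    }
    if len(features) >= MAX_FACTORS:
        raise ValueError(f"{len(features)} factors; keep the model under {MAX_FACTORS}")
    return features


if __name__ == "__main__":
    # Hypothetical transaction record.
    txn = {"quantity_shipped": 100, "quantity_invoiced": 90,
           "discount_amount": 50.0, "gross_amount": 1000.0}
    print(build_features(txn))
```

Returning `None` for a zero denominator, rather than silently substituting a number, keeps the gap visible so it can be handled explicitly later.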

Embrace Messy Data

Data is never perfect. A 2017 survey by CrowdFlower found that 53% of a data scientist’s time is spent “collecting, labeling, cleaning and organizing data.” Other sources put the figure closer to 80%. Yet despite those efforts, the data consumed by most machine learning models still has outliers and artifacts. The data for our model was no exception: several thousand records contained errors, including start dates that fell after end dates, duplicate ID numbers, and other issues. At first, this seemed like an obstacle to building a reliable model.
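
Checks like these are straightforward to automate. This sketch flags, rather than deletes, records whose start date falls after their end date or whose ID appears more than once. The field names `id`, `start_date`, and `end_date` and the sample records are assumptions made for illustration, not the schema or data from the case.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: index 1 has dates out of order,
# and indices 2 and 3 share an ID.
SAMPLE_RECORDS = [
    {"id": "T1", "start_date": date(2019, 1, 1), "end_date": date(2019, 1, 5)},
    {"id": "T2", "start_date": date(2019, 2, 10), "end_date": date(2019, 2, 1)},
    {"id": "T3", "start_date": date(2019, 3, 1), "end_date": date(2019, 3, 2)},
    {"id": "T3", "start_date": date(2019, 3, 4), "end_date": date(2019, 3, 9)},
]


def find_data_errors(records):
    """Return {record index: [reasons]} for records that fail basic checks."""
    id_counts = Counter(r["id"] for r in records)
    errors = {}
    for i, r in enumerate(records):
        reasons = []
        if r["start_date"] > r["end_date"]:
            reasons.append("start date after end date")
        if id_counts[r["id"]] > 1:
            reasons.append("duplicate id")
        if reasons:
            errors[i] = reasons
    return errors


if __name__ == "__main__":
    for index, reasons in find_data_errors(SAMPLE_RECORDS).items():
        print(index, ", ".join(reasons))
```

Flagging instead of deleting keeps every record available, so the effect of the errors can be measured rather than hidden.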

So how could we have any confidence in an analysis built on such flawed data? We embraced the mess and quantified its net effect on our results. Fortunately, the effect was small in this case: with over 3 million transactions, several thousand errors amounted to an error rate of less than 1% and had a negligible impact. Even so, we were clear about the level of impact and understood how the imperfect data affected the results. Knowing how imperfect data affects decisions in a real business environment is powerful. When everyone is aligned on a model’s limitations due to messy data, you can move forward with what you have.
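
Quantifying the net effect can be as simple as two calculations: the share of records flagged, and how much a key statistic moves when flagged records are excluded. For example, with a hypothetical 5,000 flagged records out of 3,000,000, the error rate is about 0.17%. The sketch below uses the mean as the statistic and made-up values; the right statistic depends on what the model's results are actually used for.

```python
from statistics import mean


def error_rate(n_flagged: int, n_total: int) -> float:
    """Fraction of records flagged as erroneous."""
    return n_flagged / n_total


def sensitivity(values, flagged, statistic=mean):
    """Compute a statistic with and without the flagged records.

    Returns (all records, clean records only, relative change).
    """
    clean = [v for i, v in enumerate(values) if i not in flagged]
    with_all = statistic(values)
    clean_only = statistic(clean)
    if with_all == 0:
        relative_change = 0.0 if clean_only == 0 else float("inf")
    else:
        relative_change = abs(with_all - clean_only) / abs(with_all)
    return with_all, clean_only, relative_change


if __name__ == "__main__":
    print(f"error rate: {error_rate(5_000, 3_000_000):.2%}")  # hypothetical counts
    # Made-up values: index 3 is a flagged record with an implausible amount.
    values = [10.0, 12.0, 11.0, 95.0, 9.0]
    full, clean, change = sensitivity(values, flagged={3})
    print(f"mean with all records: {full:.2f}, clean only: {clean:.2f}, change: {change:.1%}")
```

In this tiny example a single bad record moves the mean by over 60%. With millions of records and a low error rate, the same comparison can show, with evidence rather than assertion, whether the impact is negligible.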

Consult With Other Experts—Early and Often

Input from team members and other experts in machine learning is a great way to get feedback on your model and the methods behind it. In a legal environment, you can expect the other side to hire their own expert. That person’s job is to question every decision you made or didn’t make in creating the model. I was grilled on countless aspects of my analysis. We discussed everything from the overall methodology down to the model parameters listed in the computer code.

A review group can take many shapes as long as it strikes the right balance. An adversarial relationship like that seen in a legal setting is too intense for most purposes. On the other hand, a ‘go along to get along’ approach doesn't work either. A group that is thorough and critical in a constructive way usually works best. It’s much better to find issues and areas of high risk with such a group before asking for business approval, or worse, using the model in daily operations. Even if the model is completely sound, a technical review will help you prepare for later stages of the adoption process.

I found these considerations for machine learning to be very helpful as an expert witness. As the use of machine learning grows, I hope you find them helpful in your setting as well. So, ask yourself—would your machine learning model hold up in court? Answering yes won’t guarantee a successful outcome, but it will improve the odds that your model gets used.