As a practitioner in machine learning, I value keeping up to date on the industry’s evolution in order to be competitive. I yearn to ingest new information about the latest trends, developments and happenings in the machine learning/deep learning/artificial intelligence/[FILL IN THE LATEST DATA INDUSTRY BUZZWORD] sector.
Yet, over the past several years, I've noticed a trend: almost every article is just a regurgitation of something I've already read countless times before. It is usually titled something like this:
"A [BRIEF/GENTLE/QUICK] [INTRODUCTION/PRIMER] TO [SOME MACHINE LEARNING ALGORITHM] USING [PYTHON/SOME PYTHON PACKAGE/SOME CLOUD PLATFORM] FOR [SOME TRIVIAL PURPOSE] WITH [SOME BENCHMARK DATA SET LIKE MNIST/IRIS/BOSTON HOUSING PRICES/ETC.]"
Admittedly, articles like this probably had their time and place, but I argue that both have long since passed, and that most of this material has degenerated into the sad world of click-bait. Yet my objection isn't to the sheer volume of these repetitious publications (although it has led me to read fewer and fewer articles), but to their failure to articulate real-world applicability to business problems.
Much like the supply of medical practitioners relative to the insured U.S. population following the ACA/Obamacare, the data industry is suffering from a shortfall of qualified specialists. With traditional colleges playing catch-up to serve these needs, many practitioners are of the self-taught, autodidact variety -- myself included. Writing only about one small segment of the overall machine learning engineering process (i.e. model building and evaluation) significantly restricts knowledge growth and awareness of the full scope of the domain.
So, what aren't we talking about that we should be? Data cleansing is one candidate, admittedly not a very glamorous (albeit necessary) topic of discussion. Above all, though, I suggest more content on how to link machine learning outcomes to actual business metrics.
Most content on machine learning engineering ends with model evaluation, which could lead the uninitiated to believe that this is the end of their work. Wrong!
"My final model achieved 92% predictive accuracy! Yay!"
Present that assertion of finality to the CEO/COO/CDO and see what they say.
A glaring gap exists in the body of (Internet) machine learning knowledge: how to operationalize models in production. No, I don't mean how to scale them up in a cloud environment -- there is plenty of literature on that subject -- but rather, how to quantify the value of implementing them in a language decision-makers understand: the language of dollars saved and/or dollars earned.
This is where the Expected Value Framework comes in handy.
The Expected Value Framework is something I stumbled upon while reading one of the best, most approachable books on machine learning: Data Science for Business. While still covering topics like data cleansing, model building, and evaluation, the book spends a great deal of time trying to answer the question "How do we evaluate our models in a business context?"
Here is the general form of the equation that constitutes the framework:
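Written out from the definitions in the next paragraph, the general form can be sketched in LaTeX as follows. The numbering of outcomes from 1 to n is an assumption made here for illustration:

```latex
E[X] = P(O_1)\,V(O_1) + P(O_2)\,V(O_2) + \cdots + P(O_n)\,V(O_n)
```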
The Expected Value of a decision, E[X], is a linear combination of the values of all possible outcomes associated with it, weighted by their probabilities. We decompose each outcome, O_n in our equation, into the product of the probability of that outcome, P(O_n), and its associated value, V(O_n).
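As a minimal sketch of that weighted sum, the Python below computes an expected value from a list of (probability, value) pairs. The function name and the three example outcomes are hypothetical, chosen only to illustrate the arithmetic:

```python
def expected_value(outcomes):
    """Return the sum of probability * value over (probability, value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    # The outcomes are assumed to be exhaustive, so probabilities should sum to 1.
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical decision with three possible outcomes (probability, dollar value).
example_outcomes = [(0.2, 100.0), (0.5, 10.0), (0.3, -40.0)]
example_ev = expected_value(example_outcomes)
print(f"expected value: {example_ev:.2f}")
```

The check on the probabilities is a design choice rather than part of the framework itself: it catches the common mistake of leaving an outcome out of the list.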
What can we do with the Expected Value Framework? Here are two main applications:
1) Evaluating how we will use a model in a business context
2) Comparing models, again in this same context
Let's examine the first application.
Imagine we built a classification model for predicting whether a customer will purchase a product, so that we can determine whom to mail a brochure to. As with most classification models, when we pass new customer data through the model, that customer is "scored" with a probability -- in this case, the probability of purchasing.
The naive approach would be to mail a brochure to every customer scored with a purchase probability higher than 50%. Sadly, this is the threshold typically recommended in "introductory" content on binary classification models, even though in a setting like this the probability of purchase is decidedly lower -- maybe one percent!
Using our equation, we can determine a better threshold. By adjusting the general form of the equation above for a binary outcome, we obtain the following:
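With subscripts p and np standing for "purchase" and "no purchase", the binary form can be sketched in LaTeX as below. This is a reconstruction from the definitions in the surrounding text:

```latex
E[X] = P_{p}(X)\,V_{p} + P_{np}(X)\,V_{np},
\qquad P_{np}(X) = 1 - P_{p}(X)
```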
Here, P_p(X) is the probability of purchase given the set of customer features/variables we used to build our model, whereas V_p is the value associated with a purchase. P_np(X), or alternatively just [1 - P_p(X)], and V_np are their counterparts -- the probability and value of not purchasing.
Probabilities are determined by our model. Value, on the other hand, is typically determined by business leaders, executives, or process owners.
How much does it cost to manufacture a product? How much to market it? What about logistical costs? All of these factor into the net profit we can expect from each individual sale of a product.
Our goal with this application of the Expected Value Framework is to determine a better threshold for whether to send our brochure to a customer. After conversations with subject-matter experts (SMEs) at our organization, we've determined that the costs associated with our product are $245 per unit (costs for marketing, manufacturing, etc.). If we sell our product for $550, then our profit is $550 - $245 = $305 per product.
Conversely, the value of not purchasing is negative, since we've still spent money on the brochure and on mailing it. (We don't count the cost of manufacturing the product, since we can still sell it to someone else, but we can't really ask a customer to send us back our brochure if they don't purchase!)
Say these costs are $15, which makes V_np = -$15. Together with V_p = $305 from above, we now have both values and can plug them into our equation.
We write the equation as an inequality with zero on the right-hand side, because any expected profit greater than zero counts as a success (keep in mind that, for this simple example, I'm assuming a customer can only obtain our product directly through this campaign).
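With the $305 profit per sale and the $15 brochure cost substituted, the inequality can be sketched in LaTeX as:

```latex
P_{p}(X) \cdot 305 + \bigl[1 - P_{p}(X)\bigr] \cdot (-15) > 0
```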
If we solve the inequality for P_p(X), we are left with the optimal targeting threshold (i.e. any customer with a predicted probability of purchase above this threshold should be included in our campaign)!
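The algebra is short. Expanding the bracket with a purchase value of 305 and a no-purchase value of -15, then collecting the probability terms, gives:

```latex
\begin{align*}
305\,P_{p}(X) - 15 + 15\,P_{p}(X) &> 0 \\
320\,P_{p}(X) &> 15 \\
P_{p}(X) &> \frac{15}{320} \approx 0.047
\end{align*}
```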
Therefore, given our model's predictions, we should mail our brochure to any customer with a probability of purchase greater than about 0.05 (15/320 ≈ 0.047, to be exact). Simple, isn't it?
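To make the rule concrete, here is a minimal Python sketch that computes the break-even threshold from the dollar figures in the example and applies it next to the naive 50% rule. The customer IDs and scores are made up for illustration, and the function names are hypothetical rather than taken from any library:

```python
# Dollar values from the brochure example in the text.
VALUE_PURCHASE = 550 - 245   # V_p: profit per sale ($305)
VALUE_NO_PURCHASE = -15      # V_np: brochure and mailing cost, lost without a sale


def expected_value_of_mailing(p_purchase,
                              v_purchase=VALUE_PURCHASE,
                              v_no_purchase=VALUE_NO_PURCHASE):
    """Expected value of mailing one customer with the given purchase probability."""
    return p_purchase * v_purchase + (1 - p_purchase) * v_no_purchase


def break_even_threshold(v_purchase=VALUE_PURCHASE,
                         v_no_purchase=VALUE_NO_PURCHASE):
    """Purchase probability at which the expected value of mailing is exactly zero."""
    return -v_no_purchase / (v_purchase - v_no_purchase)


# Hypothetical model scores: customer id -> predicted probability of purchase.
scores = {"A": 0.01, "B": 0.04, "C": 0.06, "D": 0.30, "E": 0.55}

threshold = break_even_threshold()
mail_by_expected_value = [cid for cid, p in scores.items() if p > threshold]
mail_by_naive_rule = [cid for cid, p in scores.items() if p > 0.5]

print(f"break-even threshold: {threshold:.4f}")
print("mail (expected value rule):", mail_by_expected_value)
print("mail (naive 50% rule):", mail_by_naive_rule)
```

On these made-up scores, the 50% rule would skip customers C and D even though mailing them has a positive expected value.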
The framework is nothing new or even special, frankly (and I'm sure someone will point out the irony that this article is itself a rehashing of existing content) -- it's just cost-benefit analysis applied to machine learning!
In Part 2, I'll cover the application of the Expected Value Framework for comparing models overall -- both when we know and don’t know how to calculate the value component of our equation.
To learn more about machine learning and data science, visit Oracle's Data Science page.