
  • March 28, 2017

6 Reasons Data Modeling Fails

When executed properly, predictive modeling helps data-savvy companies make smarter business decisions and foster more meaningful relationships with their clients, ultimately translating into a healthier bottom line. In fact, data science drives a $65.7 million net income boost for Fortune 1000 companies that increased their data accessibility by as little as 10%. But when data modeling isn’t done well, the value of this work is lost.

When building a predictive model, data scientists leverage a company’s historical data to identify revenue opportunities. But many companies still aren’t achieving their expected return on investment in this process. Below are six common reasons why:

Poor data quality

When it comes to data modeling, you get what you put in. If the data used to train a predictive model is inaccurate or irrelevant to begin with, then the model’s results will be unreliable or even misleading. As the saying goes: garbage in, garbage out. Before getting started, it’s critical to thoroughly explore the quality of available data to ensure you have the right information to meet your objectives in the first place. The next step is to clean your data, or, in other words, remove entries that are missing critical information or contain inaccuracies. This initial work can seem tedious, but it’s essential to ensuring that the end results of your predictive model will actually bring value to your business.
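As a rough illustration of that cleaning step, here is a minimal Python sketch. The field names (customer_id, signup_date, monthly_spend) and the rule that spend cannot be negative are assumptions made up for this example, not part of any particular dataset or tool.

```python
# Hypothetical required fields; a real project would take these from its own schema.
REQUIRED_FIELDS = ("customer_id", "signup_date", "monthly_spend")


def is_clean(record):
    """Return True if the record has every required field and a plausible spend."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            return False  # missing critical information
    try:
        # Assumed plausibility rule: spend must be numeric and non-negative.
        return float(record["monthly_spend"]) >= 0
    except (TypeError, ValueError):
        return False  # inaccurate entry, e.g. text where a number belongs


def clean(records):
    """Return (kept_records, number_dropped)."""
    kept = [r for r in records if is_clean(r)]
    return kept, len(records) - len(kept)


# Invented sample rows for illustration only.
raw_records = [
    {"customer_id": "a1", "signup_date": "2016-04-02", "monthly_spend": "120.50"},
    {"customer_id": "a2", "signup_date": "", "monthly_spend": "80"},
    {"customer_id": "a3", "signup_date": "2016-07-19", "monthly_spend": "-15"},
    {"customer_id": "a4", "signup_date": "2016-09-30", "monthly_spend": "n/a"},
    {"customer_id": "a5", "signup_date": "2017-01-11", "monthly_spend": 45},
]

cleaned, dropped = clean(raw_records)
print(f"kept {len(cleaned)} records, dropped {dropped}")
```

Reporting how many rows were dropped, and not just the cleaned set, makes it easier to notice when a quality rule is discarding more data than expected.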

Not defining use cases with business teams

Beginning a new predictive modeling problem is exciting, but many data science teams fail to set themselves up for success. Often, they dive into the crux of the problem without first taking time to define use cases with the business teams. Level-setting with the end users of the model at an early stage is critical to defining exactly how the model’s outputs can and should be leveraged to make better business decisions. Without clear goals for the project, weeks or even months of work developing a model could turn into more of an R&D effort than a solution that genuinely delivers value.

Lack of “data lineage”

Simply put, data lineage is about fully disclosing all the relevant information about a predictive model to its end users. Without clear documentation, most business users won’t understand the model’s layered complexities. Unless the data science team can deconstruct how they arrived at an end recommendation and gauge its business impact, decision makers may fall back on their intuition to fill gaps in what they understand, or even entirely dismiss the results of an unfamiliar methodology. Best practices for constructing clear data lineage include evaluating the data used, explaining the methodology behind building the model, and underscoring precisely how various cross-functional teams will be able to leverage the outputs.

Not deploying models

Data scientists build predictive models in programming languages like R, Python, or Scala, but they typically rely on data engineers to deploy these models into production. That’s because model building and deployment often require completely different languages. The problem is that engineering teams often don’t have the time or resources to carry out model deployment efficiently, which can render your data scientists’ work virtually useless. There are two ways to overcome this potential pitfall: find a tool or platform that can instantly translate code from one language to another, or allow your data science team to deploy models behind an API so they can be integrated anywhere without the help of engineering.
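To make the "model behind an API" idea concrete, here is a minimal sketch using only Python's standard library. The scoring rule, feature names, weights, and port are all hypothetical stand-ins; a real deployment would load a trained model and would normally run behind a production-grade server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scoring rule.
WEIGHTS = {"monthly_spend": 0.02, "support_tickets": -0.1}
INTERCEPT = 0.5


def predict(features):
    """Score one customer; missing features default to 0 (an assumption)."""
    score = INTERCEPT + sum(
        weight * float(features.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return max(0.0, min(1.0, score))  # clamp to the range [0, 1]


def handle_request(body):
    """Turn a JSON request body into (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object of features")
        return 200, json.dumps({"score": predict(features)}).encode("utf-8")
    except (ValueError, TypeError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling serve() starts the endpoint, after which any system that can send an HTTP POST with a JSON body can request a score, regardless of the language it is written in. Keeping handle_request separate from the HTTP handler also lets the scoring logic be tested without starting a server.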

“Ivory tower” analytics or data science teams

Successful predictive modeling requires seamless communication between data scientists and decision makers. It’s easy for quantitative teams to become siloed, but these “ivory tower” data science teams inevitably isolate their work from the rest of their company, leaving analytical opportunities on the table. What’s more, this kind of separation can lead executives to resort to an ad-hoc approach to data science instead of implementing an algorithmic decision-making strategy across all facets of the business. It’s important to establish a pipeline for communicating and operationalizing insights from the very beginning in order to maximize the impact of each predictive model your data science team deploys.

Relying on “black box” models

With predictive modeling there are no shortcuts to success. That’s because building a predictive model that’s truly representative of your business requires custom work by data scientists with the expertise to evaluate a variety of methodologies and select the one best suited to your specific use case and dataset. It is dangerous to plug data into a “black box” model and simply trust that it has applied the right algorithm. The best data science teams would never take this approach, and neither should you.
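One simple way to compare candidate methodologies, rather than trusting a single opaque choice, is to score each one on data it was not trained on. The sketch below assumes two deliberately simple candidates (a mean baseline and a least-squares line) and uses invented numbers; it illustrates the comparison step only, not a recommended pair of models.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training average."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Second candidate: ordinary least-squares line through the training data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Invented data: the last two points are held out for evaluation.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x, test_y = [7, 8], [14.1, 15.9]

candidates = {"mean baseline": fit_mean, "linear fit": fit_line}
errors = {
    name: mean_abs_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(errors, key=errors.get)
print(f"best candidate on held-out data: {best}")
```

Because every candidate is scored the same way on the same held-out points, the reasons for choosing one over another can be shown to decision makers instead of being hidden inside the tool.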

