Making the Case for Centralized Data Science

Saras Yagnavajhala
Senior Director, HGBU Data Science and Strategy

As enterprises look to develop their AI strategy, a number of questions emerge: Where do we start? How do we demonstrate ROI? How should we operationalize the initial successes from data scientists to drive sustainable enterprise solutions?

In this article, we will discuss how a centralized approach to data science can help break organizational silos and drive measurable and scalable results.

A Retail Enterprise Case Study

Consider a typical large retail department chain with 1,000 stores and 100,000 SKUs. Simply determining what the products should be priced at, when and how to promote, how much to order, and how to flow inventory through the supply chain involves several million combinations of decision possibilities.

For example, if you’re trying to understand the impact of running a 30% off promotion on all women’s sweaters, the marketing organization would want to know which new or repeat customers the promotion will attract. Will the campaign cannibalize demand from another department/category or a sister brand? Merchandise planners will want to know how the promotion will impact their plan and budget, and supply chain leads will want to understand what the predicted demand by product/location/day/channel will be. How much inventory is needed in stores versus the warehouse? What safety stock is needed to account for variations between forecasted and actual demand?

All of these decisions start with an understanding of base demand. To put it simply:

Demand = Function of (base demand, seasonality, product life cycle effects, base price effects, promotional price effects, inventory effects)

The model can be further enhanced by factors such as brand sentiment, product reviews, price sensitivity, promotion responsiveness, and social chatter.

So, demand is impacted by decisions made across different organizational entities.
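
To make the demand relationship above concrete, here is a minimal Python sketch. It assumes the effects combine multiplicatively and that price response follows a constant-elasticity curve; the article does not specify a functional form, so both are illustrative assumptions, as are the names `DemandFactors`, `price_effect`, and `forecast_demand` and all of the numbers.

```python
from dataclasses import dataclass


@dataclass
class DemandFactors:
    """Multipliers applied to base demand; 1.0 means "no effect"."""
    seasonality: float = 1.0
    life_cycle: float = 1.0
    base_price: float = 1.0
    promo_price: float = 1.0
    inventory: float = 1.0


def price_effect(price_ratio: float, elasticity: float) -> float:
    """Assumed constant-elasticity response: (new price / reference price) ** elasticity."""
    return price_ratio ** elasticity


def forecast_demand(base_demand: float, f: DemandFactors) -> float:
    """Combine base demand with each effect multiplicatively (an assumed form)."""
    return (base_demand * f.seasonality * f.life_cycle
            * f.base_price * f.promo_price * f.inventory)


# Hypothetical sweater SKU: 100 units/week baseline, a 20% seasonal uplift,
# and a 30% off promotion with an assumed price elasticity of -2.0.
factors = DemandFactors(seasonality=1.2, promo_price=price_effect(0.7, -2.0))
units = forecast_demand(100.0, factors)
print(round(units, 1))
```

Each multiplier maps to a lever owned by a different team (marketing sets the promotion, pricing sets the base price, supply chain drives the inventory effect), which is why a single shared model is useful.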

Benefits of Centralized Data Management

Currently, in a majority of organizations, decisions are made in silos. Marketing owns the relationship with the customer and directly manages campaigns and promotions. Pricing analysts and revenue managers tend to make pricing decisions without a clear view of whether inventory is available to meet the new demand. Supply chain tends to have a reactive approach to inventory planning, without knowledge of all upstream levers that impact demand. Having a centralized data science engine brings about the following benefits:

Centralized algorithms and management

When data cleansing is performed in one central place, data is cleansed once and then used to feed multiple modeling components. Data cleansing can in turn draw on a pool of centralized algorithms, such as anomaly detection and exponential smoothing.
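
As a rough illustration of "cleanse once, feed many," the sketch below flags outliers with a median-based modified z-score, replaces them with the series median, and then passes the same cleansed series to two downstream consumers through simple exponential smoothing. The specific anomaly rule, the replacement rule, the thresholds, and the sample data are assumptions chosen for brevity, not a description of any particular product.

```python
from statistics import median


def flag_anomalies(series, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return [False] * len(series)
    return [abs(0.6745 * (x - med) / mad) > threshold for x in series]


def cleanse(series):
    """Replace flagged points with the series median (a deliberately simple rule)."""
    med = median(series)
    flags = flag_anomalies(series)
    return [med if bad else x for x, bad in zip(series, flags)]


def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily unit sales with one data-entry spike.
daily_sales = [20, 22, 19, 21, 400, 23, 20, 22]

clean_sales = cleanse(daily_sales)  # cleansed once, centrally

# Two hypothetical downstream models reuse the same cleansed series.
pricing_input = exponential_smoothing(clean_sales, alpha=0.3)
replenishment_input = exponential_smoothing(clean_sales, alpha=0.6)
print(clean_sales)
```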

Centralized data science

Figure 1: An example of a centralized data science engine driving enterprise AI

Role-based access and explainability

A well-designed centralized data science engine can meet the needs of a range of users, each of whom has a different set of objectives.

  • Business users need real-time visibility into the impact of the decisions they are making. They need support for real-time what-if analysis and timely recommendations based on business data. The system needs to automatically consider relevant business constraints.

  • Data analysts need to manage and refresh models on an ongoing basis. The system needs to support multiple model runs and comparisons of results, and ideally become self-correcting over time, with minimal analyst/scientist intervention. System guard rails could be used to trigger automatic model refreshes.

  • Data scientists may want to simulate different models, especially as new data sources are incorporated. It is typical to start with core transactional data and then continue to enhance the models with additional data such as click streams, online reviews, or third-party data such as weather, market, and competitor data. The system needs to support such an evolutionary approach to model deployment.

The system needs to be smart yet "dark" (working invisibly in the background) for business users, while providing a robust platform for evolutionary model management for data analysts and scientists.
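
One way the guard rails mentioned above might trigger an automatic model refresh is to track forecast error over a recent window and flag the model when the error crosses a limit. The sketch below uses weighted absolute percentage error (WAPE) with a 25% limit; the metric, the limit, and the function names are illustrative assumptions.

```python
def wape(actuals, forecasts):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total = sum(abs(a) for a in actuals)
    if total == 0:
        return 0.0
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / total


def needs_refresh(actuals, forecasts, max_wape=0.25):
    """Guard rail: request a model refresh when recent error exceeds max_wape."""
    return wape(actuals, forecasts) > max_wape


# Hypothetical last four weeks of actual unit sales for one SKU.
actuals = [100, 120, 90, 110]

healthy_forecasts = [95, 118, 92, 108]   # model still tracking demand
drifted_forecasts = [60, 70, 150, 50]    # model has drifted

print(needs_refresh(actuals, healthy_forecasts))
print(needs_refresh(actuals, drifted_forecasts))
```

In practice the check would run on a schedule, and a refresh request would start a retraining job rather than print a flag.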

Demonstrable ROI from Data Science

A centralized data science system can continuously measure improvements in key metrics over time and compare projected versus realized KPIs. The system can learn from recommendations that business users override. Intuitive reason codes, captured when users make overrides, can help enhance the system’s future recommendations. The system can also provide insights into opportunity costs (the cost of a missed opportunity, if a recommended decision is not taken). These insights can then drive change management and adoption at the executive level. A well-architected centralized data science engine can drive reusability, scalability, transparency, and accountability.
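
The bookkeeping described above can be sketched in a few lines of Python. The `Decision` record, its field names, the reason codes, and the margin figures are all hypothetical. Opportunity cost is approximated here as projected minus realized margin on overridden decisions, which relies on the projection itself being trustworthy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Decision:
    sku: str
    recommended_margin: float  # projected margin had the recommendation been taken
    realized_margin: float     # margin actually achieved
    overridden: bool = False
    reason_code: str = ""      # supplied by the business user on override


def opportunity_cost(decisions):
    """Projected-minus-realized margin on overridden decisions, floored at zero per decision."""
    return sum(max(d.recommended_margin - d.realized_margin, 0.0)
               for d in decisions if d.overridden)


def override_reasons(decisions):
    """Count overrides by reason code, to feed back into future recommendations."""
    return Counter(d.reason_code for d in decisions if d.overridden)


# Hypothetical decision log.
log = [
    Decision("SKU-A", 1000.0, 980.0),
    Decision("SKU-B", 500.0, 350.0, overridden=True, reason_code="LOCAL_EVENT"),
    Decision("SKU-C", 800.0, 820.0, overridden=True, reason_code="COMPETITOR_PRICE"),
    Decision("SKU-D", 300.0, 200.0, overridden=True, reason_code="LOCAL_EVENT"),
]

print(opportunity_cost(log))
print(override_reasons(log).most_common())
```

Note that the override on SKU-C outperformed the recommendation; cases like this are exactly where reason codes can teach the system something it was missing.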

Summary

  • It’s important to be able to show ROI, or else data science ends up as just another buzzword or check-the-box initiative for your enterprise. Start small, and show quick ROI before investing in integrating newer data sources. Lay the right foundation to support evolutionary model enhancements. Make the right investments early on to ensure that data science outcomes are repeatable, scalable, and can be operationalized, rather than one-off "cool insights."

  • A centralized data science engine can help bridge organizational silos, cut costs, and reduce data scientist churn.

  • Data science can be a great enabler to drive organizational change management and make the shift to a truly data-driven culture. Predictive and prescriptive insights can be integrated into core business processes. It will require a cultural shift to start having faith in the “algorithm.”
