  • April 20, 2017

Data Science is a Team Sport

Building a data science team is not as simple as hiring an experienced data scientist and giving him or her access to a database. In too many cases, data scientists are hired without the requisite supporting roles, so they ultimately spend the majority of their time on tasks other than deriving valuable insights from data.

Successful organizations view data science as a team sport: They assemble individuals with different skill sets and assign them different responsibilities to support each step of the data science process. Below are three roles and the contributions they should be making to ensure you’re producing quality outputs in the most efficient way possible.

Business Intelligence Experts Define the Scope

The first step of any enterprise data science challenge is to frame the business problem. This involves homing in on the specific outputs that key stakeholders in your organization can use to improve their decision-making. Then, you'll need to determine what data the analysis will require. Typically, this task belongs to a business intelligence analyst, who can leverage his or her domain expertise to connect the rest of the quantitative team to the operations they hope to influence. Once this is complete, a BI analyst can also help expedite the data-wrangling process by conducting some of the early cleaning and transformations that typically precede predictive modeling.

Data Engineers Extract, Transform, and Load (ETL)

Once your team is aligned on the problem you're trying to solve, the next step is to collect the raw data that will act as the foundation of your data model. This data is often stored in disparate sources, so making it accessible will require a front-end engineer's expertise in accessing web APIs or in employing various languages and tools for extracting data. Then, a data engineer is required to transform the data so that it conforms to the technical requirements of the target database. Finally, the data is ready to load into the target database, where it will be accessed by your quantitative team. This process is complicated, but it's necessary: data science work requires a structured and clean dataset, which makes it possible to train better models and algorithms more efficiently.
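The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a production pipeline: the sample CSV, the column names, the `customers` table, and the rule of dropping rows with a missing spend value are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a web API or a file.
RAW_CSV = """customer_id,signup_date,monthly_spend
 101 ,2017-01-15,42.50
102,2017-02-03,
103,2017-02-20,17.25
"""


def extract(raw_text):
    """Extract: read raw rows from a CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: strip whitespace, cast types, drop rows with no spend value."""
    cleaned = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # assumed rule: skip incomplete records rather than guess
        cleaned.append(
            (int(row["customer_id"].strip()), row["signup_date"].strip(), float(spend))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER PRIMARY KEY, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Run all three steps and return the number of rows now in the table."""
    load(transform(extract(raw_text)), conn)
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads the two complete records into an in-memory SQLite database. Keeping each step in its own function means the extraction source or the target database can be swapped without touching the other steps.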

Data Scientists Analyze and Productionize

When your data has been transported and prepared for analysis, it's finally time for data scientists to do the job they were initially hired to do: extracting insights from your data. They apply algorithms and build models specifically chosen based on the use case your team has defined and the data that's available.

But the process doesn't end at model results: Once the data scientists have produced the results of their analysis, the next and final step is to productionize their findings by integrating them into your decision makers' workflows. Your data engineers are crucial at this stage as well. That's because the outputs of the model your data scientist has built are likely to be in a format that's different from the one your decision makers use to find the information that informs their day-to-day work. If your data science platform allows, you can deploy a model behind an API and integrate it instantly with the tools your stakeholders use; if not, your engineers will need to translate the model from R or Python into a production stack language and move it into a production environment for deployment.
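To make "a model behind an API" concrete, here is a rough sketch that uses only Python's standard library. The toy linear model, its weights, the feature names, and the port are all hypothetical stand-ins; a real deployment would load a trained model and would likely rely on a data science platform or a web framework rather than `http.server`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical coefficients standing in for a trained model.
WEIGHTS = {"monthly_spend": 0.5, "tenure_months": 2.0}
INTERCEPT = 1.0


def predict(features):
    """Score one record with the toy linear model."""
    return INTERCEPT + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)


def handle_payload(body):
    """Turn a JSON request body into a (status_code, JSON response) pair."""
    try:
        score = predict(json.loads(body))
    except (ValueError, KeyError, TypeError):
        expected = ", ".join(sorted(WEIGHTS))
        return 400, json.dumps({"error": "expected JSON with keys: " + expected})
    return 200, json.dumps({"score": score})


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's score as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start the server; this call blocks until the process is stopped."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

After calling `serve()`, any tool that can send an HTTP POST with a JSON body such as `{"monthly_spend": 10, "tenure_months": 3}` can request a score, so stakeholders' tools never need to know the model was written in Python.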

It takes a village to do data science well and ensure insights are used to their full potential. Careful planning and the right tools are an invaluable part of that process, but collaboration between experts is key to creating a functional and efficient data science workflow.