There is little question that data scientists and those with advanced analytics and machine learning skills are in very high demand. According to the recently released LinkedIn 2018 Emerging Jobs Report, job postings for "Machine Learning Specialists," "Data Science Specialists," and "Data Science Managers" grew 6x, 5x, and 4x, respectively, over the past 4 years.
Despite the huge demand for data scientists, data science maturity varies widely among organizations. Some companies are just getting started with data science, while others have made significant investments and have data scientists spread across global business units. Regardless of an organization's maturity, it faces a common challenge: how best to structure data science teams so they can scale to meet growing demand.
Data Science is a Team Sport
Before considering organizational structure, it's important to realize that building a data science capability is not as simple as hiring data scientists and giving them access to data. Data science is a team sport, and in too many cases data scientists are hired without the resources required to support their workflows. Many steps in the data science lifecycle, from building data pipelines to embedding machine learning models into applications, fall outside the data scientist's purview and depend on other roles. Below are a few of the key roles and responsibilities of individuals who work within data science teams, or closely with them, to ensure that data science outputs are delivered efficiently.
Business analysts are responsible for converting particular business challenges into well-defined analysis plans. Since they have domain knowledge of business functions, they can bridge the gap between business stakeholders and data scientists to ensure data science projects are well defined and actionable.
Data architects are responsible for designing and operating the underlying platform and infrastructure that supports data science work. This role is particularly important for large data science teams with a variety of data, workflow, and tooling requirements.
Data engineers are responsible for data ingestion, processing, and storage. They build the data pipelines that make data accessible for data scientists to work with. The data engineer plays a vital role in the analytical lifecycle: without the right data, there is no data science.
Data scientists are responsible for using statistical methods, processes, and algorithms to extract insights from data. To do this, they prepare, explore, and visualize data and build models using programming languages like Python or R. Depending on the company and industry, this role may have other titles, such as "Research Scientist," "Data Miner," or "Data Analyst."
Machine learning engineers are responsible for deploying models developed by data scientists into production, and for monitoring and maintaining those models once deployed. This is an emerging role that sits between the data scientist and the developer.
Developers are responsible for taking a deployed model and embedding it into an application that end users can interact with, or into a product or system that consumes the model. As demand for machine learning and artificial intelligence increases, a number of tools are emerging to help developers embed machine learning in their applications faster, such as pre-trained model APIs and drag-and-drop modeling tools.
How should teams be structured?
There is no "one size fits all" approach to structuring a data science team. Every company operates differently and has different use cases and demands with regard to data science. That said, data science teams are generally organized under a centralized, decentralized, or hybrid structure.
In a decentralized model, data scientists report into specific business units (e.g., Marketing) or functional units (e.g., Product Recommendations) within a company. The decentralized model often emerges in mid-sized or larger organizations where specific units have both the budget to hire and support their own data science team and enough demand for data science use cases. A significant drawback of the decentralized model is that it can lead to isolated teams, with little collaboration or knowledge sharing between data science teams across the organization.
In a centralized model, data scientists are members of a core group, reporting to a head of data science or analytics. In smaller companies, this may be 2 or 3 data scientists, but in larger organizations, it may be tens or even hundreds of data scientists operating in a center of excellence (COE). A centralized data science team lets organizations allocate resources flexibly across projects while exposing data scientists to a broad range of analytical challenges. A centralized model also helps with documenting and scaling best practices, ensuring that work is repeatable and teams operate efficiently. A major challenge of the centralized model is that it can be hard to assess and meet the demand for incoming data science projects. This is particularly true for smaller centralized teams that support multiple product or business teams, which may bring them both ad hoc requests and longer-term projects requiring significant resource investments.
The hybrid model combines aspects of both the centralized and decentralized approaches. One example would be a large global organization with decentralized data science teams supporting different global business functions, plus a data science center of excellence that serves as a hub for documenting best practices and propagating them to each of the global data science teams. Another example would be a company where data scientists are aligned to specific business units but report into centralized analytics management.
I hope that this post provided some key insights and considerations for how to organize effective data science teams. If you're interested in learning about the products and services that Oracle offers for every role in the data science lifecycle, read about our AI Platform at oracle.com/ai.
You can follow me on Twitter at @Jcharness for the latest news on artificial intelligence and machine learning.