Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

Data Science Maturity Model - Collaboration Dimension (Part 4)

Mark Hornick
Senior Director, Data Science and Machine Learning

In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'collaboration':

How do data scientists collaborate among themselves and with others in the enterprise, e.g., business analysts and application and dashboard developers, to evolve and hand off data science work products?

Data science projects often involve significant collaboration, defined as "two or more people or organizations working together to realize or achieve a goal." Successful data science projects that positively impact an enterprise often require the involvement of multiple players: data scientists, data/business analysts, business leaders, domain experts, application/dashboard developers, database administrators, and information technology (IT) administrators, to name just a few. Collaboration can be informal or formal; in this context, however, we look to tools that support, encourage, monitor, and guide collaboration among these players.

The 5 maturity levels of the "collaboration" dimension are:

Level 1: Data analysts often work in silos, performing work in isolation and storing data and results in local environments.

Enterprises at Level 1 often suffer from the 'silo effect', where data analysts in different parts of the enterprise work in isolation, focusing narrowly on the data they can access to answer questions for their own department or organization. Results produced in one area may not be consistent with those produced in another, even when the underlying question is the same. These differences may result from using (possibly subtly) different data or versions of the same data, or from taking a different approach to arrive at a given result. Such differences can make for interesting cross-organization or enterprise-wide meetings where results are presented.

Level 2: Greater collaboration exists between IT and line-of-business organizations.

The Level 2 enterprise seeks greater collaboration between the traditional keepers of data (Information Technology) and the various lines of business with their data analysts and data scientists. Sharing of data and results may still be ad hoc, but greater collaboration helps identify the data needed to solve important business problems and helps communicate results within the organization or enterprise.

Level 3: Recognized need for greater collaboration among the various players in data science projects.

With the introduction of data scientists and the desire to make greater use of data to solve business problems, Level 3 enterprises see the need for greater collaboration among the various players involved in or affected by data science projects. These include data scientists, business analysts, business leaders, and application/dashboard developers, among others. Collaboration takes the form of sharing, modifying, and handing off data science work products. Work products include, for example, data (raw and transformed), data visualization plots and graphs, requirements and design specifications, code written as R/Python/SQL/other scripts, either directly or in web-based notebooks (e.g., Zeppelin, Jupyter), and predictive models. Traditional tools such as source code control systems and object repositories with version control may be used, but inconsistently.

Level 4: Broad use of tools introduced to enable sharing, modifying, tracking, and handing off data science work products.

Level 4 enterprises build on the progress from Level 3, introducing tools specifically geared toward enhanced collaboration among data science project players. This includes support for sharing and modifying work products, as well as tracking changes and workflow. The ability to hand off work products within a defined workflow in a seamless and controlled manner is key. Different organizations within the enterprise may experiment with a variety of tools, which typically do not interoperate.

Level 5: Standardized tools introduced across the enterprise to enable seamless collaboration.

While the Level 4 enterprise made significant strides in enhancing collaboration, the Level 5 enterprise standardizes on tool(s) to facilitate cross-enterprise collaboration among data science project players.

In my next post, we'll cover the 'methodology' dimension of the Data Science Maturity Model.
