Bridging the gap: How citizen data scientists can turn data into actionable information

April 1, 2022 | 5 minute read
Barry Mostert
Senior Director, Artificial Intelligence and Analytics

The term “citizen data scientist” was originally coined by Gartner to describe advanced data and analytics professionals whose objective is to derive value from their data through machine learning (ML). Their work moves an organization from being merely data-driven, where decisions rest on trusted but static historical data such as an ERP data export, to being analytics-driven. Being analytics-driven means using advanced analytical techniques like ML that consider all relevant data at the lowest possible granularity to train models that produce predictions or recommendations for strategic business decisions.

Few people label themselves citizen data scientists, but the role usually sits inside business groups, where practitioners understand the needs of the business and the problems that need solving. There is a shortage of people with the skills to meet the current market demand for citizen data scientists. That is partly because classic data tools are expensive and usually require some level of coding experience.

What do citizen data scientists do? Essentially, they are pragmatic problem solvers who find solutions to business problems through data and analytics. With deep domain knowledge, they build models that represent their business: models that can recommend the next best course of action for any given scenario. Keep in mind that most employees who fill this role didn’t train specifically to become citizen data scientists. Instead, they hold other titles within their business group, such as financial analyst, HR analyst, or sales operations. The skills required for machine learning are gained organically on the job as needed.

How do citizen data scientists and data scientists differ? The key difference is that a citizen data scientist leverages accessible technologies to derive value from data, while a data scientist has additional skills to tackle more challenging analytical tasks. These additional skills usually come from formal training in coding (with languages like Python and R), advanced statistics, and machine learning itself.

So, how do modern data management and analytics platforms give citizen data scientists the capabilities they need, without resorting to a complex architecture of expensive and loosely integrated tools? Good analytics platforms allow citizen data scientists to:

  • connect to all necessary data, without needing to move data around
  • prepare and enrich data without exporting it
  • build, train, and test custom ML models without heavy coding
  • share certified ML models with business groups

Connect all data

Connecting all data may sound obvious, but it can be a challenge given the ever-increasing volume of data and the variety of data types (i.e., unstructured, semi-structured, and structured). Capabilities like direct query and function shipping are essential to ensure data is processed where it lies, without relying on data movement. Native connectors built into the analytics platform usually provide both performance and security advantages.
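To make "processing data where it lies" concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a source system. The `orders` table, its columns, and the sample rows are hypothetical, and this is not how any particular analytics platform implements direct query. It only contrasts pulling every row to the client with shipping the aggregation to the source.

```python
import sqlite3

# Hypothetical in-memory source system; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0), ("APAC", 25.0), ("AMER", 200.0)],
)


def totals_by_moving_data(conn):
    """Pull every row to the client, then aggregate locally."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_by_pushdown(conn):
    """Ship the aggregation to the source; only the summary rows travel."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return dict(conn.execute(query))


moved = totals_by_moving_data(conn)
pushed = totals_by_pushdown(conn)
print(pushed)
```

Both functions return the same totals, but the second transfers one row per region instead of one row per order, which is the difference that matters as volumes grow.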

Read more about connecting sources

Prepare and enrich data

Data in source systems is almost never ready for analytics. Some level of data preparation is required to repair and reconcile disparate data sets. Data sets must also be enriched with additional dimensionality, calculated fields, and third-party detail (e.g., weather, stock indices, sports scores) to fill the gaps in a complete business view. Preparing and enriching data within the same platform improves security and keeps processes centralized and trackable, so that they do not depend on any one individual.
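As a rough illustration of what "repair" and "enrich" can mean in practice, the sketch below uses plain Python on a made-up sales export and a made-up weather feed. Every name and value is hypothetical, and treating missing units as zero is an assumption chosen for brevity; a real pipeline would pick a fill rule that suits the business.

```python
from datetime import date

# Hypothetical raw sales export: inconsistent store codes and a missing value.
sales = [
    {"store": " nyc-01", "day": date(2022, 3, 1), "units": 40, "unit_price": 2.5},
    {"store": "NYC-01", "day": date(2022, 3, 2), "units": None, "unit_price": 2.5},
    {"store": "bos-07 ", "day": date(2022, 3, 1), "units": 10, "unit_price": 3.0},
]

# Hypothetical third-party weather feed keyed by (city, day).
weather = {
    ("NYC", date(2022, 3, 1)): "rain",
    ("NYC", date(2022, 3, 2)): "sun",
}


def prepare_and_enrich(rows, weather_by_city_day):
    """Return cleaned rows with a derived city, revenue, and weather column."""
    enriched = []
    for row in rows:
        store = row["store"].strip().upper()  # repair: normalize store codes
        # Repair: fill missing units with 0 (an assumption for this sketch).
        units = row["units"] if row["units"] is not None else 0
        city = store.split("-")[0]  # enrich: derived dimension
        enriched.append({
            "store": store,
            "city": city,
            "day": row["day"],
            "units": units,
            "revenue": units * row["unit_price"],  # enrich: calculated field
            # Enrich: third-party detail, with a marker when no match exists.
            "weather": weather_by_city_day.get((city, row["day"]), "unknown"),
        })
    return enriched


prepared = prepare_and_enrich(sales, weather)
for row in prepared:
    print(row["store"], row["day"].isoformat(), row["revenue"], row["weather"])
```

Keeping these steps in one function, rather than in someone's spreadsheet, is the small-scale version of the centralized and trackable process described above.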

Reviewing, fixing, and enriching the data set for analytics

Read more about preparing and enriching data

Create custom ML models

When creating custom ML models that produce reliable recommendations, it is essential to (1) choose the right ML algorithm for the job and (2) tune that model correctly for the business use case at hand. Every business is different, so custom ML models are unique to their specific use case. Citizen data scientists may not be able to code their own models from scratch, but they do know which type of algorithm best suits a given purpose: for example, a regression algorithm for financial forecasting, or a naïve Bayes binary classifier for employee turnover prediction. They also understand how to tune those models for their specific use case, such as whether to use a lasso or ridge regression method for forecasting.

Choosing lasso or ridge for predicting revenue with a regression algorithm

The analytics platform should allow the citizen data scientist the flexibility to select and customize their model to suit their specific use case.
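For readers curious about what the lasso-versus-ridge choice changes, here is a small from-scratch sketch in plain Python. It is not the algorithm any specific platform uses. It is a textbook coordinate-descent fit of a linear model with an L1 (lasso) or L2 (ridge) penalty, run on a tiny made-up data set in which the first column drives the target and the second is nearly irrelevant. The function names, penalty values, and data are all assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within it become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_linear(X, y, l1=0.0, l2=0.0, sweeps=100):
    """Coordinate descent for the objective
    0.5 * sum((y - Xw)^2) + l1 * sum(|w|) + 0.5 * l2 * sum(w^2).
    Assumes the columns of X and y are already centered (no intercept)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                # Prediction from every feature except j.
                partial = sum(w[k] * row[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            w[j] = soft_threshold(rho, l1) / (z + l2)
    return w


# Hypothetical centered data: column 0 is a strong driver, column 1 is weak.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.9, -3.1, 0.0, 2.9, 6.1]

lasso = fit_linear(X, y, l1=1.0)  # L1 penalty only
ridge = fit_linear(X, y, l2=1.0)  # L2 penalty only
print("lasso:", lasso)
print("ridge:", ridge)
```

On this data the lasso fit sets the weak feature's coefficient to exactly zero, while the ridge fit keeps both features and shrinks their coefficients. That is the practical trade-off: lasso doubles as feature selection, and ridge keeps every input but dampens it.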

Share certified models

Once models are producing reliable recommendations, they can be published for wider business users to access and run against their own data sets. Ordinary business users can now become analytics-driven without any extra tools or upskilling, and they don’t have to rely on the citizen data scientist to complete individual requests. Running new data sets through a certified ML model does not require any knowledge of how the model works or of the intricate tuning that went into creating it.
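One way to picture this hand-off is a published model artifact plus a scoring function that hides the internals. The sketch below is purely illustrative: the JSON layout, the `certified` flag, the feature names, and the coefficient values are all invented for the example and do not reflect how any particular platform registers models.

```python
import json

# Hypothetical artifact a citizen data scientist might publish after tuning.
# The fields and coefficient values are illustrative, not from a real model.
published = json.dumps({
    "name": "attrition_risk",
    "version": 1,
    "certified": True,
    "features": ["tenure_years", "overtime_hours"],
    "intercept": 0.5,
    "weights": [-0.05, 0.01],
})


def load_certified(artifact_json):
    """Load a published model and refuse anything that is not certified."""
    model = json.loads(artifact_json)
    if not model.get("certified"):
        raise ValueError(f"model {model.get('name')!r} is not certified")
    return model


def score(model, rows):
    """Apply the model to new rows; callers only need the feature names."""
    results = []
    for row in rows:
        total = model["intercept"]
        for feature, weight in zip(model["features"], model["weights"]):
            total += weight * row[feature]
        results.append(total)
    return results


model = load_certified(published)
employees = [
    {"tenure_years": 2, "overtime_hours": 10},
    {"tenure_years": 8, "overtime_hours": 0},
]
scores = score(model, employees)
print(scores)
```

The business user supplies rows with the expected column names and gets scores back. The choice of algorithm and the tuning stay with whoever published the artifact.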

Registering a certified model for HR business analysts to use

Summary

Citizen data scientists are charged with finding solutions to business problems with data.  They understand the business problem and create a solution by deriving value from data.  While they may not be hardcore coders, they are familiar with data science and machine learning subjects.  Empowering citizen data scientists to solve problems requires an integrated data management and analytics platform that provides access to large amounts of data, preparation and enrichment capabilities, and the ability to build custom ML models that support business questions.  Sharing those models and enabling ordinary business users to access them is key to preventing data science bottlenecks and reducing repetitive last-minute requests.

Oracle Analytics is an integrated platform that provides open connectivity, built-in data prep and enrichment with intelligent machine-led augmented insights, and built-in, customizable machine learning algorithms that can be shared with a wider business community. 

Read more about Oracle Analytics here. Follow us on Twitter @OracleAnalytics and connect with us on LinkedIn.

Sign up for the citizen data scientist hands-on workshop

                              

Barry Mostert

Senior Director, Artificial Intelligence and Analytics

Barry is a senior director for product marketing covering Oracle's AI and Analytics services.

