It seems like everybody wants to be a data scientist these days, and for good reason. From cybersecurity to predicting customer behavior, data analytics is becoming mission-critical to more and more businesses.
The demand is no longer confined to the high-tech and software realms. Data science roles in sectors such as education, marketing, and manufacturing are helping companies shape strategies that drive growth, financial performance, and engagement.
But if there’s one secret that all data scientists know, it’s this: data scientists aren’t developers.
Data scientists write code as a means to an end, whereas software developers write code to build things. The distinction matters because many cloud platforms marketed to data scientists are really built for developers.
That’s why, when we conceived Oracle Cloud Infrastructure (OCI) Data Science, a service that enables data scientists to build, train, manage, and deploy their machine learning models on Oracle Cloud, we built it around the key features and functionality data scientists told us they needed. These included model reproducibility, secure collaboration, and a comprehensive model-building and training environment that includes the open source tools data scientists use on a daily basis.
Also high on the list of concerns was the right approach to collaboration. Data scientists often work in teams with other data scientists, interns, and sometimes consultants on the same projects, and collaborating securely while meeting governance and compliance requirements is a constant struggle. OCI Data Science addresses this through team capabilities that include shared notebooks, model catalogs, and team-based security policies.
Many of the great capabilities and features of OCI Data Science were drawn from modern software development processes that improve collaboration and productivity. Productivity was the driver behind the development of Accelerated Data Science (ADS).
ADS is a native Python library available within OCI Data Science that contains tools covering the end-to-end lifecycle of predictive machine learning models, including data acquisition, data visualization, data profiling, automated data transformation, feature engineering, model training, model evaluation, model explanation, and capturing the model artifact itself.
The goal is to provide one interface that lets data scientists access their data via a client SDK, whether it lives on Oracle Cloud or in increasingly diverse sources such as corporate data warehouses or big-data technologies like Hadoop. A rich toolset then improves data scientist productivity by automating work that would normally require hundreds of lines of code and countless hours of experimenting.
Another distinctive feature of ADS is model interpretation. Predictive modeling is supposed to be neutral; in practice, algorithms can carry the same biases that are built into the real-world data used to create them.
Data scientists who work in fintech, healthcare, or banking are already very familiar with having their models audited to identify potential biases in model predictions. They rely on interpretation and explanation techniques that help avoid unintended consequences.
ADS brings model interpretation and explainability to data scientists, helping them understand why a given prediction was made and which features the model considers when making its prediction. This helps raise the bar for best practices as the field progresses and grows.
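To make the idea of asking "which features does the model consider" concrete, here is a minimal sketch of one common, model-agnostic explanation technique: permutation feature importance. It is plain Python written for illustration and is not the ADS API; the names `toy_model`, `accuracy`, and `permutation_importance` are hypothetical, and the data is synthetic. The idea is to shuffle one feature at a time and measure how much the model's accuracy drops.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Average accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving the others untouched.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, shuffled)
            ]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Synthetic data: two random features, labels produced by the toy model.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, labels)
for index, score in enumerate(importances):
    print(f"feature {index}: mean accuracy drop {score:.3f}")
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction and its importance is zero, while shuffling feature 0 hurts accuracy substantially. A real audit would apply the same logic to a trained model and held-out data.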
Finally, data scientists will be delighted to learn that ADS includes Oracle’s AutoML capabilities, which enable automated feature and sample selection, model selection, and hyperparameter tuning. AutoML automates the process of running tests against multiple algorithms and hyperparameter configurations, then checks them for accuracy and confirms that the optimal model and configuration are selected. This helps data scientists choose the best-performing algorithm for a specific use case.
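The loop that AutoML automates can be pictured with a small sketch: try several algorithms and hyperparameter settings, score each with cross-validation, and keep the winner. The code below is plain Python for illustration only and is not Oracle's AutoML or the ADS API. The candidate set (a majority-class baseline and k-nearest neighbors with a few values of k), the helper names, and the synthetic two-cluster data are all assumptions made for the example.

```python
import random
from collections import Counter


def knn_factory(k):
    """Return a fit function that builds a k-nearest-neighbors classifier."""
    def fit(train_rows, train_labels):
        def predict(row):
            # Sort training points by squared distance and vote among the k closest.
            nearest = sorted(
                zip(train_rows, train_labels),
                key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], row)),
            )[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def majority_fit(train_rows, train_labels):
    """Baseline: always predict the most common training label."""
    winner = Counter(train_labels).most_common(1)[0][0]
    return lambda row: winner


def cross_val_accuracy(fit, rows, labels, n_folds=5):
    """Mean held-out accuracy over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(rows), n_folds))
        train = [
            (row, label)
            for i, (row, label) in enumerate(zip(rows, labels))
            if i not in test_idx
        ]
        model = fit([row for row, _ in train], [label for _, label in train])
        hits = sum(1 for i in test_idx if model(rows[i]) == labels[i])
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


# Synthetic data: two Gaussian clusters, one per class.
data_rng = random.Random(0)
rows, labels = [], []
for _ in range(60):
    label = data_rng.randint(0, 1)
    center = 0.0 if label == 0 else 3.0
    rows.append([data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)])
    labels.append(label)

# Candidate algorithms and hyperparameter settings to compare.
candidates = {"majority": majority_fit}
candidates.update({f"knn(k={k})": knn_factory(k) for k in (1, 3, 5, 7)})

scores = {
    name: cross_val_accuracy(fit, rows, labels)
    for name, fit in candidates.items()
}
best_name = max(scores, key=scores.get)
print("selected:", best_name, "with mean accuracy", round(scores[best_name], 3))
```

A production AutoML system searches a far larger space and is smarter about which configurations it tries, but the selection principle is the same: every candidate is judged on data it was not trained on.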
As more people look at data and gain insights from it, data-driven strategies will continue to spread and become an integral part of any successful organization. OCI Data Science and the ADS toolkit are at the forefront of bringing greater collaboration and speed to data science projects to ensure they deliver real value to businesses.