Oracle recently announced three interesting new data cloud services as part of the Oracle Cloud Data Science Platform: Oracle Cloud Infrastructure Data Science, Oracle Cloud Infrastructure Data Catalog, and Oracle Cloud Infrastructure Data Flow. Additionally, Oracle plans to make the Oracle Big Data Service available at the end of this month. In total, the Oracle Cloud Data Science Platform comprises seven new services, with Oracle Cloud Infrastructure Data Science at its core.
The cloud service is based on the acquisition of DataScience.com, a leading data science platform, on May 16, 2018. Oracle Cloud Infrastructure Data Science was built with the goal of making data science collaborative, scalable, and powerful for every enterprise on Oracle Cloud Infrastructure. It makes data science more efficient by offering:
Access to data and open-source tools:
Ability to utilize compute on demand:
Customers can select the amount of compute they need to train their models on Oracle Cloud Infrastructure, choosing from small to large CPU virtual machines. In the near future, Oracle plans to add GPUs.
Teams of data scientists can work together in a collaborative workspace with features for granular access control and security, centralizing and organizing data science assets all in one place.
Oracle Cloud Infrastructure Data Science has:
Projects to centralize, organize, and document a team’s work.
Notebook Sessions for Python analyses and model development.
Accelerated Data Science (ADS) SDK to make common data science tasks faster, easier, and less error-prone. This Python library offers capabilities such as AutoML for automated model training.
Model Catalog to enable model auditability and reproducibility.
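To make the auditability and reproducibility idea concrete, here is a minimal, stdlib-only sketch of the kind of metadata a model catalog records for each trained model. The field names and structure are purely illustrative assumptions, not the actual Oracle Model Catalog schema or ADS SDK API:

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical sketch: the sort of record a model catalog might keep so a
# model can be audited and reproduced later. Not the real Oracle schema.
@dataclass
class ModelCatalogEntry:
    name: str
    framework: str
    hyperparameters: dict
    training_data_ref: str        # where the training data lives
    artifact_bytes: bytes         # the serialized model itself
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def artifact_sha256(self) -> str:
        # A content hash lets reviewers verify that the stored artifact
        # is exactly the one that was trained and evaluated.
        return hashlib.sha256(self.artifact_bytes).hexdigest()

    def to_record(self) -> str:
        # The catalog record carries metadata plus the hash, not the blob.
        record = asdict(self)
        record.pop("artifact_bytes")
        record["artifact_sha256"] = self.artifact_sha256
        return json.dumps(record, sort_keys=True)

entry = ModelCatalogEntry(
    name="churn-classifier",
    framework="scikit-learn",
    hyperparameters={"max_depth": 5, "n_estimators": 100},
    training_data_ref="oci://bucket/churn/train.csv",
    artifact_bytes=b"<serialized model>",
)
print(entry.to_record())
```

Capturing the data reference, hyperparameters, and an artifact hash together is what makes a catalog entry reproducible: anyone can later verify they are looking at the same model trained the same way.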
You can start your Data Science journey here:
Oracle Cloud Infrastructure Data Catalog is a catalog for all the metadata across your cloud data lake. By applying metadata to everything within the cloud, data discovery and governance become much easier. By applying metadata and a hierarchical logic to incoming data, datasets gain the context and trackable lineage they need to be used efficiently in workflows.
We can use the analogy of notes in a researcher’s library. In this library, a researcher gets structured data in the form of books that feature chapters, indices, and glossaries. The researcher also gets unstructured data in the form of notebooks that feature no real organization or delineation at all. A data catalog would take each of these items without changing their native format and apply a logical catalog to them using metadata such as date received, sender, general topic, and other such items that could accelerate data discovery.
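The library analogy can be sketched in a few lines of plain Python. The point of the toy code below is that assets keep their native format untouched; the "catalog" only attaches metadata, and discovery queries run against that metadata. Class and method names here are illustrative assumptions, not the Data Catalog API:

```python
from dataclasses import dataclass, field

# Toy illustration of the library analogy: each asset keeps its native
# format; the catalog only layers metadata on top for discovery.
@dataclass
class CatalogedAsset:
    name: str
    native_format: str          # e.g. "csv", "parquet", "notebook"
    payload: object             # the untouched original data
    metadata: dict = field(default_factory=dict)

class ToyCatalog:
    def __init__(self):
        self._assets = []

    def register(self, asset, **metadata):
        # Apply metadata without changing the asset's native format.
        asset.metadata.update(metadata)
        self._assets.append(asset)

    def find(self, **criteria):
        # Discovery matches on metadata, never on the payload itself.
        return [a for a in self._assets
                if all(a.metadata.get(k) == v for k, v in criteria.items())]

catalog = ToyCatalog()
catalog.register(CatalogedAsset("sales.csv", "csv", "raw bytes"),
                 topic="sales", received="2020-02-10", sender="ERP")
catalog.register(CatalogedAsset("notes.ipynb", "notebook", "cells"),
                 topic="sales", received="2020-02-12", sender="analyst")

# Structured and unstructured assets are found by the same metadata query.
print([a.name for a in catalog.find(topic="sales")])
```

A single metadata query returns both the structured "book" and the unstructured "notebook", which is exactly what accelerates discovery across heterogeneous data.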
This means that data scientists, data analysts, and data engineers can all find data across systems and the enterprise more easily, because a data catalog provides a centralized, collaborative environment that encourages exploration.
You can get started with Oracle Data Catalog, with a video and step-by-step instructions, here.
Oracle Cloud Infrastructure Data Flow is a fully managed big data service that lets users run Apache Spark applications with no infrastructure to deploy or manage. Data Flow is a cloud-based serverless platform with a rich user interface that allows Spark developers and data scientists to create, edit, and run Spark jobs at any scale without clusters, an operations team, or highly specialized Spark knowledge. Being serverless means there is no infrastructure for you to deploy or manage, and the service is entirely driven by REST APIs, giving you easy integration with applications or workflows.
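For readers new to Spark, the style of computation such a job expresses is map/reduce. Below is a local, stdlib-only sketch of a word count, the canonical example; on Data Flow the same logic would be written as a PySpark application and run distributed, with no cluster for you to manage. The input lines are made up for illustration:

```python
from collections import Counter
from functools import reduce

# Illustrative input; in a real Spark job this would be a distributed
# dataset read from object storage.
lines = [
    "spark jobs at any scale",
    "no cluster to manage",
    "spark on oci data flow",
]

# "Map" stage: turn each line into its own per-line word counts.
mapped = (Counter(line.split()) for line in lines)

# "Reduce" stage: merge the per-line counts into one global count.
word_counts = reduce(lambda a, b: a + b, mapped, Counter())

print(word_counts["spark"])  # → 2
```

The value of Data Flow is that this same map/reduce shape scales from a toy list to terabytes without the developer provisioning or operating any Spark cluster.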
You can start to study and use this new service at https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow.htm
All these new Oracle Cloud Infrastructure services are great instruments for creating new solutions and offerings for Oracle customers. If you are looking for new ideas based on AI and data science technologies, you can also take a look at the Oracle Data Science blog at https://blogs.oracle.com/datascience/