@OracleIMC Partner Resources & Training: Discover your Modernization options + Reach new potential through Innovation

New Oracle Cloud Infrastructure Data Science platform provides new business instruments for Partners

Evgeny Pleskach
Cloud Adoption & Implementation Consultant

Oracle recently announced three new data cloud services that are part of the Oracle Cloud Data Science Platform: Oracle Cloud Infrastructure Data Science, Oracle Cloud Infrastructure Data Catalog, and Oracle Cloud Infrastructure Data Flow. Oracle also plans to make Oracle Big Data Service available at the end of the current month. In total, the Oracle Cloud Data Science Platform includes seven new services, with Oracle Cloud Infrastructure Data Science at its core.

Oracle Cloud Infrastructure Data Science

This cloud service builds on Oracle's acquisition of DataScience.com, a leading data science platform, on May 16, 2018. Oracle Cloud Infrastructure Data Science was built to make data science collaborative, scalable, and powerful for every enterprise on Oracle Cloud Infrastructure. It makes data science more efficient by offering:

Access to data and open-source tools:

  • Tools and languages like Python and JupyterLab
  • Visualization libraries like Plotly and Matplotlib
  • Machine-learning libraries like TensorFlow, Keras, scikit-learn, and XGBoost
  • Version control with Git

Ability to use compute on demand:

Customers can select the amount of compute they need to train their models on Oracle Cloud Infrastructure, choosing from small to large CPU virtual machines. Oracle plans to add GPUs in the near future.

Collaborative workflow:

Teams of data scientists can work together in a collaborative workspace with features for granular access control and security, centralizing and organizing data science assets all in one place.

Model deployment:

  • Ability to train large models on large amounts of data with minimal infrastructure expertise
  • Evaluate and monitor models throughout their lifecycle
  • Improved productivity through automation and streamlined workflows
  • Capabilities to deploy models for varying use cases

Oracle Cloud Infrastructure Data Science includes:

Projects to centralize, organize, and document a team’s work.

Notebook Sessions for Python analyses and model development.

Accelerated Data Science (ADS) SDK to make common data science tasks faster, easier, and less error-prone. This Python library includes AutoML capabilities for automated model training.

Model Catalog to enable model auditability and reproducibility.
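Model auditability and reproducibility largely come down to recording, for every model version, exactly which artifact was produced and under which conditions. The following is a minimal, hypothetical sketch of that idea in plain Python; `ToyModelCatalog`, `ModelEntry`, their fields, and the sample values are invented for illustration and are not the ADS SDK or the OCI Model Catalog API. It fingerprints each serialized artifact with SHA-256 so that a later copy can be checked against the registered version.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog record: the metadata needed to audit and reproduce a model."""
    name: str
    artifact_sha256: str     # fingerprint of the serialized model artifact
    training_data_uri: str   # where the training data came from (placeholder value)
    hyperparameters: dict
    library_versions: dict


class ToyModelCatalog:
    """In-memory stand-in for a model catalog; not the OCI Model Catalog API."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelEntry, oldest first

    def register(self, name, artifact_bytes, training_data_uri,
                 hyperparameters, library_versions):
        entry = ModelEntry(
            name=name,
            artifact_sha256=hashlib.sha256(artifact_bytes).hexdigest(),
            training_data_uri=training_data_uri,
            hyperparameters=dict(hyperparameters),
            library_versions=dict(library_versions),
        )
        # Append rather than overwrite, so every earlier version stays auditable.
        self._versions.setdefault(name, []).append(entry)
        return len(self._versions[name])  # 1-based version number

    def get(self, name, version):
        if version < 1:
            raise IndexError("versions start at 1")
        return self._versions[name][version - 1]

    def verify(self, name, version, artifact_bytes):
        """True if artifact_bytes is byte-for-byte the artifact that was registered."""
        expected = self.get(name, version).artifact_sha256
        return hashlib.sha256(artifact_bytes).hexdigest() == expected

    def export(self, name, version):
        """Serialize one entry as JSON, e.g. for an audit log."""
        return json.dumps(asdict(self.get(name, version)), sort_keys=True)


catalog = ToyModelCatalog()
artifact = b"serialized-model-bytes"  # stand-in for a real serialized model file
version = catalog.register("churn-model", artifact, "data/churn_train.csv",
                           {"max_depth": 4}, {"python": "3.7"})
print(version)                                              # 1
print(catalog.verify("churn-model", version, artifact))     # True
print(catalog.verify("churn-model", version, b"retrained")) # False
```

Keeping every version instead of overwriting the latest one is what makes an audit possible: any deployed model can be traced back to the data reference, hyperparameters, and library versions it was built with.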

You can start your Data Science journey here:

  1. Service Overview: get an overview of this new cloud service in this short video
  2. Configuration instructions:


Oracle Cloud Infrastructure Data Catalog

Oracle Cloud Infrastructure Data Catalog provides a single catalog of the metadata for everything in your cloud, treating that cloud as a data lake. Applying metadata to everything within the cloud makes data discovery and governance much easier. By applying metadata and a hierarchical logic to incoming data, datasets receive the necessary context and trackable lineage to be used efficiently in workflows.

Consider the analogy of a researcher's library. In this library, the researcher gets structured data in the form of books that feature chapters, indices, and glossaries. The researcher also gets unstructured data in the form of notebooks that have no real organization or delineation at all. A data catalog would take each of these items without changing its native format and organize them into a logical catalog using metadata such as date received, sender, and general topic, along with any other attributes that could accelerate data discovery.

As a result, data scientists, data analysts, and data engineers can all find data across systems and the enterprise more easily, because a data catalog provides a centralized, collaborative environment that encourages exploration.
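The researcher's-library analogy maps naturally onto a small data structure: each asset keeps its native format, and the catalog stores only metadata about it (date received, sender, topic) plus links to the assets it was derived from, which provides the trackable lineage described earlier. The following is a hypothetical, in-memory Python sketch of that idea; `ToyDataCatalog`, `CatalogEntry`, and the file paths are invented for illustration and do not reflect the actual Data Catalog API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Metadata about one asset; the asset itself stays in its native format."""
    path: str                 # where the asset lives (hypothetical paths below)
    fmt: str                  # native format, recorded but never converted
    received: date
    sender: str
    topic: str
    derived_from: list = field(default_factory=list)  # upstream paths, for lineage


class ToyDataCatalog:
    """In-memory stand-in for a metadata catalog; not the OCI Data Catalog API."""

    def __init__(self):
        self._entries = {}  # path -> CatalogEntry

    def add(self, entry):
        self._entries[entry.path] = entry

    def search(self, **criteria):
        """Return sorted paths whose metadata matches every key=value criterion."""
        return sorted(
            e.path for e in self._entries.values()
            if all(getattr(e, key) == value for key, value in criteria.items())
        )

    def lineage(self, path):
        """Return every upstream ancestor of path, following derived_from links."""
        seen = set()
        stack = list(self._entries[path].derived_from)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                # Uncatalogued parents are reported but cannot be traced further.
                if parent in self._entries:
                    stack.extend(self._entries[parent].derived_from)
        return sorted(seen)


catalog = ToyDataCatalog()
catalog.add(CatalogEntry("library/book_a.pdf", "pdf", date(2020, 2, 3), "publisher", "genomics"))
catalog.add(CatalogEntry("library/notes_1.txt", "txt", date(2020, 2, 5), "colleague", "genomics"))
catalog.add(CatalogEntry("library/summary.csv", "csv", date(2020, 2, 9), "researcher", "genomics",
                         derived_from=["library/book_a.pdf", "library/notes_1.txt"]))

print(catalog.search(topic="genomics", sender="colleague"))  # ['library/notes_1.txt']
print(catalog.lineage("library/summary.csv"))  # ['library/book_a.pdf', 'library/notes_1.txt']
```

The structured book and the unstructured notebook are catalogued the same way, since only their metadata is inspected; a real service would add automated harvesting, access control, and a shared glossary on top of this basic idea.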

You can get started with Oracle Cloud Infrastructure Data Catalog using the video and step-by-step instructions here.

Oracle Cloud Infrastructure Data Flow

Oracle Cloud Infrastructure Data Flow is a fully managed Big Data service that lets users run Apache Spark applications with no infrastructure to deploy or manage. It is a cloud-based, serverless platform with a rich user interface that allows Spark developers and data scientists to create, edit, and run Spark jobs at any scale without clusters, an operations team, or highly specialized Spark knowledge. Data Flow is entirely driven by REST APIs, which makes it easy to integrate with applications and workflows. You can:

  • Connect to Apache Spark data sources.
  • Create reusable Apache Spark applications.
  • Launch Apache Spark jobs in seconds.
  • Create Apache Spark applications using SQL, Python, Java, or Scala.
  • Manage all Apache Spark applications from a single platform.
  • Process data in the cloud or on premises in your data center.
  • Create Big Data building blocks that you can easily assemble into advanced Big Data applications.
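The split between reusable applications and individual job launches listed above can be illustrated with a small sketch: an application stores what to run (language, script location, argument template), and each run supplies its own parameter values. This is a hypothetical plain-Python model, not the Data Flow REST API or the OCI SDK; `SparkApplication`, `new_run`, the sample paths, and the `${name}` placeholder syntax (borrowed from Python's `string.Template`) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class SparkApplication:
    """Hypothetical reusable definition: what to run, not which inputs to use."""
    name: str
    language: str           # e.g. "PYTHON", "SCALA", "JAVA" or "SQL"
    file_uri: str           # location of the Spark script or JAR (placeholder)
    arguments: tuple        # may contain ${param} placeholders
    default_parameters: dict = field(default_factory=dict)

    def new_run(self, **overrides):
        """Return the concrete argument list for one run, filling placeholders."""
        params = {**self.default_parameters, **overrides}
        # Template.substitute raises KeyError for any placeholder without a value,
        # so a run never starts with an unfilled ${...} argument.
        return [Template(arg).substitute(params) for arg in self.arguments]


daily_report = SparkApplication(
    name="daily-report",
    language="PYTHON",
    file_uri="apps/daily_report.py",  # hypothetical location
    arguments=("--input", "${input}", "--date", "${run_date}"),
    default_parameters={"input": "sales/raw"},
)

# The same application definition is reused for two runs with different inputs.
print(daily_report.new_run(run_date="2020-02-12"))
# ['--input', 'sales/raw', '--date', '2020-02-12']
print(daily_report.new_run(input="sales/eu", run_date="2020-02-13"))
# ['--input', 'sales/eu', '--date', '2020-02-13']
```

Failing fast on a missing parameter is a deliberate choice in this sketch: `Template.substitute` raises `KeyError` rather than letting a job start with a literal `${run_date}` in its arguments.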

You can start learning about and using this new service at https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow.htm

These new Oracle Cloud Infrastructure services are great instruments for creating new solutions and offerings for Oracle customers. If you are looking for new ideas based on AI and data science technologies, take a look at the Oracle Data Science blog at https://blogs.oracle.com/datascience/

