Announcing the Availability of VM for Data Science and AI on Oracle Cloud Infrastructure

February 12, 2020 | 4 minute read
Sanjay Basu
Sr. Director, AI/ML GPU Services

With the explosion of business data, ranging from customer data to the Internet of Things (IoT), data scientists need the flexibility to explore and build models quickly, often more quickly and flexibly than traditional on-premises IT hardware can provide.

Oracle Cloud Infrastructure’s VM for Data Science and AI is a preconfigured environment that includes a virtual machine (VM) with an NVIDIA GPU, the NVIDIA driver with CUDA and cuDNN, common Python machine learning libraries, JupyterLab and Jupyter Notebook, and open source machine learning (ML) and deep learning (DL) frameworks such as TensorFlow 2.x for GPU, PyTorch, and MXNet. In total, the image includes about 110 ML/DL and visualization libraries. You can expand your compute resources by using compute autoscaling, or you can stop the compute instance when it’s not needed, to control costs. The VM even includes basic sample data and code for you to test and explore.

This solution is built on Oracle Cloud Infrastructure, with its exceptional performance, security, and control, and enables you to build models and deliver business value faster.

For more details, see the Oracle Cloud Marketplace listing (updated June 1, 2020):

https://cloudmarketplace.oracle.com/marketplace/en_US/listing/78643201

Benefits

Following are some immediate benefits of using this solution:

  • All-in-one image: The image includes a complete set of preinstalled tools that you can easily add to and customize, either before deployment with the Terraform script or manually.
  • Quick implementation: Just deploy the preconfigured image and start working. When you’re finished, deleting it is just as easy.
  • Compute shapes that meet your needs: For deep learning model training and inference, use a GPU-based shape (P100 or V100). For machine learning, use a CPU-based shape.
  • Easy to launch: Launch these images yourself in the cloud quickly and easily, without the assistance or intervention of your IT organization.
  • Easy to add resources: Add more compute resources in the cloud quickly and easily, by autoscaling or using Resource Manager.
  • Keep costs low: You can run a model for a day on a Tesla P100 GPU in the cloud for about US$30.
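
As a rough back-of-the-envelope check on the cost figure above, the sketch below converts the quoted "about US$30 per day" into an hourly rate and estimates a shorter run. The 8-hour run length is an illustrative assumption, and actual pricing depends on the shape and region, so treat the output as an approximation only.

```shell
# Figure quoted in the list above: about US$30 for one day on a Tesla P100.
daily_cost_usd=30
hours_per_day=24
# Hypothetical run length, chosen only for illustration.
run_hours=8

# Shell arithmetic is integer-only, so awk does the floating-point division.
hourly_cost_usd=$(awk -v d="$daily_cost_usd" -v h="$hours_per_day" \
    'BEGIN { printf "%.2f", d / h }')
run_cost_usd=$(awk -v d="$daily_cost_usd" -v h="$hours_per_day" -v r="$run_hours" \
    'BEGIN { printf "%.2f", d / h * r }')

echo "Approximate hourly cost: US\$${hourly_cost_usd}"
echo "Approximate cost of a ${run_hours}-hour run: US\$${run_cost_usd}"
```

Stopping the instance when a run finishes is what keeps the real bill close to an estimate like this.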

Get Started

Sign up for an Oracle Cloud Free Tier account, and then go to the VM for Data Science and AI page in the Marketplace to launch an image in your tenancy and view the usage instructions.

You can provision the VM or bare metal (BM) node with Oracle Resource Manager, programmatically through the Core Services API, or manually through the Console. Immediately after boot, the AI/ML/DL environment is ready for configuration. You can configure it by accessing the instance directly across the Internet or from an external node, such as a bastion host.

The sandbox environment "mlenv" is already created and configured with about 110 ML libraries and frameworks. Activate this environment for the AI/ML/DL developer user by running the following command:

$source mlenv/bin/activate
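
Once the environment is active, a quick check along the following lines can confirm that the GPU is visible. This is only a sketch and not part of the image's documented workflow: it assumes that nvidia-smi is on the PATH on GPU shapes and that PyTorch is importable as torch inside mlenv.

```shell
# Run inside the activated mlenv environment.
# nvidia-smi ships with the NVIDIA driver; on a CPU-only shape it may be absent.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_tool="present"
    nvidia-smi --query-gpu=name,memory.total --format=csv \
        || echo "nvidia-smi is installed but could not query a GPU"
else
    gpu_tool="absent"
    echo "nvidia-smi not found: this looks like a CPU-only shape"
fi

# Ask PyTorch whether it can see a CUDA device (assumes torch is installed).
python3 -c 'import torch; print("CUDA available:", torch.cuda.is_available())' \
    || echo "PyTorch check skipped (torch is not importable here)"
```

On a CPU-based shape, the PyTorch check is expected to report that CUDA is not available.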

Next, set the password for the Jupyter Notebook server, which allows read and write access over the network:

[mlenv]$jupyter notebook password

You can run the same command again later to reset the password.

Next, secure the Jupyter Notebook web interface with a TLS certificate. For a quick start, the image includes a self-signed certificate, and the Notebook environment is already configured to use it. You can replace it with your own certificate. The commands are described at https://jupyter-notebook.readthedocs.io/en/stable/public_server.html
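
If you want to generate your own self-signed certificate, the sketch below follows the openssl approach described in the Jupyter documentation linked above. The output directory, the file names, and the CN=localhost subject are placeholders chosen for illustration, not values taken from the image.

```shell
# Hypothetical output directory and file names; change them as needed.
cert_dir="${PWD}/jupyter-certs"
mkdir -p "$cert_dir"

if command -v openssl >/dev/null 2>&1; then
    # -x509 emits a self-signed certificate instead of a signing request.
    # -nodes leaves the private key unencrypted so the server can start unattended.
    # -subj supplies the subject non-interactively (CN=localhost is a placeholder).
    openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -subj "/CN=localhost" \
        -keyout "$cert_dir/mykey.key" \
        -out "$cert_dir/mycert.pem"
    # Restrict the private key to the current user.
    chmod 600 "$cert_dir/mykey.key"
    echo "Start the server with:"
    echo "  jupyter notebook --certfile=$cert_dir/mycert.pem --keyfile=$cert_dir/mykey.key --ip=0.0.0.0 --port=8080"
else
    echo "openssl not found; install it before generating a certificate"
fi
```

Browsers warn about self-signed certificates, so for anything beyond a sandbox, consider a certificate issued by a trusted certificate authority.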

Start the Jupyter server by running:

[mlenv]$jupyter notebook --ip=0.0.0.0 --port=8080


The Jupyter environment can then be accessed in a web browser at https://<instance-public-IP>:8080

The OS firewall is already configured to allow this access.

If you have trouble accessing or starting the Jupyter Notebook server, make sure that TCP port 8080 is allowed for incoming traffic by reconfiguring the firewall:

$sudo firewall-cmd --permanent --zone=public --add-port=8080/tcp

$sudo firewall-cmd --reload

$sudo systemctl stop firewalld.service
$sudo systemctl start firewalld.service

Check the firewall status by running:

$systemctl status firewalld.service -l

NOTE: Please make sure your security list and network security group (if used) are configured to allow inbound port 8080.
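
To narrow down where a connection problem lies, a check like the following can help. It is a sketch under a few assumptions: the instance uses firewalld with the public zone, as in the commands above, and the ss utility from iproute2 is installed. It inspects only the instance itself, not the security list or network security group.

```shell
port="8080/tcp"

if command -v firewall-cmd >/dev/null 2>&1; then
    # --query-port prints "yes" or "no" and sets the exit status accordingly.
    if sudo firewall-cmd --zone=public --query-port="$port" >/dev/null 2>&1; then
        port_status="open"
    else
        port_status="closed"
    fi
else
    port_status="unknown"
fi
echo "firewalld rule for ${port}: ${port_status}"

# Check whether any process is listening on TCP port 8080.
if command -v ss >/dev/null 2>&1; then
    if ss -tln | grep -q ':8080 '; then
        echo "A process is listening on port 8080"
    else
        echo "Nothing is listening on port 8080; is the Jupyter server running?"
    fi
fi
```

If the firewalld rule is open and a process is listening but the browser still cannot connect, the security list or network security group is the next place to look.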

To list the AI and data science packages included in the image, run the following command:

[mlenv]$pip3 list
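
Because the full list is long, it can be handy to count the packages and pick out the main frameworks. The sketch below assumes the distribution names start with tensorflow, torch, or mxnet; GPU builds may carry suffixes such as tensorflow-gpu or mxnet-cu101, which the pattern also matches.

```shell
# Count installed packages; --format=freeze prints one "name==version" line each.
pkg_count=$(pip3 list --format=freeze 2>/dev/null | wc -l | tr -d ' ')
echo "Installed packages: ${pkg_count}"

# Show only the main deep learning frameworks (names are assumptions, see above).
pip3 list 2>/dev/null | grep -i -E '^(tensorflow|torch|mxnet)' \
    || echo "No matching framework packages found"
```

The count includes every dependency in the environment, so it can differ from the figure of about 110 libraries quoted earlier.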

The following hands-on labs are available in the Examples directory under /home/opc:

Examples/deeplearning_bootcamp-master/lab1

Examples/deeplearning_bootcamp-master/lab2

Examples/deeplearning_bootcamp-master/lab3

Examples/deeplearning_bootcamp-master/lab4

The proposed architecture for running DL training workloads on Oracle Cloud is located in the Oracle Cloud Infrastructure Architecture Center.

This image will be updated in conjunction with the kernel updates and major framework updates. We hope that you can use this image to jump start your ML or DL workloads on Oracle Cloud Infrastructure. Please provide feedback in the Comments section of this post.

Sanjay Basu

Sr. Director, AI/ML GPU Services

Sanjay focuses on OCI's NVIDIA GPU offerings for large scale model training and inference. He also works with Oracle AI, Blockchain, Microservices along with Cloud Security and Compliance.

