Computer vision examples using Oracle's data science platform

April 13, 2021 | 8 minute read
Henry Yin
Principal Solution Engineer, Technology, China

Data science is nothing new to us; we see it in every area of our lives, from self-driving cars to stock trading. To accelerate customers' exploration of data science, Oracle offers a complete data science platform that supports different forms of AI, including machine learning and computer vision. In this post, we'll explore Oracle's data science solutions, then apply them to three computer vision examples covering object detection and video synthesis.


Explore the data science platform

Oracle Cloud Infrastructure (OCI) Data Science provides a complete environment for data science exploration. It doesn't need tedious installation and configuration, and you don't need to worry about compatibility between various components. You can quickly start exploring in OCI Data Science notebooks with just a few clicks. 

Another data science exploration tool from Oracle is the AI (All-in-One) GPU Image for Data Science. When creating an OCI compute instance, select this image, as shown in the figure below.


AI (All-in-One) GPU Image for Data Science

For better performance, choose a GPU shape in OCI when creating the instance. If you only run an already trained model and your performance requirements are modest, a CPU shape is enough. In that case, explicitly configure the relevant scripts to run on the CPU; otherwise, the program will report an error.
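The CPU fallback mentioned above can be sketched as follows. This is a minimal example that assumes the scripts are built on PyTorch (as Detectron2 is); `pick_device` is a hypothetical helper name, not part of any library.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a GPU is requested and usable, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is installed in the environment
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False  # no PyTorch found, so fall back to the CPU
    return "cuda" if prefer_gpu and has_cuda else "cpu"


device = pick_device()
print(f"Running on: {device}")
# With Detectron2, the same string can be assigned to cfg.MODEL.DEVICE
# before the predictor is built.
```

On a CPU shape this resolves to "cpu", so the script does not try to reach a GPU that is not there.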


This AI (All-in-One) environment includes the components commonly used in AI and machine learning, with compatibility between them already worked out. We can install an object detection module directly in the environment.



Use Detectron2's existing models to identify objects in an image

Detectron2 is an open-source project for detecting and labeling common objects in images with existing trained models, and for training models on user-defined data sets to detect and label specific items.

In the following figure, we use one of its trained models to detect and label a randomly chosen picture. Each identified object is covered with a color mask, enclosed in a box, and labeled with its category name and confidence. On a four-core i7 CPU, object detection took tens of seconds for a single image. For batch processing, that speed is hard to accept, even with multi-core CPUs. On an OCI GPU shape, however, even the basic GPU3.1, batch image processing performs well.

Airplane image detection

Detectron2 also provides trained models for land space detection and labeling in complex environments. Labeling multiple objects in such scenes demands more computing performance, so you should use GPUs to improve the processing speed.

Land space detection in complex environments

Another trained model in Detectron2 performs body keypoint detection. As shown in the figure below, it not only detects each person in the image or video and adds a label and confidence, but also marks the key points of the person's body. This technology plays an important role in autonomous driving and in analyzing human behavior.

Body keypoint detection


Object detection and labeling using models trained on custom data sets

In practice, the models that ship with Detectron2 often cannot meet our needs. When we use computer vision, we usually want to train a model on our own data set, and then let that model detect the objects in our images.

Let's say we want to create an application that measures the densities of various cell types in a blood sample. We first set up our own training data set, which is a set of images, and use COCO or other labeling tools to mark the objects in the images. These images and their tag files (typically XML files) form the training data set. If you are interested, you can build your own COCO labeling system and create your own data set.
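To make the tag files concrete, here is a minimal sketch that reads one annotation and counts the labeled cells. It assumes a Pascal VOC-style XML layout; the file name, the `RBC` and `WBC` labels, the box coordinates, and the per-megapixel density measure are all hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotation for one image, in a Pascal VOC-style layout.
SAMPLE_XML = """
<annotation>
  <filename>blood_smear_001.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>RBC</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>RBC</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>290</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>WBC</name>
    <bndbox><xmin>300</xmin><ymin>220</ymin><xmax>460</xmax><ymax>380</ymax></bndbox>
  </object>
</annotation>
"""


def count_labels(xml_text):
    """Count how many labeled objects of each class the annotation contains."""
    root = ET.fromstring(xml_text.strip())
    return Counter(obj.findtext("name") for obj in root.iter("object"))


def density_per_megapixel(xml_text):
    """Return the number of labeled objects per megapixel, per class."""
    root = ET.fromstring(xml_text.strip())
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    megapixels = width * height / 1_000_000
    return {label: n / megapixels for label, n in count_labels(xml_text).items()}


print(count_labels(SAMPLE_XML))
print(density_per_megapixel(SAMPLE_XML))
```

The same counting logic applies to a trained model's predictions, which is how such an application would estimate cell densities on new samples.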

Blood cell markers computer vision

Once you've created the training data set, you can begin training. At this stage, you need GPUs. Training on 20 images, with two tags on each image, took 26 hours on a 32-core 2.2 GHz CPU with 64 GB of RAM, compared to less than 2 hours on OCI GPU3.1, which is more than 13 times faster. If you have a large training data set, we recommend using larger GPU shapes and letting multiple GPUs work together; this improves processing performance and reduces training time.

The following are the results of several models trained on custom data sets. The first is mask detection, which identifies the faces of recognizable people in the image. A confidence threshold controls which people are labeled. For example, the biker in the middle left of the image below is too far away for facial features to be distinguished, so the model does not label this biker.
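The thresholding described above can be sketched in a few lines. The detections below are hypothetical values shaped like a detector's output, and `filter_by_score` is an illustrative helper; in Detectron2 itself, the equivalent setting is the `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST` configuration value.

```python
# Hypothetical detections: each has a class label and a confidence score.
detections = [
    {"label": "mask", "score": 0.97},
    {"label": "no_mask", "score": 0.88},
    {"label": "mask", "score": 0.41},  # e.g. a distant, blurry face
]


def filter_by_score(detections, threshold=0.7):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]


kept = filter_by_score(detections, threshold=0.7)
print([d["label"] for d in kept])  # prints ['mask', 'no_mask']
```

Raising the threshold suppresses uncertain detections such as the distant biker, at the cost of missing some true positives.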

Detect who's wearing a mask and who isn't

The following example detects potholes on the road surface and can be connected to on-board camera equipment. After a pothole is detected, the system combines the detection with the position reported by the vehicle's satellite positioning system, generates a report containing the pothole's location, and submits it to the road maintenance department.
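One way to combine detections with position data is to match each detection to the GPS fix closest in time. The sketch below assumes that both the camera and the positioning system produce timestamps on a shared clock; the `GpsFix` class, the helper names, and the sample coordinates are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class GpsFix:
    timestamp: float  # seconds on the shared clock
    lat: float
    lon: float


def nearest_fix(fixes, timestamp):
    """Return the fix closest in time; `fixes` must be non-empty and sorted."""
    times = [f.timestamp for f in fixes]
    i = bisect_left(times, timestamp)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = fixes[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(f.timestamp - timestamp))


def build_report(detection_times, fixes):
    """Attach the nearest GPS position to each pothole detection time."""
    report = []
    for t in detection_times:
        fix = nearest_fix(fixes, t)
        report.append({"time": t, "lat": fix.lat, "lon": fix.lon})
    return report


fixes = [
    GpsFix(0.0, 39.9000, 116.4000),
    GpsFix(1.0, 39.9001, 116.4002),
    GpsFix(2.0, 39.9002, 116.4004),
]
report = build_report([0.2, 1.6], fixes)
for row in report:
    print(row)
```

A production system would likely interpolate between fixes and deduplicate repeated sightings of the same pothole; this sketch only shows the join step.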

Detect potholes on roads

The following is a scenario for wildfire detection. Object detection systems have been used in place of traditional smoke sensors to sense fires for many years. In tunnels, for example, video surveillance replaces traditional smoke detectors. Traditional tunnel smoke detection requires installing a large number of smoke detectors, each with a quite limited sensing range. These detectors are not only expensive to install, but also a heavy maintenance burden on local governments. Using video monitoring instead of traditional sensors improves the accuracy of fire alarms while reducing the number of monitoring points and the cost of equipment procurement and maintenance.

Wildfire detection using computer vision


CLIP: Using a pre-trained model for object detection

Recently, OpenAI's CLIP has become very popular in the field of object detection. It was trained with a contrastive learning objective on a data set of 400 million text-image pairs collected from 500,000 queries. The image and the text are encoded separately, the similarity is calculated for each image-text pair, and each image or text is then classified to get the matching result. Given an input picture, CLIP outputs the best-matching text description. In the diagram below, you can see that the performance of CLIP is outstanding.
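The matching step described above (encode, compare in pairs, then classify) can be illustrated with a small sketch. The three-dimensional vectors below are made-up stand-ins for real CLIP embeddings, and the scale factor of 100 is an assumption chosen to sharpen the softmax; the sketch shows only the similarity arithmetic, not the encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_image_to_texts(image_vec, text_vecs, scale=100.0):
    """Score one image embedding against several text embeddings."""
    sims = [cosine_similarity(image_vec, t) for t in text_vecs]
    return softmax([scale * s for s in sims])


# Toy embeddings (hypothetical values, not produced by CLIP).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a puppy": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.1, 0.9],
}

probs = match_image_to_texts(image_embedding, list(text_embeddings.values()))
best = max(zip(text_embeddings, probs), key=lambda pair: pair[1])[0]
print(best)  # prints: a photo of a puppy
```

With real CLIP, the embeddings come from the image and text encoders, and the description with the highest probability is reported as the match.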

OpenAI CLIP performance

As for the CLIP code itself, those with hands-on experience in deep learning can reproduce all of it from the paper. The main difficulty in reproducing the results is the data set: 400 million text-image pairs obtained from 500,000 queries.

OCI's AI (All-in-One) GPU Image for Data Science already contains the basic installation packages that CLIP needs, so you can use CLIP directly after downloading the project from GitHub. If you are processing a single image, a CPU shape is fine; if you are processing a large number of images, a GPU shape gives a big performance boost.

In the following example, we input an image of a puppy and then use CLIP to identify the object in the image, then output the similarity between the image and the three text descriptions. In the following figure, we can see that CLIP has accurately identified the puppy.

CLIP correctly identifies the image as a puppy


Video synthesis on Oracle Cloud Infrastructure

The NeurIPS 2019 paper "First Order Motion Model for Image Animation" describes how to make static images move. We can also implement this paper on OCI's AI (All-in-One) GPU Image for Data Science.

The Python environment in the AI (All-in-One) GPU Image for Data Science is a good choice for this work. The model is trained on VoxCeleb, a face video data set. Because of the large amount of computation involved, training takes a long time on a personal computer's GPU. An OCI GPU shape, which is cheap and powerful, is therefore a good choice.

New features in OCI Data Science and the next generation of the AI (All-in-One) GPU Image for Data Science will be released in the near future, bringing more powerful capabilities. Stay tuned.

Explore data science at Oracle
