Blogs about Deep Learning, Machine Learning, AI, NLP, Security, Oracle Traffic Director, Oracle iPlanet WebServer

Image Segmentation

In computer vision, image segmentation is the process of partitioning an image into multiple segments and associating every pixel in an input image with a class label.
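To make "a class label for every pixel" concrete, here is a minimal sketch in plain Python. The three class names, the 2x3 "image" and the scores are made up for illustration; a real model produces such per-class scores for every pixel, and the label map is obtained by picking the best-scoring class at each position.

```python
# Toy example: a 2x3 "image" and 3 hypothetical classes (not the real model's classes).
CLASSES = ["road", "sky", "person"]

# scores[row][col] holds one made-up score per class for that pixel.
scores = [
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]],
    [[0.9, 0.05, 0.05], [0.3, 0.1, 0.6], [0.7, 0.2, 0.1]],
]


def to_label_map(scores):
    """Pick the index of the highest-scoring class for every pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


label_map = to_label_map(scores)
named = [[CLASSES[i] for i in row] for row in label_map]

print(label_map)
print(named)
```

The output has the same height and width as the input, with one class index per pixel instead of a color.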

Semantic segmentation algorithms are used in self-driving cars.

I was intrigued by this post by Lex Fridman on driving scene segmentation, and I wanted to see whether it works on the very different and more difficult terrain found in India.

So I recorded a short video of Tawang, in Arunachal Pradesh, India. The video is 16 seconds long and contains around 325 frames.

Refer to this IPython notebook that I used. It is based on tutorial_driving_scene_segmentation.ipynb. It downloads the 'mobilenetv2_coco_cityscapes_trainfine' model from TensorFlow.

The different models available in TensorFlow DeepLab are listed at https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md

The MobileNet-v2 based model has been pre-trained on the MS-COCO dataset and, for fast computation, does not employ the ASPP and decoder modules. You can choose any pre-trained TensorFlow model that suits your needs.

This model classifies pixels into the following classes:

'road', 'sidewalk', 'building', 'wall', 'fence', 
'pole', 'traffic light', 'traffic sign', 'vegetation', 'terrain', 
'sky',  'person', 'rider', 'car', 'truck',
'bus',  'train', 'motorcycle', 'bicycle'
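As a rough sketch of how these class names can be used, the snippet below counts what share of an image each class covers. It assumes that the position of a name in the list is the label ID the model emits, and the small label map is invented for illustration; `class_shares` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

# The 19 class names from the post, in the order listed.
# Assumption: the index in this list is the label ID produced by the model.
LABEL_NAMES = [
    "road", "sidewalk", "building", "wall", "fence",
    "pole", "traffic light", "traffic sign", "vegetation", "terrain",
    "sky", "person", "rider", "car", "truck",
    "bus", "train", "motorcycle", "bicycle",
]


def class_shares(label_map):
    """Return the fraction of pixels assigned to each class present in the map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {LABEL_NAMES[label]: n / total for label, n in counts.items()}


# Invented 3x4 label map: sky on top, vegetation/terrain in the middle, road below.
toy_label_map = [
    [10, 10, 10, 10],
    [8, 8, 9, 9],
    [0, 0, 0, 1],
]

shares = class_shares(toy_label_map)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {share:.0%}")
```

A summary like this makes it easy to spot frames where an unexpected class, such as terrain, dominates.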

The color code for the classes is shown below.
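For reference, a color code like this can be written as a simple lookup table. The RGB values below are the ones commonly used for the Cityscapes classes; treat them as an assumption, since the notebook's own color map may differ slightly. `colorize` is a hypothetical helper that turns a label map into an RGB image, assuming label IDs follow the order of the table.

```python
# Commonly used Cityscapes palette (assumed, may not match the notebook exactly).
PALETTE = {
    "road": (128, 64, 128),
    "sidewalk": (244, 35, 232),
    "building": (70, 70, 70),
    "wall": (102, 102, 156),
    "fence": (190, 153, 153),
    "pole": (153, 153, 153),
    "traffic light": (250, 170, 30),
    "traffic sign": (220, 220, 0),
    "vegetation": (107, 142, 35),
    "terrain": (152, 251, 152),
    "sky": (70, 130, 180),
    "person": (220, 20, 60),
    "rider": (255, 0, 0),
    "car": (0, 0, 142),
    "truck": (0, 0, 70),
    "bus": (0, 60, 100),
    "train": (0, 80, 100),
    "motorcycle": (0, 0, 230),
    "bicycle": (119, 11, 32),
}

# Label ID -> RGB, relying on the insertion order of the dict above.
COLORS = list(PALETTE.values())


def colorize(label_map):
    """Replace every label ID with its RGB color."""
    return [[COLORS[label] for label in row] for row in label_map]


# Invented 2x2 label map: road, car / person, sidewalk.
colored = colorize([[0, 13], [11, 1]])
print(colored)
```

With this palette, cars come out deep blue, people red, roads purple, sidewalks pink and terrain light green, which matches the colors mentioned in the comments on the output images.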

The same color coding is used for the same classes in all the pictures. I first ran the model on two images; here is the output:

Then I ran the model on the whole video. Here is the link to the output video.

Comparisons of some of the images and their outputs are shown in the table below.

Original Image Output Image Comments
It has detected the jeep properly as a vehicle (in deep blue). It has detected the road as sidewalk (in light pink).
It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly. However, it wrongly classified the edges of the image (in red) as person.
It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly.
A lot of this image is detected as terrain (in light green)!
Some part of the mountain is incorrectly detected as sidewalk (in deep purple).
It has detected the person (in red) correctly. It was able to detect the road as road/sidewalk (in purple) correctly. It is not able to classify the lake or the mountain.
It has detected the persons correctly (in red). The road is detected as road/sidewalk correctly (in pink/purple). It has misclassified the colorful flags on the sides of the image as person.


  • The model is trained on city images, but the input I gave was of a mountainous region. We may get better results with city videos.
  • We are using a 2D segmentation model. A 3D segmentation model, which can estimate depth, may give better results.
  • We are running the model on each frame of the video independently. We are not taking advantage of the information shared by consecutive frames or correlating them.
  • The model was not trained on mountains or lakes, as they are not among the output classes of the training set.
  • The input video was not taken from a car dashboard, as the training set was. We may get better results if it were.
  • If the objects are far away, the accuracy of the prediction may not be good.
  • Some roads are not cemented or tarred but are mud roads, and in these places there is no road/pavement distinction. The model was probably trained on proper paved roads, so we would likely get better results on such roads.
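On the point about consecutive frames: one simple way to use them, which was not tried in this post, is a per-pixel majority vote over a small window of neighbouring frames. The sketch below is only an illustration under a strong assumption, namely that the scene barely moves between frames, so it would blur fast-moving objects in a real driving video. `smooth_labels` is a hypothetical helper and the three tiny frames are invented.

```python
from collections import Counter


def smooth_labels(frames, window=3):
    """Per-pixel majority vote over a centered sliding window of label maps.

    frames: list of label maps (lists of rows of ints), all the same size.
    """
    half = window // 2
    smoothed = []
    for t, current_frame in enumerate(frames):
        # Neighbouring frames, clipped at the start and end of the video.
        neighbours = frames[max(0, t - half):min(len(frames), t + half + 1)]
        out = []
        for r, row in enumerate(current_frame):
            out_row = []
            for c, current in enumerate(row):
                votes = Counter(frame[r][c] for frame in neighbours)
                best = max(votes.values())
                # Keep the frame's own label when it ties for the most votes.
                if votes[current] == best:
                    out_row.append(current)
                else:
                    out_row.append(votes.most_common(1)[0][0])
            out.append(out_row)
        smoothed.append(out)
    return smoothed


# Invented 1x2 frames: the first pixel flickers from road (0) to person (11) once.
frames = [[[0, 0]], [[11, 0]], [[0, 0]]]
result = smooth_labels(frames)
print(result)
```

A vote like this would remove one-frame flickers, such as image edges briefly labelled as person, at the cost of reacting more slowly to real changes in the scene.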

This blog is also posted on my personal website here.

