DLS V2 Launch Announcement

January 21, 2022 | 4 minute read
Praveen Patil
Principal Product Manager - Data Science

The OCI Data Labeling service launched in October 2021. Based on feedback from customers, we are excited to announce the release of V2 of the service. This release adds new features along with several UX and UI enhancements and fixes. Here are the capabilities released as part of the V2 launch:

1) Consolidated Export file – Export annotated data records into a single JSON file. This consolidated file contains the metadata associated with the data set and the annotations for the data records (images, text, or documents) within the data set. A single file is easier to use for custom model training than the two export files (one with data set details, the other with annotations and data records) that were part of the GA launch. The export snapshot API supports both options: separate files as well as the new consolidated file format. Below are examples of the annotated output formats for object detection (images) and NER (text).

 

Figure: Image object detection consolidated format
Figure: Text entity extraction consolidated format
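To make the idea of consolidation concrete, here is a minimal Python sketch that combines data set metadata and annotated records into one JSON document. Every field name in it (`dataset`, `records`, `annotations`, and so on) is a hypothetical placeholder chosen for illustration and is not the service's actual export schema; refer to the service documentation for the real structure.

```python
import json

# NOTE: all field names below are hypothetical placeholders,
# not the actual OCI Data Labeling export schema.
def consolidate(dataset_metadata: dict, records: list) -> str:
    """Combine dataset metadata and annotated records into one JSON document."""
    return json.dumps({"dataset": dataset_metadata, "records": records}, indent=2)

# Illustrative inputs standing in for the two separate export files.
metadata = {
    "name": "demo-dataset",
    "annotation_type": "NER",
    "labels": ["Organization", "State"],
}
records = [
    {
        "source": "doc1.txt",
        "annotations": [{"label": "Organization", "offset": 0, "length": 19}],
    },
]

consolidated = consolidate(metadata, records)

# A consumer now needs to parse only one file to get both parts.
loaded = json.loads(consolidated)
print(loaded["dataset"]["name"], len(loaded["records"]))
```

With a single document, a training script can read the label set and the annotations in one pass instead of joining two files.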

 

When exporting annotated data records, customers can now select the “Consolidate dataset and record file into single file” option, as shown in the console screenshot below.

 

 

2) Standard Export file formats – Once data has been annotated, the service now allows users to export the records (images or text) into various standard formats. This way, customers do not have to transform the data to suit the ML model used for custom training. For the object detection annotation type on images, export the labeled images into YOLOv5, COCO, or PASCAL VOC files, depending on the type of computer vision model that will be built. For example, COCO is a common format for object detection models such as R-CNN and Fast R-CNN. For the text-based NER annotation type, data can be exported into spaCy or CoNLL formats. When exporting annotated data records, customers need to click the “Export format” option and select the corresponding file format from the drop-down menu.
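As a rough illustration of the kind of transformation these export formats save you from writing, the sketch below converts one bounding box into a YOLO label line (class id followed by the normalized center and size) and into a COCO-style `bbox` list (`[x_min, y_min, width, height]` in pixels). It assumes the box is available as pixel corner coordinates; the function names and the sample values are made up for this example and do not come from the service.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Format a pixel box (x_min, y_min, x_max, y_max) as a YOLO label line.

    YOLO stores the box center and size, each normalized by the image dimensions.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def to_coco_bbox(box):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to COCO's [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# Hypothetical annotation: a 200x200 pixel box inside a 400x500 image.
sample_box = (100, 50, 300, 250)
yolo_line = to_yolo_line(0, sample_box, image_width=400, image_height=500)
coco_bbox = to_coco_bbox(sample_box)
print(yolo_line)
print(coco_bbox)
```

Selecting the matching export format in the service makes this kind of hand-written conversion unnecessary.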

 

3) Overlapping Annotations for NER: For text data with the NER annotation type, the service now allows highlighting text in three ways (to support the upcoming custom model training for the Language service):

 

Overlap - where you can annotate overlapping phrases.

Multiple - where you can annotate the same piece of text, with different labels, up to 4 times.

Multilevel - where you can annotate subtexts of already annotated pieces of text, up to 15 times. For example, when highlighting entities within “University of Texas”, customers can now highlight the whole span as the entity “Organization” and just “Texas” as the entity “State”.
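The three modes above can be told apart by comparing character offsets. The following Python sketch is one way to model that, assuming each annotation is a half-open character span with a label; the `Span` type and `relation` function are hypothetical helpers written for this post and are not part of the service or its export format.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def relation(a: Span, b: Span) -> str:
    """Classify how two annotated spans relate to each other."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"      # no shared characters
    if (a.start, a.end) == (b.start, b.end):
        return "multiple"      # same piece of text carrying more than one label
    a_contains_b = a.start <= b.start and b.end <= a.end
    b_contains_a = b.start <= a.start and a.end <= b.end
    if a_contains_b or b_contains_a:
        return "multilevel"    # one span is a subtext of the other
    return "overlap"           # the spans partially overlap


text = "University of Texas"
organization = Span(0, len(text), "Organization")
texas_start = text.index("Texas")
state = Span(texas_start, texas_start + len("Texas"), "State")

print(relation(organization, state))
print(relation(Span(0, 13, "A"), Span(11, 19, "B")))
print(relation(Span(0, 10, "A"), Span(0, 10, "B")))
```

For the “University of Texas” example, the “State” span lies entirely inside the “Organization” span, so the pair is classified as multilevel.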

 

 

4) UX/UI Enhancements:

  • Create labeling instructions for data labelers as part of the dataset creation process
  • Drag the panels (Label box, Tool box, Shortcuts) to suit the labeler's preference
  • Add new labels from within the labeling UI

 

5) Usability Enhancements:

  • Snapshot path shows the link to the latest exported annotated dataset in Object Storage
  • Create an Object Storage bucket while uploading files from a local directory during the data set creation process
  • Workload requests (record generation and export functionality) now show a gradual completion percentage

 

Stay tuned for exciting new feature announcements for the Data Labeling service in the upcoming months.

 

Try Oracle Cloud Free Tier!
A 30-day trial with US $300 in free credits gives you access to the OCI Data Labeling Service. Ready to learn more about the OCI Data Labeling service?

  • Visit our service documentation.
  • Star and clone our new GitHub repo! We’ve included notebook tutorials and code samples.
  • Watch our tutorials on our YouTube playlist.
  • Subscribe to our Twitter feed.

 

Praveen Patil

Principal Product Manager - Data Science

Currently working as a product manager in the Data & AI group within Oracle Cloud Infrastructure.

Prior to moving to a product role, I was a practitioner in the data science space. Over the years, I have applied advanced analytics and data science methodologies to various domains: financial services, telecom, entertainment and gaming, and cloud business.

