Oracle Speech AI service now supports diarization

September 18, 2023 | 2 minute read
Michael Zhang
Senior Principal Product Manager

We’re excited to announce that the Oracle Speech service now supports diarization in a select region (US West Phoenix) with Limited Availability. You can enable diarization when creating an asynchronous job and specify between 2 and 16 speakers.

A screenshot of the diarization tool
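
For readers who create jobs programmatically, the following is a minimal sketch of the diarization portion of a job request. It only builds and validates a plain dictionary and does not call any SDK. The key names (`diarization`, `isDiarizationEnabled`, `numberOfSpeakers`) and the helper `build_transcription_settings` are assumptions for illustration, so check them against the service's API reference before use. The 2 to 16 range comes from the limit stated above.

```python
def build_transcription_settings(number_of_speakers):
    """Return a diarization settings fragment for an asynchronous job request.

    The key names below are assumed for illustration and may differ from
    the real request schema.
    """
    # The announcement states that between 2 and 16 speakers can be selected.
    if not isinstance(number_of_speakers, int) or not 2 <= number_of_speakers <= 16:
        raise ValueError("number_of_speakers must be an integer from 2 to 16")
    return {
        "diarization": {
            "isDiarizationEnabled": True,          # assumed key name
            "numberOfSpeakers": number_of_speakers,  # assumed key name
        }
    }


settings = build_transcription_settings(4)
print(settings)
```

Validating the speaker count on the client side gives a clear error before a job is submitted, rather than relying on the service to reject the request.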

Diarization

Diarization is the process of segmenting an audio recording into distinct speaker segments and labeling them. Each segment corresponds to a specific speaker within the recording. Speaker embeddings play a central role in diarization: they capture the characteristics that distinguish one speaker's voice from another's. As part of the transcription process, a trained artificial intelligence (AI) model extracts features from each segment and assigns it a speaker embedding. Then, through classification or clustering, distinct speakers are identified, and each segment is labeled with a speaker ID in the final output.
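
To make the clustering step concrete, here is a small illustrative sketch. It is not the Oracle Speech implementation: it assumes each segment already has an embedding vector (the three-dimensional vectors below are made up), and it uses a simple greedy clustering by cosine similarity. The function name `label_speakers` and the `threshold` value are hypothetical choices for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_speakers(embeddings, threshold=0.8, max_speakers=16):
    """Greedily assign each segment embedding to a speaker ID.

    A segment joins the most similar existing speaker if the similarity
    reaches `threshold` (or if `max_speakers` is already reached);
    otherwise it starts a new speaker.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments assigned to each speaker
    labels = []
    for emb in embeddings:
        best_id, best_sim = None, -1.0
        for speaker_id, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best_id, best_sim = speaker_id, sim
        if best_id is not None and (best_sim >= threshold or len(centroids) >= max_speakers):
            # Update the running mean for the matched speaker.
            n = counts[best_id]
            centroids[best_id] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best_id], emb)]
            counts[best_id] = n + 1
            labels.append(best_id)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


# (start_seconds, end_seconds, embedding) with made-up embeddings
segments = [
    (0.0, 2.1, [0.9, 0.1, 0.0]),
    (2.1, 4.0, [0.1, 0.9, 0.1]),
    (4.0, 6.5, [0.85, 0.15, 0.05]),
    (6.5, 8.0, [0.05, 0.95, 0.0]),
]
labels = label_speakers([emb for _, _, emb in segments])
for (start, end, _), speaker in zip(segments, labels):
    print(f"{start:>4.1f}s-{end:>4.1f}s  speaker {speaker}")
```

In this toy input, the first and third segments have similar embeddings and share one speaker ID, while the second and fourth share another. Production systems typically use learned, high-dimensional embeddings and more robust clustering methods.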

Diarization makes audio content easier to organize, analyze, and mine for meaningful information from spoken interactions. Diarized transcripts are used in scenarios ranging from medical transcription and language learning to virtual agents, law enforcement, call center optimization, content creation, and market research.

Want to learn more?

Contact your Oracle representative to discuss how Oracle Cloud Infrastructure Speech with diarization can help you unlock the value of your multimedia data and gain the insight you need to bring your business to the next level.

If you’re new to Oracle Cloud Infrastructure, try Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits. For more information, see the following resources:


Michael Zhang

Senior Principal Product Manager

As senior principal product manager, Mike Zhang owns the Oracle Cloud Infrastructure (OCI) speech roadmap. With over 20 years of experience in high tech, Mike brings a wealth of knowledge spanning from engineering to product management across a diverse range of industries. Most recently, he was director of product management at Microsoft/Nuance, focusing on the healthcare industry. Prior to that, he held senior product management positions at Dell/EMC, NetApp, and Broadcom/CA Technologies.
