We are excited to announce the General Availability of OCI’s Speech service, an Automatic Speech Recognition (ASR) service. OCI Speech unlocks the data in audio (and video) files and, together with the other OCI offerings in big data, ML, and AI, allows customers to gain insights that were hidden until now.
Speech service overview
Oracle Cloud Infrastructure (OCI) Speech is an AI service that applies ASR technology to transform audio-based content into text. Developers can make API calls to integrate Speech’s pretrained models into their applications. You can use the OCI Speech service for accurate, text-normalized, timestamped transcription via the Console, REST API, CLI, or SDKs. In addition, you can use the Speech service in a Data Science notebook session. With Speech you can filter profanities, get confidence scores for single words (or the whole transcription), transcribe audio in multiple languages, and transcribe multiple files with only one API call.
Speech launch capabilities
Support for multiple languages: with our GA you will be able to transcribe audio files in English, Spanish or Portuguese.
The OCI Speech service is designed to integrate seamlessly with existing customer solutions through the UI, REST APIs, SDK, and CLI. Furthermore, OCI Speech users can take advantage of batching support and submit multiple files with one call.
Blazing fast processing.
Text Normalization provides more readable text that resembles how humans write. For example, Speech would convert the audio “this laptop costs one thousand three hundred and fifty-five dollars” to “this laptop costs $1,355”. Additionally, Speech will normalize addresses, times, numbers, URLs, and more.
Word Filtering (Profanity) – Speech can remove (replace the word with asterisks), mask (replace all but the first character with asterisks), or tag (leave the word in place but tag it) profanity in the output text.
Job Canceling allows users to cancel a job after submitting it, as long as the job has not finished processing.
Confidence Score: Speech reports a confidence score for each word and for the whole transcription.
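To make the three profanity-filtering modes listed above concrete, here is a minimal Python sketch of how remove, mask, and tag differ on the same sentence. It is an illustration only, not the service's implementation: the word list, the `filter_profanity` function, the one-asterisk-per-character convention, and the `<profanity>` tag format are all assumptions made for this example.

```python
import re

# Hypothetical word list, for illustration only.
PROFANITY = {"darn", "heck"}

MODES = ("remove", "mask", "tag")


def filter_profanity(text: str, mode: str) -> str:
    """Apply one of the three filtering modes to flagged words in `text`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    def handle(match):
        word = match.group(0)
        if word.lower() not in PROFANITY:
            return word  # not flagged: leave untouched
        if mode == "remove":
            # Replace the whole word with asterisks (assumed: one per character).
            return "*" * len(word)
        if mode == "mask":
            # Keep the first character, replace the rest with asterisks.
            return word[0] + "*" * (len(word) - 1)
        # mode == "tag": keep the word but wrap it in an (assumed) marker.
        return f"<profanity>{word}</profanity>"

    # Visit every alphabetic token and let `handle` decide what to emit.
    return re.sub(r"[A-Za-z']+", handle, text)


if __name__ == "__main__":
    for m in MODES:
        print(m, "->", filter_profanity("Well, darn it", m))
```

Tagging preserves the original wording for downstream analysis, while remove and mask are better suited to text that will be displayed directly to readers.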
Customer Benefits
Seamless integration: Speech is designed to integrate with existing customer solutions through the UI, REST APIs, SDK, and CLI. Users can also use Speech in Data Science notebooks.
Security: Audio files are not retained after processing (unlike some other cloud providers).
Zero ramp-up time: Speech pretrained models allow users to leverage Automatic Speech Recognition (deep learning-based speech-to-text models) without any initial investment in data or model training.
Batch processing: Customers working with larger volumes of data can transcribe audio files asynchronously in batches.
Fully Managed Service: Customers don't have to choose or manage the infrastructure used for model training and inference.
Here is how you get started with Speech:
Navigate to the Analytics & AI menu in the OCI console.
Guy Michaeli is a senior principal product manager for OCI Speech. He has over 20 years of experience in engineering, product evangelism, and product management. In 2021, Guy joined the Oracle Cloud Infrastructure AI team, where he is the product lead for Speech services.