OCI Speech supports text-to-speech and real-time transcription with customized vocabulary

We’re thrilled to announce new features that further enhance the integration experience with the Oracle Cloud Infrastructure (OCI) Speech service. Alongside the recently announced multilingual Whisper model support, OCI Speech now supports text-to-speech (TTS) and real-time speech recognition with customized vocabulary.

Text-to-speech

Do you want to give your application a voice but are tired of repeating the same scripts in search of that perfect voice-over? The text-to-speech feature is here to help. Built on deep neural network technology, the TTS feature learns from a vast dataset of human speech, capturing subtleties such as intonation, emotion, and rhythm to generate speech that closely mimics natural human expression. For example, the generated voice pauses after a complete sentence that ends with a period. You can use the synthetic voice in various ways, including improving accessibility for users with visual impairments, enhancing the immersive experience in gaming, and accelerating the creation of educational content.

The TTS feature is now available in the Phoenix region upon service request and currently supports a female voice in English. We plan to add more voices, languages, and OCI regions based on customer feedback.

You can access the TTS feature from the Oracle Cloud Console. Its functions include Speech Synthesis Markup Language (SSML) support, which lets you use tags to further control pauses, the expansion of acronyms and abbreviations, and prosody in the generated voice.

Figure 1: Convert text to speech in Oracle Cloud Console using the TTS feature
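
To make the SSML controls mentioned above more concrete, here is a minimal sketch that assembles an SSML document in Python. It uses only element and attribute names from the W3C SSML standard (`speak`, `p`, `s`, `sub`, `break`, `prosody`, `say-as`). Which attributes and values OCI Speech accepts for each tag is an assumption here, and the sample text is made up for illustration, so check the built-in examples in the Console before relying on it.

```python
import xml.etree.ElementTree as ET


def build_ssml() -> str:
    """Build a small SSML document from standard W3C SSML elements.

    The attribute values below are illustrative assumptions, not
    confirmed OCI Speech settings.
    """
    speak = ET.Element("speak")
    paragraph = ET.SubElement(speak, "p")

    # First sentence: <sub> tells the engine to read "OCI" as its expansion.
    first = ET.SubElement(paragraph, "s")
    first.text = "Welcome to "
    sub = ET.SubElement(first, "sub", alias="Oracle Cloud Infrastructure")
    sub.text = "OCI"
    sub.tail = " Speech."

    # <break> inserts an explicit pause between the two sentences.
    ET.SubElement(paragraph, "break", time="500ms")

    # Second sentence: <prosody> slows the phrase down, and <say-as>
    # asks for the code to be read character by character.
    second = ET.SubElement(paragraph, "s")
    prosody = ET.SubElement(second, "prosody", rate="slow")
    prosody.text = "Your confirmation code is "
    say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "characters"})
    say_as.text = "A1B2"
    say_as.tail = "."

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml()
print(ssml)
```

The resulting string can be pasted into the Console text box or sent as the input text of an API request, assuming the request is marked as SSML rather than plain text.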

Key features and benefits of TTS

TTS in OCI Speech offers the following features and benefits:

Convert your text into natural speech that matches intonation and emotion: You can input text in the Console or through an API request. The neural TTS engine converts the text to a natural-sounding voice as output.
Streaming support: You can enable streaming for any request. Audio is then generated in chunks, one per sentence, using chunked transfer encoding over HTTPS, and starts streaming to the client as soon as the first sentence is processed.
Extensive SSML tag support: To allow more precise control of the generated voice, the TTS feature supports the following tags:
- Sub Tag
- Paragraph & Sentence Tag
- Say As Tag
- Break Tag
- Prosody Tag
- Phoneme Tag
  For more information on how to use these tags, refer to the built-in examples in the Oracle Cloud Console.
Software development kit (SDK) and API support: Refer to the API documentation for how to integrate with TTS using the SDKs.
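
The streaming behavior described in the list above can be sketched from the client's side. The example below is a minimal illustration, not the OCI SDK: `consume_audio_stream` and `fake_sentence_chunks` are hypothetical names, and the generator stands in for a chunked HTTPS response that delivers one audio chunk per synthesized sentence.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def consume_audio_stream(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write audio chunks to `sink` as they arrive and return the byte count.

    `chunks` can be any iterable of bytes, such as the chunk iterator
    of a streaming HTTP response.
    """
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        sink.write(chunk)  # a player could start consuming audio here
        total += len(chunk)
    return total


def fake_sentence_chunks() -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed response: one chunk per sentence."""
    yield b"audio-for-sentence-1"
    yield b""
    yield b"audio-for-sentence-2"


buffer = io.BytesIO()
written = consume_audio_stream(fake_sentence_chunks(), buffer)
print(written)
```

With a real HTTP client, the same function could take the response's chunk iterator, for example `response.iter_content(chunk_size=None)` from the `requests` library on a request made with `stream=True`. The endpoint, authentication, and request body are not shown because they depend on the API documentation.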

Real-time speech recognition

Say goodbye to delays and hello to instantaneous transcriptions! Our real-time speech recognition feature helps ensure that your speech is accurately transcribed as you speak naturally, allowing for seamless and uninterrupted communication. Whether you’re conducting a live meeting or dictating a note, our lightning-fast technology keeps pace with your speech in real time. Real-time speech recognition with customized vocabulary is currently in beta and will be released between June and August this year.

Customized vocabulary

Every industry and business uses unique terminology and jargon specific to its domain. So, we’re introducing customized vocabulary support, which lets you tailor the speech recognition engine to your needs by adding industry-specific terms, technical jargon, or company-specific terminology as a custom extension that improves overall recognition accuracy. For example, you can add product names and industry acronyms to the customized vocabulary to help ensure proper recognition.

Want to know more?

The OCI Speech product team is committed to empowering you with tools that redefine possibilities. We look forward to seeing you use the newly introduced TTS feature in your integrations with OCI. Stay tuned for the availability announcement of real-time speech recognition with customized vocabulary.

If you’re new to Oracle Cloud Infrastructure, try Oracle Cloud Free Trial, a free 30-day trial with US$300 in credits.

For more information, see the following resources:

Michael Zhang

Senior Principal Product Manager
