Overview
Prebuilt AI models can save businesses time and resources by transforming interactions between systems, employees, and customers without the need to build custom large language models. For example, an organization may face a language barrier when training people through onboarding videos or demonstrations. By combining the latest enhancements in Oracle Cloud Infrastructure (OCI) AI Speech and OCI AI Language, you can now build a flow that transcribes, translates, and synthesizes speech across languages. Coupled with the recently released OpenAI Whisper model, these services now support spoken translation from 52 languages with a choice of voice models.
In this blog, I will discuss how to combine Oracle tools to translate various spoken languages into spoken English at scale—without requiring any prior machine learning (ML) knowledge. This end-to-end flow is also available for you to try at no cost using Oracle Cloud Free Trial.
Workflow
At a high level, we will access the Oracle Services Network through an API ingestion point to perform speech-to-speech translation. We start by transcribing an uploaded audio or video file with OCI Speech, which stores the transcription as a JSON file in an object storage bucket. Next, we translate this JSON file from your chosen language into English with OCI Language. Finally, the translated output is fed back into OCI Speech for text-to-speech using one of Oracle’s natural-sounding voice models.

For demonstration purposes, we’ll show how to perform French-to-English speech-to-speech translation through the OCI console, but these services can also be accessed through OCI software development kits (SDKs) and REST APIs.
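If you later move from the console to the SDKs, the three stages of the workflow can be sketched as a simple chain of functions. This is only an illustration of the data flow, not Oracle's API: `speech_to_speech` and the three stage parameters are hypothetical names, and the lambdas below are stand-ins for real calls to OCI Speech and OCI Language.

```python
from typing import Callable


def speech_to_speech(
    media_object: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the stages: media file -> source text -> English text -> English audio."""
    source_text = transcribe(media_object)   # stage 1: OCI Speech transcription
    english_text = translate(source_text)    # stage 2: OCI Language translation
    return synthesize(english_text)          # stage 3: OCI Speech text-to-speech


# Stand-in stages (hypothetical); replace each with a real service call.
demo_audio = speech_to_speech(
    "onboarding_fr.mp4",
    transcribe=lambda name: "Bonjour et bienvenue",
    translate=lambda text: "Hello and welcome",
    synthesize=lambda text: text.encode("utf-8"),
)
print(demo_audio)
```

Passing each stage in as a function keeps the flow easy to test with stand-ins before any cloud credentials are involved.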
Prerequisites
To use AI Speech, AI Language, and Object Storage, an administrator must grant access through the associated IAM policies.
- Enable any user access to OCI Speech (AI Speech policies):
  - allow any-user to manage ai-service-speech-family in tenancy
- Enable any user access to OCI Language (Language policies):
  - allow any-user to manage ai-service-language-family in tenancy
- To give OCI Language async jobs access to object storage, create a Dynamic Group with the matching rule below:
  - ALL {resource.type='ailanguagejob'}

Building Speech to Speech Translation
- Create an object storage bucket and upload a media file of your choice. The OCI AI Speech service accepts the following formats: AAC, AC3, AMR, AU, FLAC, M4A, MKV, MP3, MP4, OGA, OGG, WAV, and WEBM.
- Transcription: Under AI Services, click ‘Speech’ and create your first “Transcription job”. Use the file uploaded to the object storage bucket created in step 1 and choose the model type (comparisons found here). Then choose the language to be transcribed and click Next to select the desired file.

- Translation: Under AI Services, click ‘Language’ and, under the ‘Jobs’ tab, click “Create Job”. There are various ways to use OCI Language, but the feature type we will use is Pretrained language translation. Pick the source language used in the original file, then choose English as the target. After you review the settings, the job runs; once it has been processed, you can find the translated file in the destination bucket.

- Speech: We’ll upload the output from step 3 back into OCI Speech to produce human-like speech in one of six variations, with an audio sample rate ranging from 8,000 to 48,000 hertz. In the console, a directly uploaded script is limited to 5,000 characters, but with batch upload a job can contain up to 100 records and 20,000 characters in total. The generated file is available for download once the process is complete.
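The format list in the first step and the size limits in the Speech step lend themselves to a few pre-flight checks before anything is uploaded. The sketch below uses only the numbers quoted in this post; the function names are my own, and it assumes the service counts characters the same way Python's `len` does, which may not hold exactly.

```python
from pathlib import Path

# Formats this post lists as accepted by OCI AI Speech.
SUPPORTED_FORMATS = {
    "aac", "ac3", "amr", "au", "flac", "m4a", "mkv",
    "mp3", "mp4", "oga", "ogg", "wav", "webm",
}
CONSOLE_SCRIPT_LIMIT = 5_000   # characters per script uploaded in the console
BATCH_MAX_RECORDS = 100        # records per batch job
BATCH_MAX_CHARS = 20_000       # total characters per batch job


def is_supported_media(filename: str) -> bool:
    """Check the file extension against the accepted formats."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def fits_batch(records: list[str]) -> bool:
    """Check a batch against the record and total-character limits."""
    total_chars = sum(len(record) for record in records)
    return len(records) <= BATCH_MAX_RECORDS and total_chars <= BATCH_MAX_CHARS


def split_script(script: str, limit: int = CONSOLE_SCRIPT_LIMIT) -> list[str]:
    """Split a script on whitespace so each chunk stays within `limit` characters."""
    chunks: list[str] = []
    current = ""
    for word in script.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word  # a single word longer than `limit` is kept whole
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_script(translated_text)` returns pieces that each fit a console upload, and `fits_batch(pieces)` tells you whether they could also go into a single batch job.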

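Between the Transcription and Translation steps above, it can help to pull the plain text out of the JSON file that OCI Speech writes to the bucket. The sketch below assumes the output has a top-level `transcriptions` list whose items carry a `transcription` string; treat that layout as an assumption and check it against your own job output before relying on it. The function name `transcript_text` is hypothetical.

```python
import json


def transcript_text(speech_output: str) -> str:
    """Join the transcribed text of every entry in a Speech output document.

    Assumed layout: {"transcriptions": [{"transcription": "..."}, ...]}
    """
    result = json.loads(speech_output)
    parts = [item.get("transcription", "") for item in result.get("transcriptions", [])]
    return "\n".join(part.strip() for part in parts if part.strip())


# A hand-written sample in the assumed layout, not real service output.
sample = json.dumps(
    {"status": "SUCCESS", "transcriptions": [{"transcription": "Bonjour et bienvenue."}]}
)
print(transcript_text(sample))
```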
Conclusion
For demo purposes, we have now transformed a French-speaking tutorial into an English-speaking tutorial with English subtitles, all within minutes. This demonstration is just one example use case, implemented with manual entries. However, by leveraging other OCI tools, such as OCI Functions and Oracle Integration Cloud, you can streamline the data flow and unlock a broader range of possibilities within OCI.
Try it yourself
If you are new to Oracle Cloud Infrastructure, you can try these services with a free 30-day trial that includes US$300 in credits on Oracle Cloud Free Trial. If you would like to find out more, visit the links below.
