On February 23rd, we announced a new OCI service, Speech. OCI Speech is an Automatic Speech Recognition service that converts speech to text and unlocks the insights in customers’ audio files. Today, we are happy to announce three new capabilities for the Speech service at no additional cost: native support for 8kHz audio files, support for output in SRT (a closed-caption file format), and automatic punctuation of output text. These new capabilities are now available in all of OCI's commercial regions and are part of our commitment to provide high-quality, affordable transcription for our customers.
Until today, customers with audio files recorded at an 8kHz sample rate, typical of telephone recordings, could not use OCI Speech. The AI models behind OCI Speech supported only 16kHz audio, and up-sampling audio files from 8kHz to 16kHz typically yields low-quality results. Now, users can upload both 8kHz and 16kHz files, and OCI Speech will automatically identify the sample rate and transcribe each file using the matching model. Customers do not need to change how they call the service; OCI Speech will simply do the right thing for both 8kHz and 16kHz inputs.
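If you want to confirm which sample rate your recordings use before uploading them, the rate is stored in the WAV header. The following is a minimal sketch using only Python's standard `wave` module; it is not part of OCI Speech or its SDK, and the helper names `wav_sample_rate` and `is_supported_rate` are hypothetical. It assumes uncompressed PCM WAV input.

```python
import wave

# The two sample rates named in this post, in Hz.
SUPPORTED_RATES_HZ = {8000, 16000}


def wav_sample_rate(source):
    """Return the sample rate (Hz) stored in a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def is_supported_rate(source):
    """Local pre-upload check: True if the file is 8kHz or 16kHz."""
    return wav_sample_rate(source) in SUPPORTED_RATES_HZ
```

A check like this is optional, since the service identifies the sample rate on its own; it is mainly useful for catching files at other rates early in a pipeline.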
The production and consumption of video content have seen explosive growth in recent years. One way media producers can maximize the value and use of their video content is to add subtitles, both as an accessibility measure for deaf and hard-of-hearing viewers and to make the video useful in a wide range of other scenarios, such as on screens in noisy public environments. A common format for subtitles is SubRip files (.SRT). Previously, customers had to generate their own .SRT files from the OCI Speech service’s JSON output, which added steps to their publishing pipeline. Since the launch of the OCI Speech service, customers have expressed a strong desire for built-in .SRT support, and we listened! You can now ask OCI Speech to generate .SRT files, either through the transcription job UI or through the API, and the service will add the .SRT files to your output bucket.
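For readers who have not worked with the format, an .SRT file is a sequence of numbered cues, each with a start and end time written as `HH:MM:SS,mmm` followed by the subtitle text. The sketch below shows the kind of conversion step that built-in .SRT support makes unnecessary. It is an illustration only: the input shape (a list of `(start_seconds, end_seconds, text)` tuples) and the function names are assumptions, not the structure of the OCI Speech JSON output.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as SRT text.

    Cues are numbered from 1 and separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello."), (1.5, 3.0, "Welcome back.")]))
```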
With this release, we are also enabling punctuation in all supported languages (English, Spanish, and Portuguese). OCI Speech will now automatically add basic punctuation marks (commas, periods, and question marks) as it transcribes, so that the output more closely resembles a human transcription. Adding punctuation does not increase transcription costs. You can turn punctuation off in the UI or the API at any time if you prefer. Punctuation marks now appear in both the SRT and JSON files. In the JSON files, punctuation marks are identified as a new type of token, as highlighted in the image below.
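If you process the JSON output yourself, a separate token type makes it easy to keep or drop punctuation when rebuilding text. The sketch below is illustrative only: the field names `token` and `type` and the values `"WORD"` and `"PUNCTUATION"` are assumptions made for this example, so check them against your own output files before relying on them.

```python
def join_tokens(tokens, keep_punctuation=True):
    """Rebuild a text line from a list of token dictionaries.

    Punctuation tokens are attached to the preceding word, or skipped
    entirely when keep_punctuation is False. Field names are assumed.
    """
    parts = []
    for tok in tokens:
        if tok["type"] == "PUNCTUATION":
            if keep_punctuation and parts:
                parts[-1] += tok["token"]
        else:
            parts.append(tok["token"])
    return " ".join(parts)


if __name__ == "__main__":
    sample = [
        {"token": "hello", "type": "WORD"},
        {"token": ",", "type": "PUNCTUATION"},
        {"token": "world", "type": "WORD"},
        {"token": ".", "type": "PUNCTUATION"},
    ]
    print(join_tokens(sample))
    print(join_tokens(sample, keep_punctuation=False))
```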
Log in to your account and use your five monthly hours of free transcription to give these new features a try.
Guy Michaeli is senior principal product manager for OCI Speech. He has over 20 years of experience in engineering, product evangelism and product management. In 2021, Guy joined the Oracle Cloud Infrastructure AI team and is the product lead for Speech services.