OpenAI has launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model it released in September. The API is priced at US$0.006 per minute. Whisper is an automatic speech recognition system that OpenAI claims enables "robust" transcription in multiple languages, as well as translation from those languages into English. It accepts files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.
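To put the pricing in concrete terms, here is a minimal Python sketch that estimates the cost of transcribing a recording and checks a file name against the formats listed above. It assumes billing scales linearly with audio length at the quoted rate and ignores any rounding rules OpenAI may apply; the function names are hypothetical and are not part of OpenAI's API.

```python
from pathlib import Path

# Price quoted in the article; assumed here to scale linearly with duration.
PRICE_PER_MINUTE_USD = 0.006

# File formats the article lists as accepted by the Whisper API.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the transcription cost in US dollars for a given audio length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = duration_seconds / 60
    return round(minutes * PRICE_PER_MINUTE_USD, 4)
```

Under these assumptions, a one-hour recording (3,600 seconds) would come to about US$0.36.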
The Whisper API is the same great model you can get as open source
Many organizations have developed highly capable speech recognition systems, which sit at the heart of the software and services offered by tech giants like Google, Amazon and Meta. What sets Whisper apart is that it has reportedly been trained on 680,000 hours of multilingual and "multitask" data collected from the web, leading to better recognition of unique accents, background noise and technical jargon.
However, Whisper has its limitations. Because the system is trained on large amounts of noisy data, OpenAI warns that Whisper can add words to its transcriptions that were never actually spoken. Whisper may also not perform equally well across languages: it suffers from a higher error rate for speakers of languages that are not well represented in the training data.
This last point is nothing new in the world of speech recognition. Biases have long plagued even the best systems: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM and Microsoft made far more errors when transcribing Black speakers than white speakers. Despite this, OpenAI sees Whisper's transcription capabilities being used to improve existing applications, services, products and tools. The AI-powered language learning app Speak is already using the Whisper API to power a new in-app virtual speech assistant. If OpenAI can tap into the huge speech-to-text market, it could be highly profitable for the Microsoft-backed company.