OpenAI has announced upgrades to its artificial intelligence models for speech-to-text and text-to-speech conversion. The new models, added to the company's API, make significant progress over previous versions and give developers more capabilities.
According to OpenAI, these new models are part of the company's larger vision for developing automated systems, or "smart agents", that can independently perform different tasks for users. Olivier Godement, OpenAI's head of product, explained to TechCrunch that these agents can act as chatbots that interact with a business's customers. He predicted that more of these agents will appear in the coming months.
OpenAI's Text-to-Speech Model
OpenAI's new text-to-speech model, called gpt-4o-mini-tts, not only produces more natural and more nuanced speech, but is also more precisely steerable. Developers can control how the text is spoken using natural-language instructions.
For example, the model can be asked to speak like a "mad scientist" or in a calm voice, like a teacher.
Jeff Harris, a member of the OpenAI product team, emphasized that developers can tailor both the voice "experience" and the "context" to their liking. He says:
“In different situations, you do not want a flat, uniform voice with no feeling. For example, if you are in a customer support experience and want the voice to sound apologetic, you can instruct the model to carry that emotion in its voice.”
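As a rough illustration of steering the voice with a natural-language instruction, the sketch below builds (but does not send) an HTTP request for speech synthesis using only the Python standard library. The article does not give API details, so the endpoint URL, the JSON field names (`input`, `voice`, `instructions`), the voice name `alloy`, and the helper `build_speech_request` are all assumptions made for this example and should be checked against OpenAI's API reference.

```python
import json
import urllib.request

# Assumed endpoint; the article does not state it.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, instructions, voice="alloy", api_key=""):
    """Build a POST request asking gpt-4o-mini-tts to speak `text`.

    `instructions` is a plain-language description of the delivery,
    e.g. "Sound calm and apologetic." Field names are assumptions.
    """
    payload = {
        "model": "gpt-4o-mini-tts",    # model name taken from the article
        "input": text,                 # the text to be spoken
        "voice": voice,                # assumed voice identifier
        "instructions": instructions,  # natural-language style control
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, instructions, out_path, api_key):
    """Send the request and save the returned audio bytes to `out_path`."""
    request = build_speech_request(text, instructions, api_key=api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

In the customer-support case Harris describes, a call might look like `synthesize("I'm sorry about the delay.", "Sound calm and sincerely apologetic.", "apology.mp3", api_key)`, where only the instruction string changes between a neutral and an apologetic delivery.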
Speech-to-Text Models
OpenAI has also introduced two new speech-to-text models, called gpt-4o-transcribe and gpt-4o-mini-transcribe, which replace the older Whisper model. These new models, trained on a variety of high-quality audio data, can recognize speech in different accents and even in noisy environments.
Harris also points out that these models make fewer "hallucination" errors than Whisper. The Whisper model sometimes added words, or even complete sentences, that were never spoken, which could cause problems. He says:
“These models have improved significantly compared to Whisper. The accuracy of the models is essential to creating a reliable voice experience, and accuracy here means that the models recognize the words correctly and do not add details they have not heard.”

However, the accuracy of these models may vary depending on the language being transcribed.
Contrary to its previous practice, OpenAI does not intend to release these new speech-to-text models openly. The company previously released new versions of Whisper under an MIT license for commercial use. Harris explained that the new models are much larger than Whisper and are therefore not suitable for public release. He says:
“These models are not the kind that you can easily run on your laptop. If we release something openly, we want to do it thoughtfully and provide a model that is really suited to that specific need.”
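Because accuracy can depend on the language, a transcription request may benefit from stating the spoken language explicitly. The sketch below builds (but does not send) a multipart upload for a transcription, again with the standard library only. The endpoint URL, the form field names (`file`, `model`, `language`, `response_format`), and the helper `build_transcription_request` are assumptions for illustration, not details confirmed by the article.

```python
import urllib.request
import uuid

# Assumed endpoint; the article does not state it.
TRANSCRIBE_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename,
                                model="gpt-4o-transcribe",
                                language=None, api_key=""):
    """Build a multipart/form-data POST that uploads audio for transcription.

    `language` is an optional language code such as "en" (assumed field).
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model, "response_format": "json"}
    if language:
        fields["language"] = language

    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio file itself.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing `model="gpt-4o-mini-transcribe"` would target the smaller of the two models named in the article. The request can then be sent with `urllib.request.urlopen`.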



