OpenAI has announced upgrades to its AI models for speech-to-text transcription and text-to-speech generation. The new models, added to the company’s API, improve significantly on previous versions and give developers more capabilities.
According to OpenAI, the new models are part of the company’s broader vision for automated systems, or “agents,” that can independently carry out tasks on behalf of users. Olivier Godement, OpenAI’s head of product, told TechCrunch that such an agent could be a chatbot that talks with a business’s customers. He predicted that more of these agents will appear in the coming months.
OpenAI’s text-to-speech model
OpenAI’s new text-to-speech model, gpt-4o-mini-tts, not only produces more natural, more nuanced speech but is also more steerable. Developers can control how the text is spoken using natural-language instructions.
For example, the model can be asked to speak like a “mad scientist” or in a calm voice, like a teacher.
Jeff Harris, a member of OpenAI’s product team, emphasized that developers can tailor both the voice “experience” and the “context” to their needs. He says:
“In different contexts, you do not want a flat voice with no emotion. For example, if you are in a customer support experience and want to apologize, you can instruct the model to carry that emotion in the voice.”
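As a rough illustration of that kind of steering, the sketch below builds a text-to-speech request in which the speaking style is given as a plain-language instruction. It assumes the official OpenAI Python SDK and its `instructions` parameter on the speech endpoint; the helper names `build_speech_request` and `synthesize`, the `coral` voice choice, and the sample wording are illustrative assumptions, not something taken from OpenAI's announcement.

```python
from pathlib import Path


def build_speech_request(text: str, style: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a text-to-speech call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # model name as given in the article
        "voice": voice,              # assumed voice; pick any voice the API offers
        "input": text,               # the words to be spoken
        "instructions": style,       # natural-language description of how to speak
    }


def synthesize(request: dict, out_path: Path) -> None:
    """Send the request and write the audio to disk (needs the SDK and an API key)."""
    # Imported lazily so the request builder works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__":
    request = build_speech_request(
        "We are sorry your order arrived late.",
        "Speak in a calm, apologetic tone, like a patient support agent.",
    )
    print(request)
    # synthesize(request, Path("apology.mp3"))  # uncomment to call the API
```

Keeping the style in its own field, separate from the spoken text, means the same sentence can be rendered as an apology or as a “mad scientist” by changing only the instruction.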
Speech-to-text models
OpenAI has also introduced two new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, which replace the older Whisper model. Trained on diverse, high-quality audio data, the new models can better recognize speech in different accents, even in noisy environments.
Harris also notes that these models “hallucinate” less than Whisper. Whisper sometimes inserted words, or even whole sentences, that were never spoken, which could cause problems. He says:
“These models are much improved compared to Whisper. Accuracy is essential to a reliable voice experience, and accuracy here means that the models recognize the words correctly and do not add details they did not hear.”
However, the accuracy of these models may vary depending on the language being transcribed.
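For developers, moving to the new models mostly comes down to passing a different model name to the transcription endpoint. The sketch below assumes the official OpenAI Python SDK; `pick_model` and `transcribe` are hypothetical helper names, and treating the mini model as the lighter option is an assumption based only on its name.

```python
from pathlib import Path

# Model names as given in the article: full-size first, mini second.
TRANSCRIBE_MODELS = ("gpt-4o-transcribe", "gpt-4o-mini-transcribe")


def pick_model(prefer_small: bool) -> str:
    """Return the mini model when a lighter option is preferred (hypothetical helper)."""
    return TRANSCRIBE_MODELS[1] if prefer_small else TRANSCRIBE_MODELS[0]


def transcribe(audio_path: Path, prefer_small: bool = False) -> str:
    """Upload an audio file and return the transcript (needs the SDK and an API key)."""
    # Imported lazily so model selection works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=pick_model(prefer_small),
            file=audio_file,
        )
    return result.text


if __name__ == "__main__":
    print(pick_model(prefer_small=True))
    # print(transcribe(Path("meeting.mp3")))  # uncomment to call the API
```

Because accuracy can differ by language, it is worth checking transcripts against a few known recordings in each target language before relying on either model.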
In a break from its previous practice, OpenAI does not plan to release the new speech-to-text models openly. The company previously released new versions of Whisper under an MIT license, which permits commercial use. Harris explained that the new models are much larger than Whisper and are therefore not good candidates for an open release. He says:
“These models are not the kind that you can easily run on your laptop. When we release something openly, we want to do it thoughtfully and offer a model that is really suited to that specific need.”
RCO NEWS




