ElevenLabs has introduced its latest text-to-speech model, Eleven v3 (alpha), which focuses on producing synthetic voices that express more natural and realistic emotions. The model also supports Farsi.
Compared with previous versions, the model reproduces whispering, laughter, sighing, and other emotional reactions more naturally. ElevenLabs' main goal for this version was to fix the problem of conveying emotion in synthetic speech: earlier models focused mainly on sound quality, whereas this completely redesigned model can produce voices with more realistic emotions and more natural reactions.
Eleven v3 model capabilities
One of the highlights of Eleven v3 is support for more than 70 languages, including Farsi, along with natural, fluid multi-speaker conversations. Through the new API, users can send the model a structure that lists each speaker's turn, and the model automatically manages turn-taking, emotional shifts, and even interruptions. This feature is especially useful for multi-speaker dialogue and makes it possible to generate complex, natural-sounding conversations.
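The turn-by-turn structure described above could be represented roughly as follows. This is only a minimal sketch: the `Turn` class, the `build_dialogue_payload` helper, the field names (`voice_id`, `text`, `inputs`, `model_id`), the `"eleven_v3"` identifier, and the placeholder voice IDs are all illustrative assumptions, not the confirmed ElevenLabs schema. The code sends no request; it only shows how ordered speaker turns might be packaged.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Turn:
    """One speaker turn. Field names are illustrative, not the official schema."""
    voice_id: str  # hypothetical identifier of the voice speaking this turn
    text: str      # what that voice says


def build_dialogue_payload(turns, model_id="eleven_v3"):
    """Package turns, in speaking order, into a JSON-serializable dict.

    The model_id default is an assumed name for the v3 model.
    """
    if not turns:
        raise ValueError("a dialogue needs at least one turn")
    return {"model_id": model_id, "inputs": [asdict(t) for t in turns]}


turns = [
    Turn(voice_id="VOICE_A", text="Did you hear the news?"),
    Turn(voice_id="VOICE_B", text="Not yet. What happened?"),
]
payload = build_dialogue_payload(turns)
print(json.dumps(payload, indent=2))
```

Keeping the turns in a single ordered list leaves the hand-off between speakers to the model, which is the behavior the article attributes to the new API.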
Controlling vocal delivery through inline audio tags is another important feature of Eleven v3. These tags, written in brackets and lowercase letters, such as [sighs], [excited], or [whispers], let users adjust emotion and tone of voice directly in the text. Several tags can even be combined for a more precise and nuanced delivery; for example: "We did it! [happily] [shouts] [laughs]".
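Scripts that use such tags can be assembled and checked with a few lines of code. The sketch below assumes tags are lowercase words wrapped in square brackets; the helper names `format_tags`, `with_tags`, and `extract_tags` are hypothetical and belong to no ElevenLabs library.

```python
import re

# Assumed tag shape: lowercase words (spaces allowed) inside square brackets.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def format_tags(tags):
    """Render tag names as lowercase bracketed labels, e.g. 'Sighs' -> '[sighs]'."""
    return " ".join(f"[{t.strip().lower()}]" for t in tags)


def with_tags(text, tags):
    """Append the rendered tags after the text; return text unchanged if no tags."""
    rendered = format_tags(tags)
    return f"{text} {rendered}" if rendered else text


def extract_tags(line):
    """List the tag names found in a line, in order of appearance."""
    return TAG_PATTERN.findall(line)


line = with_tags("We did it!", ["happily", "Shouts", "laughs"])
print(line)
print(extract_tags(line))
```

Normalizing to lowercase in one place keeps hand-written scripts consistent, and `extract_tags` gives a quick way to review which emotions a line requests before it is sent for synthesis.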
According to ElevenLabs, the model is designed mainly for professional applications such as film production, audiobooks, and digital media, and the final version of the public API will be released soon. Eleven v3 is currently available on the company's website and is offered at an 80% discount until the end of June. For real-time use or live conversations, however, the v2.5 Turbo or Flash models are still recommended, because the current version of v3 is not optimized for those cases and its real-time version is still under development.
In addition, Professional Voice Clones are not yet fully compatible with this version and deliver lower quality than they do with previous models. For projects that need the new expressive features, Instant Voice Clones or designed voices are therefore recommended.
RCO NEWS



