Stability AI, best known for its AI text-to-image tools, has unveiled a new tool called Stable Audio that converts text into sound or music.
Diffusion models typically generate audio clips of a fixed duration, which suits music creation poorly because songs vary in length. The new Stability AI tool, however, can create audio clips of different durations. To achieve this, the company trained its model on music and added text metadata for the start and end time of each song.
Stable Audio artificial intelligence can create audio files with different durations
In the past, similar tools were trained on 30-second audio clips and could only generate 30-second excerpts from arbitrary parts of a song. The new Stability AI tool gives you more control over the duration of the song.
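As a rough illustration of the timing metadata idea, the sketch below cuts a fixed-length training window from a longer song and records where that window sits in time. The names `TimedWindow` and `timing_metadata`, the field names, and the 30-second default are assumptions made for illustration, not details confirmed by Stability AI.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedWindow:
    """A training window plus its timing metadata (hypothetical structure)."""
    start_s: float        # where the window begins inside the song
    end_s: float          # where the window ends inside the song
    song_length_s: float  # total duration of the source song


def timing_metadata(song_length_s: float, window_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> TimedWindow:
    """Pick a random window from a song and record its position in time."""
    rng = rng or random.Random()
    # Songs shorter than the window are used whole.
    span = min(window_s, song_length_s)
    start = rng.uniform(0.0, song_length_s - span)
    return TimedWindow(start_s=start, end_s=start + span,
                       song_length_s=song_length_s)


# Example: a 200-second song yields one 30-second window with timing labels.
print(timing_metadata(song_length_s=200.0, rng=random.Random(0)))
```

With labels like these attached to each training window, a model could in principle be asked for a clip of a chosen length rather than an arbitrary 30-second excerpt.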
The company said in its statement that it continues to train the model to improve the quality of its output:
“Stable Audio represents advanced audio production research by Harmonai, Stability AI's generative audio research lab. We continue to improve our model architecture, datasets, and training methods to improve output quality, controllability, output delivery speed, and output duration.”
According to Stability AI, the Stable Audio model was trained on a dataset containing more than 800,000 audio files of songs, sound effects, and musical instruments. Text metadata from AudioSparx was also used. In total, the new model has been trained on more than 19,500 hours of audio.
This artificial intelligence model is available to users in three versions:
- Free version, which allows up to 20 audio clips per month of at most 45 seconds each
- Professional version, which allows up to 500 audio clips of at most 90 seconds each for $11.99
- Enterprise version
In the free version, it is not possible to use the created songs commercially.




