Stability AI, best known for its AI-based text-to-image tools, has unveiled a new tool called Stable Audio that generates sound or music from text.
Diffusion models typically generate audio clips of a fixed duration, which suits music poorly because songs vary in length. The new Stability AI tool, however, can create audio clips of different durations. To achieve this, the company trained its model on music and added text metadata describing the start and end time of the song.
Stable Audio can create audio files of different durations
In the past, similar tools were trained on 30-second audio clips and could only generate 30-second files taken from arbitrary parts of a song. The new Stability AI tool gives users more control over the duration of the generated audio.
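The idea of pairing each training crop with timing metadata can be illustrated with a minimal Python sketch. This is not Stability AI's code: the names `TimedCrop`, `sample_timed_crop` and `timing_metadata`, the text format of the metadata, and the handling of short tracks are all assumptions made for illustration. It only shows how a fixed-size window cut from a longer track could carry its start time, end time and the full track length, so that a model has some notion of where a clip sits within a song.

```python
import random
from dataclasses import dataclass


@dataclass
class TimedCrop:
    start_sec: float  # where the crop begins within the full track
    end_sec: float    # where the crop ends within the full track
    total_sec: float  # length of the full track


def sample_timed_crop(track_sec: float, window_sec: float, rng: random.Random) -> TimedCrop:
    """Pick a random fixed-size window from a track and record its timing.

    Hypothetical helper: the window size and sampling rule are assumptions.
    """
    if track_sec <= 0 or window_sec <= 0:
        raise ValueError("durations must be positive")
    # Tracks shorter than the window are used whole.
    crop_len = min(window_sec, track_sec)
    start = rng.uniform(0.0, track_sec - crop_len)
    return TimedCrop(start_sec=start, end_sec=start + crop_len, total_sec=track_sec)


def timing_metadata(crop: TimedCrop) -> str:
    """Render the timing as text metadata (format is an assumption)."""
    return (
        f"start: {crop.start_sec:.1f}s, "
        f"end: {crop.end_sec:.1f}s, "
        f"total: {crop.total_sec:.1f}s"
    )
```

For example, `sample_timed_crop(200.0, 30.0, random.Random(0))` returns a 30-second window somewhere inside a 200-second track, and `timing_metadata` turns that window into a short text string that could accompany the prompt.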
The company said in its statement that it continues to train this model to improve the quality of its output:
“Stable Audio represents advanced audio generation research by Harmonai, Stability AI's generative audio research lab. We continue to improve our model architecture, datasets, and training methods to improve output quality, controllability, inference speed, and output duration.”
According to Stability AI, the Stable Audio model was trained on a dataset of more than 800,000 audio files containing songs, sound effects and musical instruments, together with text metadata from AudioSparx. In total, the model has been trained on more than 19,500 hours of audio.
This artificial intelligence model is available to users in three versions:
- Free version: up to 20 audio clips per month, each up to 45 seconds long
- Professional version: up to 500 audio clips per month, each up to 90 seconds long, for $11.99 per month
- Enterprise version
In the free version, it is not possible to use the created songs commercially.
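To put the plan limits in perspective, the following Python sketch computes the maximum amount of audio each tier allows. The `Plan` class and its field names are hypothetical, and the calculation assumes that the Professional quota, like the free one, is counted per month. The Enterprise version is left out because the article gives no limits for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    clips_per_month: int  # assumed to be a monthly quota for both tiers
    max_clip_sec: int     # maximum length of a single clip, in seconds

    def max_audio_minutes(self) -> float:
        """Upper bound on generated audio per month, in minutes."""
        return self.clips_per_month * self.max_clip_sec / 60


FREE = Plan("Free", clips_per_month=20, max_clip_sec=45)
PRO = Plan("Professional", clips_per_month=500, max_clip_sec=90)

if __name__ == "__main__":
    for plan in (FREE, PRO):
        print(f"{plan.name}: up to {plan.max_audio_minutes():.0f} minutes per month")
```

Under these assumptions the free version tops out at 15 minutes of audio per month, while the Professional version allows up to 750 minutes.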
RCO NEWS