Stability AI, the developer of the Stable Diffusion text-to-image model, has announced Stable Audio, a neural network that generates short audio clips from text descriptions. Stable Audio is built on the same core AI techniques that Stable Diffusion uses to create images.
“Stability AI is best known for its work with images, but now we’re launching our first music and audio generation product, called Stable Audio,” said Ed Newton-Rex, vice president of audio at Stability AI. “The idea is simple: you describe the music or sound you want to hear in text, and our system generates it for you.”
Newton-Rex is no stranger to computer music: in 2011 he founded the startup Jukedeck, which TikTok acquired in 2019. However, the technology behind Stable Audio has its roots not in Jukedeck but in Harmonai, Stability AI’s in-house music generation research studio founded by Zach Evans. Evans explained that the model’s text conditioning relies on a technique known as Contrastive Language-Audio Pretraining (CLAP). The Stable Audio model has about 1.2 billion parameters, roughly the same as the original image-generating version of Stable Diffusion.
Generating simple audio tracks with software is nothing new. Historically, this has been done with a method called symbolic generation, which typically works with the MIDI (Musical Instrument Digital Interface) format. Stable Audio’s generative AI lets users create new music that goes beyond the repetitive note sequences typical of MIDI-based symbolic generation.
Stable Audio works directly with raw audio samples to produce higher-quality output. The model was trained on more than 800,000 licensed music tracks from the AudioSparx audio library. “One of the biggest challenges in creating text-to-audio models is obtaining audio data that is not only high quality but also has appropriate metadata,” Evans explained.
One of the most common requests users make of image generation models is to imitate the style of a specific artist. Stable Audio, however, does not accept such requests: according to its creators, most musicians would rather be creative in their own right.
Stable Audio will be available both for free and through a Pro plan priced at $12 per month. The free version allows users to create 20 tracks per month, each up to 20 seconds long, while the Pro plan raises the limit to 500 tracks of up to 90 seconds each. The Pro plan also permits commercial use of the generated tracks. As part of the launch, Stability AI will also publish a guide to writing text prompts for Stable Audio.