This is like Sora for sound effects. ElevenLabs has introduced Text-to-SFX, a new model that generates precise, professional-sounding sound effects from a text prompt. Say goodbye to sound designers.
ElevenLabs: AI synthetic sounds and voices
ElevenLabs was founded in 2022 by two best friends: Piotr, an ex-Google machine learning engineer, and Mati, an ex-Palantir deployment strategist. Inspired by the ‘poor’ dubbing of the Hollywood movies they watched growing up in their native Poland, the pair set out to design a platform that could eliminate linguistic barriers in content. Since then, ElevenLabs has produced some of the most realistic synthetic voices available, generating speech that is close enough to natural to be almost undetectable. For those who are not familiar with the company, here is how ElevenLabs describes itself:
“ElevenLabs is a voice AI research & deployment company with a mission to make content universally accessible in any language & voice. ElevenLabs creates the most realistic, versatile, and contextually-aware AI audio, providing the ability to generate speech in hundreds of new and existing voices in 29 languages. As a technology research company, ElevenLabs is at the forefront of developing new cutting-edge voice AI. We deploy the most advanced models and features accessible via web app or API to a user base from creators to publishers and beyond. Our mission is to make on-demand multilingual audio support a reality across education, streaming, audiobooks, gaming, movies, and even real-time conversation. Our research powers the platform’s current features but it also contributes to realizing our ultimate goal of instantly converting spoken audio between languages. The AI dubbing tool – aimed for release later this year – will let users automatically re-voice any audio or video in a different language, all while preserving the original speaker’s voice.”
Text-to-SFX: No more sound designing
A few days ago, ElevenLabs released a fascinating demonstration of its upcoming Text-to-SFX model. The idea is simple: generate precise, professional sounds from a text prompt, just as Sora does for video. Speaking of Sora, ElevenLabs built its first demo by adding sound effects to the first Sora video. As stated by ElevenLabs: “We were blown away by the Sora announcement but felt it was missing something… What if you could describe a sound and generate it with AI. AI Sound Effects are coming soon to ElevenLabs”. Take a look at the demonstration below, based on Sora’s first video:
We were blown away by the Sora announcement but felt it was missing something… What if you could describe a sound and generate it with AI. AI Sound Effects are coming soon to ElevenLabs.
ElevenLabs
Like Sora, ElevenLabs’ Text-to-SFX is still in testing and has not yet been released to the public. You can sign up for the waiting list.
The goal: Accurate sound design based on video
As reported by Tom’s Guide: “ElevenLabs has reached a billion dollar value unicorn status at the start of this year with its most recent $80 million Series B round. This announcement of the funding round came with a new tool for synching AI speech in video for auto translations — taking on the international dubbing market. In the long run, ElevenLabs can develop tools and models that can analyze the content of a video and automatically add sound effects at exactly the right points. The same could apply to music. Most AI music tools are text-to-music, but they could shortly go from image or video.” Thus, the end goal appears to be generating an entire, fully rounded piece of content (video, SFX, and music) from a single prompt. Scary, but inevitable!
FAQ
- How does it work? Simply describe the sound effect you want and we’ll generate a few samples to choose from.
- What are some primary use cases? ElevenLabs is capable of generating a wide variety of sound effects for practically any use case. It’s great for film & media, video games, commercials, and more.
- Can I use ElevenLabs Sound Effects in commercial projects? Yes, all ElevenLabs Sound Effects are royalty-free and can be used in commercial projects. However, as with all of our services, you must not sell or license our tools or use the output to develop competitive products or services.
- How can I get access to ElevenLabs Sound Effects? Fill out this form to be the first to know when it’s available.
Closing thoughts
This new Text-to-SFX model could eliminate the need for sound designers, mainly in mid-sized projects (documentaries, commercials, and short films). This is very sad. The income of a whole segment of creative professionals is in danger. In two years or so, entire commercials (from head to toe) will be generated from a simple text prompt. As I said: scary, but inevitable!
The results sound amateurish at best. Not even close to the professional quality needed in feature-length films and video games. So no. No such thing as “sAy gOOdbYe TO SouND dESigNeRs” unless your quality standards are down in the sewers.
The main image shows Logic. There isn’t a professional on the planet who would use that for sound design. 🤦🏼‍♂️
I totally get that times change with technological advances, the same way we no longer have horse and buggy drivers, but tell me again that AI is not going to take away jobs:
I don’t think AI companies understand that the problem with AI is that it will put a lot of people out of work, and if nobody has a job or income, then nobody can pay for their products and services. They are basically killing their own businesses.
That might be impressive to penny-pinching local agencies and production teams that just need quick sounds for a low-budget spot, but the quality and consideration aren’t there yet for anything more serious. No doubt paint-by-numbers projects will benefit, but part of good sound design is craft and consideration of the environment as well as the effect, and this example video lacks both. Pretty much everything was slapped in, and nothing was actually believable for the scene. This is fast sound at best, for people who don’t realize audio is 60%, if not more, of their picture.