OpenAI Presents ‘Voice Engine’: Generate Natural-Sounding Speech Based on Your Voice

2024-04-08

OpenAI has been developing ‘Voice Engine’ for two years. The technology lets users upload a 15-second voice sample and generate a synthetic copy of that voice. There is no date for public availability yet, which gives the company time to see how the model is used and abused, and to verify its safety.

OpenAI and sound design

OpenAI’s Voice Engine

As stated by OpenAI: “Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Notably, a small model with a single 15-second sample can create emotive and realistic voices.”

According to OpenAI, it first developed Voice Engine in late 2022 and has used it to power the preset voices available in the text-to-speech API, as well as ChatGPT Voice and Read Aloud.

“At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” OpenAI says.

Use it for video translation

OpenAI says that this technology can be used to translate content such as videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. One early adopter is HeyGen, an AI visual storytelling platform that works with enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos. HeyGen uses Voice Engine for video translation, rendering a speaker’s voice in multiple languages to reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example, generating English from an audio sample of a French speaker would produce speech with a French accent.

Sound strips in colors on Premiere Pro

Safety

OpenAI states that it is well aware of the risk that this technology could be used to manipulate public opinion. “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build,” the company says.

It adds: “The partners testing Voice Engine today have agreed to our usage policies, which prohibit the impersonation of another individual or organization without consent or legal right. In addition, our terms with these partners require explicit and informed consent from the original speaker and we don’t allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audience that the voices they’re hearing are AI-generated. Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used. We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures.”

You can read more on the OpenAI blog.

Editing sounds. Picture: MZed

Advantages for content creators

OpenAI’s Voice Engine could be very useful for content creators, as it removes the need to record a clean voice-over. You only need to talk for 15 seconds to provide a voice sample, and the engine will then speak any text you type in a synthetic version of your voice. The downside is that anybody with a short recording of you could do the same and produce a convincing copy of your voice. In any case, you can’t use Voice Engine yet, as the technology is still being tested, especially its safety aspects.

Yossy is a filmmaker who specializes mainly in action sports cinematography. Yossy also lectures about the art of independent filmmaking in leading educational institutes, academic programs, and festivals, and his independent films have garnered international awards and recognition.
Yossy is the founder of Y.M.Cinema Magazine.
