OpenAI has been developing ‘Voice Engine’ since late 2022. The technology lets users upload a 15-second voice sample and generate a synthetic copy of that voice. There is no date for public availability yet, which gives the company time to see how the model is used and abused, and to verify its safety.
OpenAI’s Voice Engine
OpenAI describes the project this way: “Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Notably, a small model with a single 15-second sample can create emotive and realistic voices.” According to OpenAI, it first developed Voice Engine in late 2022 and has used it to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud. “At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the company says.
Use it for video translation
OpenAI says the technology can be used to translate content such as videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. One early adopter is HeyGen, an AI visual storytelling platform that works with enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos. HeyGen uses Voice Engine for video translation, rendering a speaker’s voice in multiple languages to reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example, generating English from an audio sample of a French speaker would produce speech with a French accent.
Safety
OpenAI states that it is well aware of the risk that this technology could be used to manipulate public opinion. “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society, and beyond to ensure we are incorporating their feedback as we build,” the company says, adding: “The partners testing Voice Engine today have agreed to our usage policies, which prohibit the impersonation of another individual or organization without consent or legal right. In addition, our terms with these partners require explicit and informed consent from the original speaker and we don’t allow developers to build ways for individual users to create their voices. Partners must also clearly disclose to their audience that the voices they’re hearing are AI-generated. Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used. We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures.” You can read more on the OpenAI blog.
Advantages for content creators
Voice Engine could be very useful for content creators, as it removes the need to record a clean voice-over for every piece of content. You talk for 15 seconds to create a synthetic version of your voice, and the engine can then read any text in that voice. The downside is that anyone with a short recording of you could do the same and produce a convincing copy of your voice. For now, you can’t use Voice Engine, as the technology is still being tested, with particular attention to its safety aspects.