OpenAI’s Voice Engine can clone a voice from a 15-second clip. Listen for yourself

Since releasing ChatGPT and ushering in the generative AI era, OpenAI has stayed ahead of the curve with cutting-edge AI technology such as Sora, its impressive text-to-video generator. On Friday, the company took another step forward by sharing insights from its small-scale preview of Voice Engine, a voice cloning AI model that can create realistic, emotive voices using text input and a 15-second audio sample. 

As seen in the clip below, the technology can generate a highly realistic-sounding voice that closely resembles the voice in the reference clip. An AI voice generator capable of impersonating someone’s voice from just a 15-second sample — what could go wrong?

OpenAI just launched Voice Engine,
It uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
Reference and Generated audio is very close and hard to differentiate.
More details in 🧵 pic.twitter.com/tJRrCO2WZP

— AshutoshShrivastava (@ai_for_success) March 29, 2024

OpenAI is aware of the risks of a voice cloning model and, as a result, has not yet released it to the public, despite first developing Voice Engine in late 2022. “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company said in its blog post.

In 2023, OpenAI began privately testing Voice Engine with a small group of partners to help the company learn more about the model, including its potential use cases, safeguards, and more.

The partners testing Voice Engine had to agree to OpenAI’s usage policies, which explicitly prohibit them from impersonating an individual or organization without the original speaker’s consent. Other safeguards include disclosing to the audience that the voice they are hearing is AI-generated, watermarks that trace audio back to Voice Engine, monitoring of the model’s usage, and a ban on letting individual users create their own voices.

OpenAI’s partners have taken Voice Engine and developed use cases with a potentially positive impact.

For example, edtech startup Age of Learning used Voice Engine to provide non-readers and children with reading assistance by generating pre-scripted voice-over content and personalized responses. Similarly, AI avatar-generating startup HeyGen built a tool on Voice Engine that translates a speaker’s voice into multiple languages.

While OpenAI is keeping Voice Engine in preview for now, similar models are already available to the public. Take ElevenLabs, a startup that has made headlines for both positive and negative uses of its AI-powered voice-generation platform. The best-known misuse of ElevenLabs’s tech is probably the recent fake robocall of President Joe Biden that encouraged voters not to show up at the polls.

The ElevenLabs Voice Cloning tool is easy to access and use. All you need is an ElevenLabs account, a few minutes of voice samples, and a text prompt.

OpenAI is smart to delay its entry into the voice cloning space. The tech industry needs to raise awareness of the risks of AI-generated voices and stress to users the importance of verifying sources before they believe what they hear and see.