OpenAI unveils AI voice cloning tech that only needs a 15-second sample to work

The OpenAI logo is seen displayed on a cell phone with an image on a computer screen generated by ChatGPT's Dall-E text-to-image model, Friday, Dec. 8, 2023, in Boston. - Copyright AP Photo/Michael Dwyer

By Pascale Davies

Published on 01/04/2024 - 16:00•Updated 16:02


OpenAI’s Voice Engine was first developed in late 2022.

OpenAI has made its artificial intelligence (AI) even more eerily human with a text-to-voice tool that uses a 15-second clip of someone’s voice to generate natural-sounding speech that resembles the original speaker.

But even OpenAI is wary of the potential misuse of the technology. It says it will not release Voice Engine publicly, and the tool is currently available only to early testers.

“We recognise that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the San Francisco-based company said in a statement.

Voice cloning AI technology is not new and has already been used under concerning circumstances.

Ahead of the primary vote in the United States in January, AI-generated robocalls mimicking President Joe Biden were sent to thousands of voters telling them to stay at home and abstain from voting.

As a result, the US Federal Communications Commission (FCC) banned AI-generated robocalls in February.

But elections are not the only thing that can be affected by voice cloning technology or deepfakes. Extortion scams that use AI to impersonate people are also a growing concern.

The technology can also be used for good, however. OpenAI has shown how it is helping patients with sudden or degenerative speech conditions by restoring their voices using video or audio recordings made before they lost the ability to speak.

OpenAI said another use case is giving people who cannot speak, or who have difficulty speaking, a voice that does not sound robotic.

“These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.

So far, Voice Engine is available only to a handful of OpenAI’s partners, which the company said have agreed to its usage policies prohibiting the impersonation of another individual or organisation without consent.

Companies with access to Voice Engine include the education technology company Age of Learning, the visual storytelling platform HeyGen, and the health system Lifespan.

OpenAI said its other safety measures include watermarking, which allows the origin of any audio generated by Voice Engine to be traced, and a requirement that partners obtain the “explicit and informed consent” of the original speaker.

“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI said.
