After ChatGPT and DALL-E, meet VALL-E - the text-to-speech AI that can mimic anyone’s voice

VALL-E is a state-of-the-art language modeling approach for text-to-speech synthesis - Copyright Canva

By Luke Hurst

Published on 10/01/2023 - 15:31 GMT+1 • Updated 12/01/2023 - 9:44 GMT+1

VALL-E can mimic someone’s voice saying anything with just a three-second recording.

Last year saw the emergence of artificial intelligence (AI) tools that can create images, artwork, or even video from a text prompt.

There were also major steps forward in AI writing, with OpenAI’s ChatGPT causing widespread excitement - and fear - about the future of writing.

Now, just a few days into 2023, another powerful use case for AI has stepped into the limelight - a text-to-voice tool that can impeccably mimic a person’s voice.

Developed by Microsoft, VALL-E can take a three-second recording of someone’s voice, and replicate that voice, turning written words into speech, with realistic intonation and emotion depending on the context of the text.

Trained on 60,000 hours of English speech recordings, it can generate speech in a "zero-shot" setting, meaning it can reproduce a voice it never encountered during training, without any further training on that speaker.

Introducing VALL-E in a paper posted on arXiv, the preprint server hosted by Cornell University, the developers explained that the recordings came from more than 7,000 unique speakers.

The team say their text-to-speech (TTS) system was trained on hundreds of times more data than existing TTS systems, helping it to overcome the zero-shot issue.

The tool is not currently available for public use - but it does raise questions about safety, given it could feasibly be used to make it sound as though anybody had said anything.

Microsoft betting big on AI

Its creators have, however, provided a demo, showcasing a number of three-second speaker prompts and a demonstration of the text-to-speech in action, with the voice correctly mimicked.

Alongside the speaker prompt and VALL-E’s output, you can compare the results with the "ground truth" - the actual speaker reading the prompt text - and the "baseline" result from current TTS technology.

Microsoft has invested heavily in AI and is one of the backers of OpenAI, the company behind ChatGPT and DALL-E, a text-to-image or art tool.

The software giant invested $1 billion (€930 million) in OpenAI in 2019, and a report this week on semafor.com stated it was looking at investing another $10 billion (€9.3 billion) in the company.
