As AI becomes more enmeshed in our lives, most people can’t tell the difference between human voices and their synthetic clones, a new study reveals.
Artificial intelligence (AI) has become a common part of day-to-day life for many. We see it in the AI slop on our social media feeds, speak to it through large language models, and hear it every time Amazon’s Alexa perks up at a command. Yet, as the technology rapidly advances, it’s becoming harder to tell what’s real and what’s not.
In a new study published in the journal PLOS One, researchers found that most people can no longer distinguish AI-generated voices from the human voices they were cloned from.
Participants listened to samples of 80 different voices, half of them AI-generated and half human. They were asked to judge whether each voice was real, and to rate it for trustworthiness and dominance.
Within the AI category there were two types of voice: generic voices created from scratch, and voices cloned from recordings of real people speaking.
While most people recognised the generic AI voices as fake, the synthetically cloned versions proved far harder to detect, with 58 per cent mistaken for real. By comparison, 62 per cent of the genuine recordings were correctly identified as human - a difference of just four percentage points in how often each was judged to be real.
“The most important aspect of the research is that AI-generated voices, specifically, voice clones, sound as human as recordings of real human voices,” Dr Nadine Lavan, the study's lead author and a senior lecturer in psychology at Queen Mary University of London, told Euronews Next.
“That is particularly striking since we used commercially available tools, where anyone can create realistic-sounding voices without paying huge amounts of money or needing any particular programming or technological skills”.
Voicing concerns
AI voice cloning technology works by analysing voice recordings and extracting their key characteristics. Because it can mimic voices so precisely, it has become a popular tool for phone scammers, who sometimes use social media posts as source material to imitate the voices of people’s loved ones.
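The study did not involve building a cloning system, but as a rough, hypothetical illustration of what “extracting key characteristics” can mean, the Python sketch below uses the open-source librosa library to measure two such characteristics, timbre and pitch, from a recording. The filename and feature choices are assumptions for the example; real cloning systems feed much richer representations into neural text-to-speech models.

```python
# Hypothetical sketch: the "analysis" stage of voice cloning, i.e.
# measuring characteristics of a voice from a recording. Real systems
# feed features like these into neural speaker encoders and TTS models.
import numpy as np
import librosa

# Load a short voice recording (placeholder filename).
audio, sample_rate = librosa.load("voice_sample.wav", sr=16000)

# Timbre: mel-frequency cepstral coefficients (MFCCs), a compact
# summary of the spectral shape that makes a voice recognisable.
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)

# Pitch: the fundamental-frequency contour, estimated with YIN.
f0 = librosa.yin(audio, fmin=65.0, fmax=400.0, sr=sample_rate)

# A crude "voice fingerprint": average timbre plus median pitch.
fingerprint = np.concatenate([mfccs.mean(axis=1), [np.median(f0)]])
print(f"{len(fingerprint)}-dimensional voice fingerprint extracted")
```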
The elderly are most at risk: at least two-thirds of people over the age of 75 have experienced attempted telephone fraud, according to research by the University of Portsmouth, which also found that nearly 60 per cent of attempted scams are conducted via voice calls.
Although not all of these calls are made using AI, the technology is becoming increasingly prevalent as the software grows more sophisticated and accessible, with popular examples including Hume AI and ElevenLabs.
AI cloning has also become a cause for concern in the entertainment industry, where several celebrities’ voices have been used without permission. Last year, Scarlett Johansson, who voiced an AI assistant in the film ‘Her’, spoke out after OpenAI used a voice for its ChatGPT service that sounded ‘eerily similar’ to her own.
Then there’s the widespread use of audio deepfakes, which have previously mimicked politicians and journalists in attempts to sway public opinion and spread misinformation.
As these troubling misuses continue to spread, Lavan believes AI developers have a responsibility to implement stronger safeguards.
“From our perspective as researchers, we would always recommend that companies creating the technology talk to ethicists and policy makers to consider what the ethical and legal issues are around, for example, ownership of voices, consent (and how far that can stretch in the face of an ever-changing landscape),” she said.
Improving accessibility
As with all technologies, AI-generated voices also have the potential to be used for good - and could prove particularly beneficial for people who are unable to speak or who struggle to do so.
“This kind of assistive technology has been in use for some time, with Stephen Hawking being one of the most iconic examples. What’s new, however, is the ability to personalise these synthetic voices in ways that were previously impossible,” said Lavan.
“Today, users can choose to recreate their original voice, if that’s what they prefer, or design a completely new voice that reflects their identity and personal taste”.
She also noted that, if used ethically and responsibly, the technology could improve accessibility and diversity in education, broadcasting and audiobook production.
For example, a recent study found that AI-assisted audio learning boosted students’ motivation and reading engagement - especially among those with neurodevelopmental conditions such as attention deficit hyperactivity disorder (ADHD).
“Another fascinating development is the ability to clone a voice into different languages, allowing people to represent themselves across linguistic boundaries while retaining their vocal identity. This could be transformative for global communication, accessibility, and cultural exchange,” Lavan added.
As the sound of artificial voices becomes ever more present in our lives, the ways we use and engage with them will continue to evolve. Lavan hopes to explore this in further research, focusing on how AI-generated voices are perceived.
“I’d be really keen to explore in more depth how knowing whether a voice is AI-generated or not changes how people engage with that voice,” she said.
“Similarly, it would be very interesting to see how people would perceive AI-generated voices that sound nice and pleasant but clearly not human: for example, would people be more or less likely to follow instructions from these pleasant but non-human AI voices? Would people be more or less likely to get angry at them when something goes wrong?
“All of these questions are really interesting from a research perspective and can tell us a lot about what matters in human (or human-computer) interactions,” she said.