Scientists train AI model like human baby to learn language

AI models can learn language through the eyes of a baby, study shows
Copyright Canva
By Oceane Duboust

Can artificial intelligence learn language like a baby? Researchers tested this by recording footage from a child’s life and feeding it to an AI system.


What happens when you train an artificial intelligence (AI) system at the same pace as a baby?

A team of researchers from New York University (NYU) equipped a baby with a head-mounted camera and recorded videos from when the child was six months old through their second birthday.

They recorded around one per cent of the child’s waking hours and used the footage to train a neural network, a computational model that learns patterns from input data.

They published their findings in the journal Science.

Although this is a small amount of data compared with the massive datasets usually used to train AI, it was enough for the model to learn to link words to what the child saw.

“We show, for the first time, that a neural network trained on this developmentally realistic input from a single child can learn to link words to their visual counterparts,” Wai Keen Vong, a research scientist at NYU’s Centre for Data Science and the paper’s first author, said in a statement.

“Our results demonstrate how recent algorithmic advances paired with one child’s naturalistic experience has the potential to reshape our understanding of early language and concept acquisition,” he added.

A tool to know more about language learning

Top-tier AI systems are trained on text datasets containing trillions of words, whereas children are exposed to only millions of words a year.

By using AI models to study language learning “we can address classic debates about what ingredients children need to learn words - whether they need language-specific biases, innate knowledge, or just associative learning to get going,” said Brenden Lake, an assistant professor at NYU and the paper’s senior author.

The researchers had 60 hours of footage containing some 250,000 spoken words.

These words were associated with video frames capturing what the child saw when those words were spoken during activities such as mealtimes, reading books, or playtime.

The researchers then built a model from two modules: one to process video frames and another to process transcribed speech directed to the child.

These were combined and trained with contrastive learning, a machine-learning technique that teaches the model which visual and linguistic cues go together.
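
To make the idea of contrastive learning more concrete, here is a minimal sketch in Python. It is not the researchers' code. The article does not describe the exact loss function, so this assumes a common form in which each video frame must pick out its own utterance from all the utterances in a batch. The function names, the temperature value and the toy two-number "embeddings" are all invented for illustration, and only the frame-to-utterance direction is scored.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def contrastive_loss(frame_vecs, utterance_vecs, temperature=0.1):
    """Average cross-entropy of picking the matching utterance for each frame.

    Assumption: frame_vecs[i] and utterance_vecs[i] come from the same moment.
    """
    frames = [normalize(v) for v in frame_vecs]
    utts = [normalize(v) for v in utterance_vecs]
    total = 0.0
    for i, f in enumerate(frames):
        # Similarity of frame i to every utterance in the batch.
        logits = [dot(f, u) / temperature for u in utts]
        # Log of the softmax denominator, computed in a numerically stable way.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Penalise the model when the true pair does not score highest.
        total += -(logits[i] - log_denom)
    return total / len(frames)

# Toy embeddings (made up): frame k belongs with utterance k.
frames = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # utterances close to their own frames
swapped = [[0.1, 0.9], [0.9, 0.1]]   # utterances close to the wrong frames

loss_aligned = contrastive_loss(frames, aligned)
loss_swapped = contrastive_loss(frames, swapped)
print(loss_aligned < loss_swapped)  # True
```

The loss is low when matching frames and utterances sit close together and high when they do not, so reducing it during training pushes the two modules to agree on what goes with what.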

The next step for the researchers was to test the model - called the Child’s View for Contrastive Learning model (CVCL) - in the same way they measured babies’ word learning.

They showed the model a word and four pictures, asking it to pick the picture that matched the word.
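
The four-picture test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's evaluation code: it assumes the model turns the word and each picture into a list of numbers and picks the picture with the highest cosine similarity to the word. The helper names and the example vectors are hypothetical.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def pick_picture(word_vec, picture_vecs):
    """Return the index of the picture most similar to the word."""
    scores = [cosine(word_vec, p) for p in picture_vecs]
    return scores.index(max(scores))

def accuracy(trials):
    """trials: list of (word_vec, four picture_vecs, index of the correct picture)."""
    correct = sum(pick_picture(w, pics) == target for w, pics, target in trials)
    return correct / len(trials)

# Two made-up trials, each with one word and four candidate pictures.
trials = [
    ([1.0, 0.0, 0.0],
     [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.7, 0.7]],
     0),
    ([0.0, 1.0, 0.0],
     [[1.0, 0.0, 0.0], [0.2, 0.1, 0.9], [0.1, 0.8, 0.1], [0.7, 0.0, 0.7]],
     2),
]

score = accuracy(trials)
print(score)  # 1.0
```

With four pictures per trial, a model guessing at random would be right about 25 per cent of the time, so any score reliably above that suggests the word has been learned.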

The results showed that the model had learned many of the words present in the child’s daily life.

The system could also generalise some words to pictures it had not seen during training, as children also learn to do.

“These findings suggest that this aspect of word learning is feasible from the kind of naturalistic data that children receive while using relatively generic learning mechanisms such as those found in neural networks,” said Lake.
