‘Unreliable research assistant’: False outputs from AI chatbots pose risk to science, report says

False answers from AI Large Language Models pose risk to science, says Oxford study.
By Oceane Duboust

Oxford researchers warn that AI chatbots pose a risk to science.


Large language models (LLMs) such as ChatGPT and Bard could pose a threat to science because of their false responses, Oxford AI researchers argue in a new paper that insists their use in scientific research should be restricted.

LLMs are deep learning models that power artificial intelligence (AI) chatbots and are capable of generating human-like text.

Researchers from the Oxford Internet Institute say that people are too trusting of these models and view them as a human-like resource.

“This is, in part, due to the design of LLMs as helpful, human-sounding agents that converse with users and answer seemingly any question with confident sounding, well-written text,” Brent Mittelstadt, director of research at the Oxford Internet Institute, said in a statement.

“The result of this is that users can easily be convinced that responses are accurate even when they have no basis in fact or present a biased or partial version of the truth,” he said.

Yet LLMs do not guarantee accurate responses: they can reproduce false information from their training data or generate false information of their own (known as a hallucination), all while the tone of the output remains convincing to the user.

Why shouldn’t researchers trust LLMs?


While many responses from chatbots will be accurate, there is no guarantee, and false outputs can be caused by the datasets used to train these AI models.

For example, these datasets, which often come from content on the internet, can contain “false statements, opinions, jokes, creative writing”, as the researchers note, which can lead to incorrect outputs.

Another problem is that the companies behind LLMs are notoriously secretive about the datasets used to train them.

In an investigation, The Washington Post revealed, for instance, that the dataset behind Bard - which is second to ChatGPT in popularity - included various internet forums, personal blogs and entertainment websites such as Screenrant.

For Mittelstadt, the main concern isn’t the very obvious hallucinations but rather outputs that are "slightly wrong or slightly biased, or where you need some specific expertise to tell that it's wrong," he told Euronews Next.

For example, that can be the case for references to scientific articles.

"One of the big problems is they will invent references completely where if you don't go back and look for the references, you won't realise that it's actually a completely fabricated paper. Or the reference might be right, but it might give you the wrong information about what the paper says," he said.

“I look at ChatGPT and language models as a very unreliable research assistant. So anything that it gives me, I will always fact-check it and I will always make sure that it is true,” he added.

What are the solutions?

ChatGPT warns users that the chatbot may give inaccurate information.

Instead of treating large language models as a “knowledge-base”, the researchers recommend using them as a “zero-shot translator”.

“It is when you're giving the model a set of inputs that contain some reliable information or data, plus some request to do something with that data. And it's called zero-shot translation because the model has not been trained specifically to deal with that type of prompt,” said Mittelstadt.

That would mean rewriting a text in a more accessible language, “curating data” or “translating data from one format to another”.
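In practice, the difference between the two patterns can be sketched roughly as follows. The `query_llm` helper below is a hypothetical placeholder for whatever chatbot or API a researcher actually uses, not a real library call; the point is that in the zero-shot translation pattern, the trusted text is supplied inside the prompt and the model is only asked to rework it, not to recall facts.

```python
# Minimal sketch, assuming a hypothetical query_llm() helper rather than any
# specific vendor API. It contrasts the two usage patterns described above.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in: send a prompt to an LLM and return its text reply."""
    raise NotImplementedError("Plug in your own model or API client here.")


# Risky pattern: treating the model as a knowledge base.
# The answer depends entirely on what the model "remembers" from training,
# so it may be outdated, biased, or a confident-sounding hallucination.
knowledge_base_prompt = "Summarise the findings of the Oxford paper on LLMs in science."

# Pattern recommended by the researchers: zero-shot translation.
# Reliable information is pasted into the prompt, and the model is only asked
# to transform it (reformat, simplify, summarise), not to supply facts itself.
trusted_text = (
    "LLMs produce convincing but not necessarily accurate text. "
    "The authors argue they should be used as zero-shot translators, "
    "not as knowledge bases, in scientific work."
)
zero_shot_translation_prompt = (
    "Rewrite the following text in plain language for a general audience. "
    "Do not add any information that is not in the text.\n\n"
    f"{trusted_text}"
)

# Usage would look like: result = query_llm(zero_shot_translation_prompt)
```

Even with the second pattern, the output is still worth checking against the original source, as Mittelstadt’s own fact-checking habit suggests; the difference is that the model is no longer the source of the facts themselves.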


“The way in which LLMs are used matters. In the scientific community, it is vital that we have confidence in factual information, so it is important to use LLMs responsibly. If LLMs are used to generate and disseminate scientific articles, serious harm could result," said Sandra Wachter, a co-author of the study.

The Oxford researchers are not the only ones who say that guardrails are needed when it comes to using ChatGPT in science.

Nature, one of the world’s leading scientific journals, decided at the beginning of the year that “no LLM tool will be accepted as a credited author on a research paper”, on the grounds that authorship carries an accountability that AI tools cannot take on.

The journal also requires authors to disclose any use of a large language model in a section of their paper for transparency.
