Merriam-Webster claims OpenAI used approximately 100,000 articles, encyclopedia and dictionary entries to train ChatGPT without authorisation.
The popular English-language dictionary Merriam-Webster has filed a lawsuit against OpenAI, accusing the company of using copyrighted material to train its ChatGPT model.
The lawsuit, filed alongside its parent company, Encyclopedia Britannica, claims that OpenAI copied more than 100,000 articles, encyclopedia entries, and dictionary entries from online sources. The complaint alleges that this content was used without authorisation to teach ChatGPT how to generate responses to user prompts.
OpenAI violates copyright in three ways, the plaintiffs allege: by copying protected material on a large scale, by using that content to train its AI, and by generating outputs that resemble the original content.
ChatGPT’s responses often contain “verbatim or near-verbatim reproductions” of the dictionary’s content, the lawsuit claims. That, in turn, diverts users who would otherwise visit the dictionary’s website.
“Defendants’ ChatGPT-based AI products free ride on Plaintiffs’ trusted, high-quality content … by cannibalising traffic to Plaintiffs’ websites with AI-generated summaries of Plaintiffs’ own content,” the lawsuit said.
Information from the dictionary has also been used in AI hallucinations: responses that AI systems invent when they do not have enough information to answer a user’s query, the lawsuit argues.
According to the complaint, ChatGPT’s replies also “misleadingly omit” portions of the dictionary’s content, which makes its explanations “incomplete and inaccurate”.
The plaintiffs are seeking financial damages and a court order that would permanently prevent OpenAI from continuing the alleged practices.
Euronews Next contacted OpenAI about the lawsuit, but did not receive an immediate reply.