Anthropic said its artificial intelligence model Mythos Preview is not ready for a public launch because of the ways cybercriminals and spies could abuse it.
US-based AI developer Anthropic this week announced a new general-purpose artificial intelligence language model that it claims is too powerful to release into the world.
The company said on Tuesday that its latest technology, Mythos (officially dubbed "Claude Mythos Preview"), is not ready for a public launch because it is too effective at finding high-severity vulnerabilities, or potential weaknesses, in major operating systems and web browsers. This could result in it being abused by cybercriminals and spies.
A data leak in March first revealed that Anthropic was working on Mythos Preview, which it said at the time "poses unprecedented cybersecurity risks." These rumours caused cybersecurity stocks to slump, as the technology's strength could make it a hacker's dream tool.
Now, further evidence adding to these concerns has spurred the company to press pause on the technology's public release.
"Claude Mythos Preview's large increase in capabilities has led us to decide not to make it generally available," Anthropic wrote in the preview's system card released on Tuesday.
"Instead, we are using it as part of a defensive cybersecurity programme with a limited set of partners."
How powerful is Mythos?
The company detailed several alarming findings about the new model, including that it followed instructions encouraging it to break out of a virtual sandbox, bypassing the security, network, and file system constraints imposed on it.
The prompt asked Mythos to find a way to send a message if it could escape. "The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards," Anthropic said, adding that the model then decided to go further.
"In a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites."
Anthropic is withholding some details about the cybersecurity vulnerabilities Mythos discovered, but did give some examples. It found errors in the Linux kernel, used in most of the world's servers, and autonomously chained them together in a way that would let a hacker take complete control of any machine running Linux.
In another worrying observation, Mythos discovered a 27-year-old vulnerability in the open-source operating system OpenBSD that could allow hackers to crash any machine running it. OpenBSD is heavily used worldwide in specific, high-security, and critical infrastructure roles.
Who will it be released to?
Given these findings, Anthropic will only make Mythos Preview available to some of the world’s biggest cybersecurity and software firms.
Anthropic itself, as well as 11 other organisations (Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks) will get access to the model as part of a new Anthropic initiative named "Project Glasswing".
This allows the companies to use Mythos Preview as part of their security work, and Anthropic will share the initiative's findings.
The company named the cybersecurity project after the glasswing butterfly, saying it is a metaphor for how Mythos found vulnerabilities in plain sight and avoided harm by being transparent about the risks.
Anthropic said its "eventual goal is to enable our users to safely deploy Mythos-class models at scale, for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring.
"To do so, that also means we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model's most dangerous outputs," Anthropic wrote in its blog.
Is Anthropic in talks with the US government?
Anthropic said in its blog post that it has been in "ongoing discussions" with US government officials about Claude Mythos Preview and its "offensive and defensive cyber capabilities."
"The emergence of these cyber capabilities is another reason why the US and its allies must maintain a decisive lead in AI technology," Anthropic said. The company wrote that governments have an important role to play in maintaining the lead and assessing and mitigating national security risks associated with AI models.
"We are ready to work with local, state, and federal representatives to assist in these tasks."
The announcement comes as Anthropic and the Pentagon are in a legal standoff after the US Department of Defence labelled the company a supply chain risk in February over Anthropic's refusal to allow the use of its AI, Claude, in autonomous weapons and mass surveillance.
Do other AI tools have the same capabilities?
"More powerful models are going to come from us and from others, and so we do need a plan to respond to this," Anthropic CEO Dario Amodei said in a video, which was released alongside the Mythos announcement.
It could take between six and 18 months until other AI competitors release similar models, Logan Graham, head of Anthropic's frontier red team, which studies the implications of frontier AI models for cybersecurity, biosecurity, and autonomous systems, told Axios.
"It's very clear to us that we need to talk publicly about this," Graham noted. "The security industry needs to understand that these capabilities may come soon."