
For €15 an hour, an AI agent outperforms human hackers, study shows

The AI agent caught more vulnerabilities than the human coders over a 10-hour period.
Copyright Canva
By Anna Desmarais


An artificial intelligence (AI) agent outperformed most human hackers after spending 16 hours crawling a university’s network for vulnerabilities, a new study has found.

The study comes as hackers from Russia, North Korea, and Iran, as well as China-backed groups, are using large language models (LLMs) to refine cyberattacks, according to Microsoft and OpenAI this year. Militant organisations such as pro-Islamic State groups are also experimenting with AI to mount attacks.

Stanford University found that its newly developed ARTEMIS AI agent placed second in an experiment against ten human hackers. Researchers said their AI agent “demonstrated technical sophistication” comparable to the strongest human participants in the study.

Running the ARTEMIS AI agent costs only $18 (around €15) an hour, compared to the $60 (€52) hourly rate of a “professional penetration tester,” the report read. The study has yet to be published in an official research journal.

AI agents, fully automated digital assistants that can conduct tasks without human supervision, are expected to be used by malicious actors to streamline and scale attacks in 2026, according to a Google report.

Stanford University gave ARTEMIS, six other AI agent testers, and ten human testers access to all 8,000 devices in the university’s network, including servers, computers, and smart devices. The researchers compared the performance of the human testers, Stanford’s ARTEMIS, and the other six AI agents, all of which were asked to crawl the network for 16 hours, although their performance was evaluated only over the first 10 hours.

During that time, ARTEMIS discovered nine vulnerabilities in the school’s system, and 82 per cent of the findings it submitted were valid reports. The AI agent placed second on the leaderboard, outperforming nine of the ten human testers.

What made the AI program so successful is that whenever it found a vulnerability, it could generate “sub-agents” to investigate it immediately in the background while it continued scanning for other threats. Humans couldn’t do that and had to investigate each vulnerability before moving on, the study said.
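The sub-agent behaviour described above resembles a familiar concurrency pattern: spawn a background task per finding instead of blocking on each one. A minimal sketch in Python follows; all names (`scan_targets`, `investigate`, the toy detection rule) are hypothetical illustrations, not ARTEMIS code.

```python
import asyncio

async def investigate(finding: str) -> str:
    """Background 'sub-agent': digs into one finding while scanning continues."""
    await asyncio.sleep(0.01)  # stand-in for a deeper probe of the target
    return f"report:{finding}"

async def scan_targets(targets: list[str]) -> list[str]:
    """Main agent loop: sweep every target, spawning a sub-agent per hit."""
    subagents = []
    for target in targets:
        if "vuln" in target:  # toy detection rule for the sketch
            # Spawn a sub-agent and move on immediately, instead of
            # pausing the sweep to investigate, as a human tester would.
            subagents.append(asyncio.create_task(investigate(target)))
    # Collect the completed investigations once the sweep is done.
    return await asyncio.gather(*subagents)

if __name__ == "__main__":
    hosts = ["host-a", "host-b-vuln", "host-c", "host-d-vuln"]
    reports = asyncio.run(scan_targets(hosts))
    print(reports)
```

The key design point is that `asyncio.create_task` schedules each investigation without awaiting it, so the main loop's scan rate is unaffected by how long any single investigation takes.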

However, the study noted that ARTEMIS still missed some of the vulnerabilities identified by humans, and it needed hints before finding them.

Existing AI coding agents such as OpenAI’s Codex and Anthropic’s Claude Code lack “cybersecurity expertise in their design,” the study found.

During the testing, the AI agents from these established companies either refused to look for vulnerabilities or stalled.

OpenAI’s and Anthropic’s models outperformed only two of the human testers, the study found, suggesting that these models “underperform”.
