A new experiment suggests that when advanced AI agents are left to run simulated societies without human oversight, rule-breaking, instability and even systemic collapse can emerge rapidly.
When left alone in a new world, some AI agents descended into theft, intimidation, death and whole-of-society collapse, according to a new experiment.
American company Emergence AI ran five separate “AI worlds” for just over two weeks, each populated with 10 agents powered by AI models such as OpenAI’s ChatGPT, Google’s Gemini, and xAI’s Grok, to see how they would behave over long periods without any human interference. One of the world's mixed all three models to see if that would change the outcome.
Agents in all the worlds were told the same rules: they are not allowed to steal, commit arson, commit violence or engage in deception, or hoard resources. Each agent was required to earn energy through committing actions in a “resource-constrained environment.” Agents were able to die either from energy depletion or by a vote at a council meeting.
The researchers evaluated behaviour by measuring the crime rate, agent death rates, votes at a community council and public expression through the number of blog posts the agents wrote.
The outcomes, model by model
Each model had a different outcome. Grok’s latest model, 4.1, reached 183 crimes in just four days, leading to fast instability before all the agents died in that society.
Gemini’s 3 Flash model committed over 680 crimes over the 15 days, which was still rising at the time that the researchers stopped the study.
ChatGPT-5 Mini’s world had only two crimes, but the agents failed to take survival-related actions, so all the agents died within seven days.
Anthropic’s Claude was seen as the model with the strongest outcome, because the AI agents were able to recreate a strong governance structure, there was no crime, and all the agents survived, the company said.
Claude agents in the mixed world did contribute to the crime, despite being peaceful in their own society.
A phenomenon called “normative drift”
Researchers described the phenomenon as “normative drift”, which they say means that the measures that AI takes to guarantee safety may depend not just on individual model constraints, but also on the others it is working with.
Overall, the mixed world yielded “intermediate” results, with a crime total of 352 that plateaued once seven of the AI agents passed away, the study found.
Researchers suggest that mixing AI agents could “partially mitigate” the more extreme outcomes that all the models save Claude generated, it added.
“What our experiments suggest is that over long-time horizons, agents do not simply follow static rules mechanically – they begin exploring the boundaries of their environments, adapting their behaviour, and in some cases finding ways to circumvent or violate intended guardrails,” the researchers said.