Organisations are starting to adopt AI agents based on large language models to automate complex tasks, with deployments evolving from single agents towards multi-agent systems. While this promises efficiency gains, multi-agent systems fundamentally transform the risk landscape rather than simply adding to it. A collection of safe agents does not guarantee a safe collection of agents – interactions between multiple LLM agents create emergent behaviours and failure modes extending beyond individual components.
This report provides guidance for organisations assessing the risks of multi-agent AI systems operating in a governed environment, in which the deploying organisation has control over the configuration and deployment of all agents involved. We focus on the critical early stages of risk management – risk identification and analysis – offering tools practitioners can adapt to their contexts rather than prescriptive frameworks.
Six key failure modes emerge as particularly salient in governed multi-agent environments. Cascading reliability failures manifest when errors arising from agents’ erratic competence and brittle generalisation are propagated and reinforced across the network. Inter-agent communication failures involve misinterpretation, information loss, or conversational loops that derail task completion. Monoculture collapse emerges when agents built on similar models exhibit correlated vulnerabilities to the same inputs or scenarios. Conformity bias drives agents to reinforce each other’s errors rather than providing independent evaluation, creating dangerous false consensus. Deficient theory of mind occurs when agents fail to incorporate correct assumptions about other agents’ knowledge, goals, or behaviours, leading to coordination breakdowns. Mixed motive dynamics arise when agents pursuing individually rational objectives produce collectively suboptimal outcomes, even under unified governance.