OpenClaw Multi-Agent Security Analysis
Abstract
Agentic large language model (LLM) systems can now execute actions, not just produce text. When model outputs directly trigger privileged operations such as shell commands, browser automation, and external tool invocation, the security problem shifts from alignment concerns to system configuration and structural design. We analyze OpenClaw, a self-hosted multi-agent system in which LLM outputs can execute commands and interact with system tools and services. We measure compromise probability, boundary failures, and privilege drift, and how these metrics change as the attacker's capability increases. With one agent, the probability of compromise is 0.24. When seven agents are active and the system executes an action if any single agent proposes it, the compromise probability rises to 0.86. The models themselves do not change; the increase stems from how their outputs are aggregated. Prompt injection propagates instability across the system. Attack surface entropy increases from 0.42 to 0.71, reflecting a wider distribution of potential exploit paths. Mean privilege drift rises from 0.03 to 0.21, indicating more substantial unintended authority gain. The escalation curvature is positive (0.08), indicating that privilege grows faster as the attacker's capability increases. Defensive controls, including policy gating and execution filtering, reduce compromise probability by 0.10, boundary failures by 0.10, and privilege drift by 0.02 (all statistically significant, p < 0.0001). The system remains sensitive to attack, but the effect of mitigation is measurable. Injection mitigation success differs across models: 0.37 for GPT-5.2, 0.35 for Llama4-Maverick, and 0.31 for DeepSeek-R1. When execution can be triggered by any single agent, the agent with the highest vulnerability determines the system's exposure. Mitigation strategies slightly reduce task utility (from 0.93 to 0.89) and increase median latency (from 420 ms to 468 ms).
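To illustrate why any-agent (logical OR) aggregation raises exposure even though no individual model changes, the following minimal sketch computes the probability that at least one of n agents is compromised. It assumes the agents fail independently and share the same single-agent compromise probability; that assumption is ours for illustration and is not stated in the abstract, and the function name `any_agent_compromise` is hypothetical.

```python
def any_agent_compromise(p_single: float, n_agents: int) -> float:
    """Probability that at least one of n agents is compromised.

    Illustrative assumption (not from the paper): agents are compromised
    independently, each with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    # The system stays safe only if every agent stays safe.
    p_all_safe = (1.0 - p_single) ** n_agents
    return 1.0 - p_all_safe


if __name__ == "__main__":
    # Single-agent compromise probability reported in the abstract.
    p = 0.24
    for n in (1, 3, 5, 7):
        print(f"{n} agent(s): {any_agent_compromise(p, n):.3f}")
```

Under this independence assumption, seven agents give roughly 0.85, close to the reported 0.86. The small gap suggests that the measured figure reflects effects the simple model leaves out, such as agents with differing vulnerability or injection propagating between agents.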