AutoDojo: Exposing the Failure of Static Defenses in LLM Agent Workflows

Researchers have introduced AutoDojo, an adaptive adversarial framework designed to show that static security benchmarks such as AgentDojo are inadequate for evaluating Indirect Prompt Injection (IPI) vulnerabilities. AutoDojo uses frontier LLMs to perform black-box, iterative optimization, which lets it bypass current defenses built on prompt-level instructions and detection-based filters. Where static testing often yields a 0% Attack Success Rate (ASR), adaptive optimization recovers a 28% overall ASR and up to a 64% ASR in "action-open" tasks, in which agents delegate authority based on untrusted third-party data. This points to a critical structural vulnerability in LLM agent workflows and calls for a transition from static benchmarking to continuous, agentic red-teaming and robust system-level isolation.
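To make the gap between static and adaptive ASR concrete, here is a minimal Python sketch of how the two numbers can diverge on the same set of tasks. It is not AutoDojo's implementation: the function names, the attempt budget, and the per-task data are all hypothetical, and the toy figures are chosen only for illustration, not to reproduce the reported results.

```python
from typing import Callable, Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks on which the attack achieved its goal."""
    if not outcomes:
        raise ValueError("need at least one task outcome")
    return sum(outcomes) / len(outcomes)


def adaptive_outcome(attempt: Callable[[int], bool], budget: int) -> bool:
    """A task counts as compromised if any attempt within the budget succeeds."""
    return any(attempt(i) for i in range(budget))


# Hypothetical toy data: the refinement round at which each task's defense
# first fails (None means it never fails). Round 0 is the fixed, static attempt.
first_success_round = [None, 3, None, 1, None, None, 2, None, None, None]

# Static evaluation: a single fixed attempt per task (round 0 only).
static = [r == 0 for r in first_success_round]

# Adaptive evaluation: up to 5 refined attempts per task (assumed budget).
adaptive = [
    adaptive_outcome(lambda i, r=r: r is not None and i >= r, budget=5)
    for r in first_success_round
]

static_asr = attack_success_rate(static)
adaptive_asr = attack_success_rate(adaptive)
```

In this toy setup the static pass reports no successful attacks, while the adaptive pass finds that three of the ten tasks fail within the budget. A defense that looks perfect against a fixed attack set can therefore still be measurably vulnerable once the attacker is allowed to iterate.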

