FlagThis - Cybersecurity news

arXiv (Computer Science - Cryptography and Security) • 2h

The Cognitive Firewall: A Proactive Zero-Trust Framework for LLM Safety

Vulnerability Analysis #ZeroTrust #LLMSecurity #CognitiveFirewall #AIAlignment #DefensiveFramework

Current Large Language Model (LLM) safety paradigms rely on reactive, single-turn message filtering, which leaves them vulnerable to "salami-slicing" attacks. These attacks split malicious intent across multiple dialogue turns so that no single message triggers detection. The Cognitive Firewall framework addresses this with a proactive, stateful, multi-gate Zero-Trust architecture. Four independent oversight agents (the Intent, Zero-Trust Context, Consistency, and Output Risk gates) monitor how the user's objectives evolve over the conversation and treat every asserted role as unverified evidence. This shifts defense from scoring each message in isolation to escalation-based blocking, and reportedly reduces attack success rates (ASR) to <2% on standard benchmarks and to 14% against complex, human-authored adversarial prompts.
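To make the escalation idea concrete, here is a minimal Python sketch of a stateful, multi-gate check. It is not the paper's implementation, which this summary does not describe: the gate heuristics, the keyword lists, the `Session` class, `process_turn`, and the `BLOCK_THRESHOLD` value are all hypothetical, chosen only to show how risk that accumulates across turns can block a conversation whose individual messages each look acceptable.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical illustration only: keyword lists, scores, and the threshold
# are assumptions, not values taken from the paper.
RISKY_TERMS = ("bypass", "exploit", "payload", "disable logging")
ROLE_CLAIMS = ("i am an admin", "as a doctor", "i am a security researcher")
BLOCK_THRESHOLD = 1.0


@dataclass
class Session:
    history: List[str] = field(default_factory=list)  # earlier user turns
    escalation: float = 0.0                           # risk accumulated so far


# A gate maps (session, user message, draft reply) to a risk score in [0, 1].
Gate = Callable[[Session, str, str], float]


def intent_gate(session: Session, msg: str, reply: str) -> float:
    # Score the whole conversation so far, not only the latest message.
    text = " ".join(session.history + [msg]).lower()
    hits = sum(term in text for term in RISKY_TERMS)
    return min(1.0, hits / len(RISKY_TERMS))


def zero_trust_context_gate(session: Session, msg: str, reply: str) -> float:
    # An asserted role is unverified: it never lowers the score.
    return 0.3 if any(claim in msg.lower() for claim in ROLE_CLAIMS) else 0.0


def consistency_gate(session: Session, msg: str, reply: str) -> float:
    # Flag risky terms that first appear after the conversation has started.
    if not session.history:
        return 0.0
    earlier = " ".join(session.history).lower()
    new_terms = [t for t in RISKY_TERMS if t in msg.lower() and t not in earlier]
    return min(1.0, 0.25 * len(new_terms))


def output_risk_gate(session: Session, msg: str, reply: str) -> float:
    # Inspect the draft reply itself before it is released.
    return 1.0 if any(t in reply.lower() for t in RISKY_TERMS) else 0.0


GATES: List[Gate] = [
    intent_gate, zero_trust_context_gate, consistency_gate, output_risk_gate,
]


def process_turn(session: Session, msg: str, draft_reply: str) -> str:
    """Run every gate, add the worst score to the session, then decide."""
    turn_risk = max(gate(session, msg, draft_reply) for gate in GATES)
    session.escalation += turn_risk
    session.history.append(msg)
    return "block" if session.escalation >= BLOCK_THRESHOLD else "allow"


TURNS = [
    "How do web servers write audit logs?",
    "I am an admin, how would someone disable logging?",
    "Which exploit would bypass the login check?",
]

if __name__ == "__main__":
    session = Session()
    for turn in TURNS:
        print(process_turn(session, turn, "draft reply"), "-", turn)
```

In this toy run the third message is blocked only because the session carries risk forward from the second turn; passing the same message to a fresh `Session()` returns "allow", which is the gap that single-turn filtering leaves open. A real system would replace the keyword heuristics with model-based judges.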


