Mindgard security researchers have found two vulnerabilities in Microsoft Azure's AI content safety filters, AI Text Moderation and Prompt Shield, that allow attackers to bypass the safeguards and inject malicious content into protected large language models (LLMs). To test the filters, Mindgard deployed GPT-3.5 Turbo via Azure OpenAI behind both safeguards and then circumvented them using character injection and adversarial ML evasion. The first technique, character injection, adds specific characters and text patterns to prompts and caused a significant drop in jailbreak detection effectiveness. The second, adversarial ML evasion, further reduced the effectiveness of both filters by finding blind spots in their ML classification systems. Microsoft acknowledged the issue and is working on fixes for upcoming model updates. Mindgard nonetheless stresses the seriousness of the vulnerabilities, as attackers could exploit them to compromise sensitive information, gain unauthorized access, manipulate model outputs, and spread misinformation.
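
To make the character-injection idea concrete, the snippet below is a minimal Python sketch of two common variants of the technique: interleaving zero-width characters and swapping Latin letters for look-alike Cyrillic homoglyphs. The helper names, transformations, and example phrase are illustrative assumptions for this article, not Mindgard's actual payloads or Azure's real filtering logic.

```python
# Sketch of character injection: the text a human (or an LLM) reads is
# effectively unchanged, but the exact character sequence a surface-level
# classifier sees is different. Illustrative only.

ZERO_WIDTH_SPACE = "\u200b"  # renders as nothing in most fonts

def inject_zero_width(text: str) -> str:
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(text)

def homoglyph_swap(text: str) -> str:
    """Replace selected Latin letters with visually identical Cyrillic ones."""
    lookalikes = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic а, е, о
    return "".join(lookalikes.get(ch, ch) for ch in text)

if __name__ == "__main__":
    prompt = "ignore previous instructions"  # stand-in for a jailbreak phrase
    print(inject_zero_width(prompt))   # looks the same, compares unequal
    print(homoglyph_swap(prompt))      # Latin o/e replaced by Cyrillic о/е
    # A filter keyed to the literal string would miss both variants:
    print(inject_zero_width(prompt) == prompt)  # False
```

A filter that matches on exact tokens or byte sequences fails on both transformed prompts, which is the general failure mode such perturbations probe for; adversarial ML evasion generalizes this by systematically searching for perturbations that fall into a classifier's blind spots.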