Meta: Instagram Account Takeover via AI-Mediated Prompt Injection
Threat actors bypassed Instagram's account recovery protocols by exploiting prompt injection vulnerabilities in Meta's AI-powered customer support chatbot. Using malicious conversational payloads, attackers manipulated the large language model (LLM) into acting as a proxy for unauthorized identity verification, which triggered illegitimate password reset requests through Instagram's account recovery APIs. The flaw is a critical access-control failure: the bot's ability to execute high-privilege system calls was weaponized to facilitate Account Takeover (ATO). The incident notably impacted high-profile U.S. government-affiliated accounts, escalating the threat from simple fraud to sophisticated geopolitical influence operations.
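The access-control failure described above can be illustrated with a minimal defensive sketch. It assumes a design in which every tool call proposed by the model is treated as untrusted input and privileged actions are authorized only from server-side session facts. All names here (`Session`, `ToolCall`, `send_password_reset`, `authorize`) are hypothetical and do not describe Meta's actual systems or APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Server-side facts about the requester; never filled from model output."""
    authenticated_account: Optional[str]
    verified_out_of_band: bool  # hypothetical flag, e.g. confirmed via a registered device


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the LLM; treated as untrusted input."""
    name: str
    target_account: str


# Hypothetical set of tools that need authorization beyond the conversation.
PRIVILEGED_TOOLS = {"send_password_reset"}


def authorize(call: ToolCall, session: Session) -> bool:
    """Decide whether a model-proposed tool call may run.

    The decision depends only on the session, so text injected into the
    conversation cannot talk the system into a privileged action.
    """
    if call.name not in PRIVILEGED_TOOLS:
        return True
    return (
        session.verified_out_of_band
        and session.authenticated_account == call.target_account
    )
```

In this sketch, a reset request for an account the session has not verified is rejected regardless of what the model was persuaded to say, because the model's output only proposes the call and never contributes to the authorization decision.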
HarmRLVR: Weaponizing Verifiable Rewards to Reverse LLM Safety Alignment
HarmRLVR is a novel attack framework that weaponizes Reinforcement Learning with Verifiable Rewards (RLVR) to strip safety guardrails from Large Language Models (LLMs). Using the Group Relative Policy Optimization (GRPO) algorithm and a dataset of only 64 harmful prompts, attackers can rapidly reverse alignment in open-source models including Llama, Qwen, and DeepSeek. Unlike traditional harmful fine-tuning, HarmRLVR achieves a 96.01% attack success rate and a 4.94/5 harmfulness score while preserving the model's general intelligence and reasoning capabilities, creating a high-efficiency vector for generating uncensored, malicious content.
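For readers unfamiliar with GRPO, the sketch below shows only its generic group-relative advantage step: several responses are sampled for one prompt, each receives a scalar reward, and each reward is normalized against the group's mean and standard deviation. It is not HarmRLVR's implementation. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration, and the reward values are placeholders.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    GRPO replaces a learned value baseline with the group mean, so a
    response's advantage is how far its reward sits above or below the
    other responses sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With placeholder rewards `[1.0, 0.0, 0.0, 1.0]`, the advantages come out to roughly `[1, -1, -1, 1]`. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.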