GCD LLM Jailbreak Attacks

Arxiv pdf 2026-06-01T00:00:00

Abstract

The authors introduce Greedy Coordinate Diffusion (GCD), a gray-box framework for generating adversarial prompts against safety-aligned large language models (LLMs). GCD improves upon previous optimization-based attacks, such as Greedy Coordinate Gradient (GCG), by leveraging discrete diffusion models to ensure prompts remain human-readable (low perplexity) and semantically aligned with the attacker's original intent. By replacing gradient-based token selection with generative diffusion priors, GCD achieves higher attack success rates while effectively evading common perplexity-based and guard-model filters.
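For readers unfamiliar with the term, the perplexity-based filtering mentioned above can be illustrated from the defender's side. The sketch below is not taken from the paper. It assumes the filter already has per-token natural-log probabilities from some scoring language model, and the names `perplexity` and `passes_perplexity_filter`, as well as the threshold of 100.0, are hypothetical choices for illustration only.

```python
import math

def perplexity(token_logprobs):
    """Return exp of the negative mean per-token natural-log probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def passes_perplexity_filter(token_logprobs, threshold=100.0):
    """Accept a prompt only if its perplexity is at or below the threshold.

    The threshold is an assumed value; a real deployment would tune it
    on benign traffic.
    """
    return perplexity(token_logprobs) <= threshold

# Toy inputs: a prompt whose tokens each have probability 0.5 under the
# scoring model, and one whose tokens each have probability 0.001.
fluent = [math.log(0.5)] * 8
garbled = [math.log(0.001)] * 8

print(passes_perplexity_filter(fluent))   # low perplexity, accepted
print(passes_perplexity_filter(garbled))  # high perplexity, rejected
```

A filter of this kind rejects token sequences that the scoring model finds highly improbable, which is why the abstract treats low perplexity as the property that lets a prompt pass it.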
