GCD LLM Jailbreak Attacks
Arxiv
2026-06-01T00:00:00
Abstract
The authors introduce Greedy Coordinate Diffusion (GCD), a gray-box framework for generating adversarial prompts against safety-aligned large language models (LLMs). GCD improves upon previous optimization-based attacks, such as Greedy Coordinate Gradient (GCG), by leveraging discrete diffusion models to ensure prompts remain human-readable (low perplexity) and semantically aligned with the attacker's original intent. By replacing gradient-based token selection with generative diffusion priors, GCD achieves higher attack success rates while effectively evading common perplexity-based and guard-model filters.
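To make the "low perplexity" criterion concrete, the following is a minimal sketch of how a perplexity-based filter of the kind mentioned above might score a prompt. It is not taken from the paper. The function names, the example log-probabilities, and the threshold of 100.0 are all illustrative assumptions, and the per-token log-probabilities are assumed to come from some reference language model that is not shown here.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_logprobs holds natural-log probabilities, one per token, as
    assigned by a reference language model (assumed, not shown).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def passes_perplexity_filter(token_logprobs: Sequence[float],
                             threshold: float) -> bool:
    """Accept a prompt only if its perplexity is at or below the threshold.

    The threshold is a hypothetical tuning parameter; real deployments
    would calibrate it on benign traffic.
    """
    return perplexity(token_logprobs) <= threshold


# Illustrative values only: a fluent prompt tends to have log-probabilities
# near zero, while a garbled token sequence has strongly negative ones.
fluent = [-1.0, -1.0, -1.0]
garbled = [-8.0, -9.0, -10.0]

print(passes_perplexity_filter(fluent, threshold=100.0))
print(passes_perplexity_filter(garbled, threshold=100.0))
```

Under these assumptions, a filter of this shape rejects high-perplexity token strings but accepts any fluent text, which is why the abstract treats readability as the property that lets a prompt pass such a filter.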