Img2Img AI Behavioral Fingerprints

Arxiv pdf 2026-06-01T00:00:00

Abstract

We study six production image-to-image AI systems (gpt-image-1, Gemini 2.5 Flash Image, Flux Kontext, SDXL img2img, SD3 img2img, and Qwen Image Edit) under a content-adaptive sub-JND adversarial perturbation pipeline, scoring all outputs by frozen DINOv2 ViT-B/14 token distances against clean references. Across a 3,588-call corpus spanning COCO photographs, CelebA-HQ portraits, and AI-generated inputs, the six systems partition into two image-invariant behavioral bands on a 2D (patch_mean, ssim_clean) plane: edit-trained models (Flux Kontext, Qwen Edit, Gemini) cluster in a tight band, while T2I-base models adapted at sampling time (SDXL, SD3, gpt-image-1) cluster in a drift band. The discriminating variable is training paradigm rather than architecture family: AI identity explains 69.5% of behavioral variance while image domain explains 0.2%. The discriminating axis cuts through both the diffusion family (Flux Kontext tight, SDXL/SD3 drift) and the multimodal-AR family (Qwen Edit tight, gpt-image-1 drift). Six-way leave-one-out attribution accuracy is 51.4% [49.3, 53.4] versus 16.7% chance; three-way pilot accuracy is 76.6% [70.3, 84.4]. Two blind baselines on the identical corpus (AEROBLADE 0.222, PRISM-style 0.373) trail the reference-anchored result substantially and are at chance within the edit-trained band, demonstrating that the reference image, rather than the detector architecture, is the key forensic signal. We also quantify differential perturbation survival across these architectures: the perturbation remains roughly 98% intact through Gemini and roughly 80% intact through Flux Kontext despite SSIM 0.99 visual fidelity, and is overwritten by gpt-image-1. This provides a systematic measurement of how the diffusion-purification effect varies across deployed commercial img2img systems. The results reframe pixel-domain perturbation pipelines from defenses into forensic primitives for reference-anchored AI-processing attribution, with deployment-ready thresholds and a behavioral feature space validated at corpus scale.
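
To make the attribution setup concrete, the following is a minimal sketch, not the paper's implementation. It assumes that patch_mean is the mean per-token cosine distance between an output's tokens and the clean reference's tokens, and that attribution is scored with a leave-one-out nearest-centroid rule on the 2D (patch_mean, ssim_clean) plane. Neither definition is given in the abstract. The function names and the toy data are hypothetical. With six candidate systems, chance is 1/6, about 16.7%, which matches the chance level quoted above.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two token vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def patch_mean(tokens_out, tokens_ref):
    """ASSUMED definition: mean per-token cosine distance to the reference."""
    dists = [cosine_distance(u, v) for u, v in zip(tokens_out, tokens_ref)]
    return sum(dists) / len(dists)


def loo_accuracy(points, labels):
    """Leave-one-out nearest-centroid accuracy on 2D feature points.

    ASSUMED classifier: each held-out point is assigned to the label whose
    centroid (computed without that point) is closest in Euclidean distance.
    """
    correct = 0
    for i, (p, true_label) in enumerate(zip(points, labels)):
        sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, n]
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j == i:
                continue  # hold out the point being classified
            acc = sums[lab]
            acc[0] += q[0]
            acc[1] += q[1]
            acc[2] += 1
        predicted = min(
            sums,
            key=lambda lab: math.dist(
                p, (sums[lab][0] / sums[lab][2], sums[lab][1] / sums[lab][2])
            ),
        )
        correct += predicted == true_label
    return correct / len(points)


# Hypothetical toy points (patch_mean, ssim_clean); not values from the paper.
toy_points = [(0.10, 0.98), (0.12, 0.97), (0.11, 0.99),
              (0.40, 0.80), (0.42, 0.78), (0.41, 0.82)]
toy_labels = ["tight", "tight", "tight", "drift", "drift", "drift"]

toy_accuracy = loo_accuracy(toy_points, toy_labels)
six_way_chance = 1 / 6  # chance level for six equally likely systems
```

On the well-separated toy points every held-out point lands nearest its own band's centroid. The 51.4% six-way figure in the abstract reflects the harder task of separating individual systems within a band, not just the two bands.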
