Abstract
The rapid progress of generative models has made detecting realistic forgeries a critical challenge for security and trust. Existing image-based and frequency-based methods depend on dataset-specific artifacts and generalize poorly, while Vision-Language Model (VLM)-based methods remain limited by coarse prompts and underused cross-modal alignment. To address these issues, we propose a Fine-grained Text-driven Generative Image Detection (FTGID) framework, which enables comprehensive detection through multi-modal cues. First, we design a Layer-wise Adaptive Global Extractor (LAGE) that stabilizes multi-level global representations through adaptive CLS token fusion with lightweight calibration and parameter-efficient tuning. Second, we propose a Fine-grained Text-guided Local Enhancer (FTLE) that performs patch-level text–visual interaction to enhance the localization of forgery-relevant regions. Third, we introduce a High-frequency Artifact Feature Extractor (HAFE) that adaptively captures discriminative high-frequency cues, enabling more reliable detection of subtle generative artifacts. Extensive experiments demonstrate that FTGID consistently outperforms state-of-the-art generative image detection (GID) methods across diverse generative models and unseen datasets, improving both robustness and interpretability in open-world generative image detection. Our code will be made publicly available after the peer review process.
© 2026 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
| Original language | English |
|---|---|
| Pages (from-to) | 4547-4557 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 35 |
| Online published | 28 Apr 2026 |
| Publication status | Published - 2026 |
Research Keywords
- Feeds
- Antennas
- Radio broadcasting
- Frequency modulation
- Filtering
- Filters
- Circuits and systems
- High frequency
- Videos
- Deepfakes
- Generative image detection
- Fine-grained text-driven
- Frequency-aware modeling
Fingerprint
Dive into the research topics of 'FTGID: Fine-Grained Text-Driven Framework for Universal Generative Image Detection'. Together they form a unique fingerprint.