
FTGID: Fine-Grained Text-Driven Framework for Universal Generative Image Detection

  • Zhipeng Huang
  • Liqun Lin*
  • Bolin Chen
  • Yanjie Wang
  • Tiesong Zhao

*Corresponding author for this work

Research output: Journal Publications and Reviews · RGC 21 - Publication in refereed journal · Peer-review

Abstract

The rapid progress of generative models has made detecting realistic forgeries a critical challenge for security and trust. Existing image- and frequency-based methods depend on dataset-specific artifacts and generalize poorly, while Vision-Language Model (VLM)-based methods remain limited by coarse prompts and underused cross-modal alignment. To address these issues, we propose a Fine-grained Text-driven Generative Image Detection (FTGID) framework, which enables comprehensive detection through multi-modal cues. First, we design a Layer-wise Adaptive Global Extractor (LAGE) that stabilizes multi-level global representations through adaptive CLS token fusion with lightweight calibration and parameter-efficient tuning. Second, we propose a Fine-grained Text-guided Local Enhancer (FTLE) that performs patch-level text–visual interaction to enhance the localization of forgery-relevant regions. Third, we introduce a High-frequency Artifact Feature Extractor (HAFE) that adaptively captures discriminative high-frequency cues, enabling more reliable detection of subtle generative artifacts. Extensive experiments demonstrate that FTGID consistently outperforms state-of-the-art generative image detection (GID) methods across diverse generative models and unseen datasets, enhancing both robustness and interpretability in open-world generative image detection. Our code will be made publicly available after the peer review process.
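The abstract names three components but gives no equations, and the authors' code had not been released at the time of this record. As a rough illustration only, the pure-Python sketch below implements generic stand-ins for the three ideas: a softmax-weighted fusion of normalised per-layer CLS tokens (in the spirit of LAGE), cosine-similarity attention of image patches to a text embedding (in the spirit of FTLE), and a fixed 3x3 Laplacian high-pass residual (in the spirit of HAFE). Every function name, the temperature value, the normalisation, and the choice of a Laplacian filter are assumptions made for illustration; none of this is the FTGID design.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v: Sequence[float], eps: float = 1e-6) -> Vector:
    """Zero-mean, unit-variance normalisation (no learned affine terms)."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def fuse_cls_tokens(cls_per_layer: Sequence[Sequence[float]],
                    layer_logits: Sequence[float]) -> Vector:
    """Hypothetical layer-wise fusion: normalise each layer's CLS token, then
    take a softmax-weighted sum. `layer_logits` stands in for learned weights."""
    weights = softmax(layer_logits)
    normed = [layer_norm(c) for c in cls_per_layer]
    dim = len(normed[0])
    return [sum(w * c[i] for w, c in zip(weights, normed)) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def text_guided_pool(patches: Sequence[Sequence[float]],
                     text_emb: Sequence[float],
                     temperature: float = 0.1) -> Tuple[Vector, Vector]:
    """Hypothetical patch-level text-visual interaction: score each patch by
    cosine similarity to a text embedding, softmax the scores over patches,
    and pool the patches with those attention weights."""
    attn = softmax([cosine(p, text_emb) / temperature for p in patches])
    dim = len(patches[0])
    pooled = [sum(a * p[i] for a, p in zip(attn, patches)) for i in range(dim)]
    return pooled, attn


def high_pass_residual(img: Matrix) -> Matrix:
    """High-frequency residual from a fixed 3x3 Laplacian kernel
    (4 * centre - up - down - left - right), replicating edge pixels."""
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[4 * px(r, c) - px(r - 1, c) - px(r + 1, c)
             - px(r, c - 1) - px(r, c + 1)
             for c in range(w)] for r in range(h)]
```

For example, a constant image gives an all-zero output from `high_pass_residual`, because it contains no high-frequency content. A fixed Laplacian kernel is used here only because it is the simplest high-pass filter; the abstract describes HAFE as capturing high-frequency cues adaptively, which a fixed kernel does not.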

© 2026 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission.
Original language: English
Pages (from-to): 4547-4557
Number of pages: 11
Journal: IEEE Transactions on Image Processing
Volume: 35
Online published: 28 Apr 2026
DOIs:
Publication status: Published - 2026

Research Keywords

  • Feeds
  • Antennas
  • Radio broadcasting
  • Frequency modulation
  • Filtering
  • Filters
  • Circuits and systems
  • High frequency
  • Videos
  • Deepfakes
  • Generative image detection
  • Fine-grained text-driven
  • Frequency-aware modeling

