Abstract
The rapid advancement of deep generative models has substantially lowered the barrier to manipulating or synthesizing highly realistic human face images and videos. While these technologies unlock a wide range of innovative applications, they also give rise to DeepFakes (also known as face forgery) and AI-generated faces. These raise serious concerns about misuse, such as identity fraud, non-consensual content creation, and political misinformation, and they pose significant risks to individual rights and to public trust in digital media. To mitigate these risks, numerous detection methods have been proposed. However, most existing approaches are model-driven: they rely heavily on low-level artifacts introduced by specific generation or manipulation techniques, which significantly limits their generalizability. Beyond the issue of overfitting, these methods also face two fundamental conceptual challenges: (1) which digital manipulations render a real photographic face image fake, and which do not? and (2) should AI-generated face detection inherently be framed as a binary "real vs. fake" classification task? In this thesis, we revisit the foundations of face forensics by emphasizing the semantic integrity and intrinsic photographic characteristics of human faces.
For DeepFake detection, we redefine face forgery in a semantic context: computational methods that alter semantic face attributes beyond human discrimination thresholds are sources of face forgery. Based on this definition, we construct a large-scale face forgery dataset grounded in comprehensive psychophysical experiments, with hierarchical labels spanning global attributes and local facial regions, and we introduce two new testing protocols to probe the generalizability of existing detectors. Furthermore, we propose a semantics-oriented detection paradigm that captures label relationships and integrates features across semantic levels, outperforming traditional binary and multi-class classifiers. To scale this paradigm, we introduce an automatic dataset expansion method that extends existing face forgery datasets to support semantics-oriented DeepFake detection. We also investigate semantics-oriented multitask learning for face forgery detection, which exploits the relationships among face semantics through joint embedding. This approach eliminates the need to manually designate task-agnostic and task-specific parameters.
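To make the semantic definition concrete, the sketch below shows how hierarchical labels could follow from it: an edit counts as forgery only when a semantic change exceeds a human discrimination threshold, and the global-attribute, local-region, and binary labels are derived from that single rule. Everything here is hypothetical and for illustration only: the attribute names (`age`, `gender`), region names, the threshold values, and the idea that each semantic unit is summarized by one scalar change magnitude are assumptions, not the thesis's dataset schema or its psychophysically measured thresholds.

```python
from dataclasses import dataclass

# Hypothetical discrimination thresholds (change magnitude at which observers
# would reliably notice an edit). Names and values are illustrative only.
GLOBAL_THRESHOLDS = {"age": 0.20, "gender": 0.15}
LOCAL_THRESHOLDS = {"eyes": 0.20, "nose": 0.25, "mouth": 0.20}


@dataclass
class HierarchicalLabel:
    global_attributes: set[str]  # global attributes changed beyond threshold
    local_regions: set[str]      # local facial regions changed beyond threshold
    is_forged: bool              # binary label implied by the two levels above


def label_face(changes: dict[str, float]) -> HierarchicalLabel:
    """Derive hierarchical labels from per-unit semantic change magnitudes.

    Sub-threshold edits are treated as imperceptible, so they do not make the
    face "fake"; units missing from `changes` are assumed unchanged.
    """
    forged_global = {
        name for name, thr in GLOBAL_THRESHOLDS.items()
        if changes.get(name, 0.0) > thr
    }
    forged_local = {
        name for name, thr in LOCAL_THRESHOLDS.items()
        if changes.get(name, 0.0) > thr
    }
    return HierarchicalLabel(
        forged_global, forged_local, bool(forged_global or forged_local)
    )
```

For example, `label_face({"nose": 0.10, "age": 0.45})` flags the global attribute `age` but no local region, whereas `label_face({"nose": 0.10})` alone yields a non-forged label because the nose edit stays below its (hypothetical) threshold.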
In addition, we employ a bi-level optimization strategy to dynamically balance the weights of the fidelity losses of the different tasks, which makes the training process fully automated.
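The idea of learning task-loss weights by bi-level optimization can be illustrated on a toy problem. This is a minimal sketch under strong assumptions, not the thesis's method: each task loss is replaced by a hypothetical quadratic `||theta - c_k||^2` (standing in for a per-task fidelity loss), the weights are a softmax over logits `alpha`, the outer objective is a hypothetical held-out loss `||theta - target||^2`, and the hypergradient is obtained by implicit differentiation of the inner optimality condition, which is exact only for this quadratic case.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def inner_train(theta, weights, centers, lr=0.1, steps=200):
    """Inner loop: minimize sum_k w_k * ||theta - c_k||^2 by gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (weights[:, None] * (theta - centers)).sum(axis=0)
        theta = theta - lr * grad
    return theta


def hypergradient(theta, weights, centers, target):
    """d(outer loss)/d(alpha) via the implicit function theorem.

    Inner stationarity: sum_k w_k (theta - c_k) = 0 with sum_k w_k = 1,
    hence d theta*/d w_k = c_k - theta*. The result is then chained
    through the softmax Jacobian, dw_i/dalpha_j = w_i (delta_ij - w_j).
    """
    douter_dtheta = 2.0 * (theta - target)
    douter_dw = (centers - theta) @ douter_dtheta         # shape (K,)
    return weights * (douter_dw - weights @ douter_dw)


def bilevel(centers, target, outer_lr=0.5, outer_steps=100):
    """Outer loop: adjust task-weight logits to reduce the held-out loss."""
    alpha = np.zeros(len(centers))                        # equal weights at start
    theta = np.zeros(centers.shape[1])
    history = []
    for _ in range(outer_steps):
        weights = softmax(alpha)
        theta = inner_train(theta, weights, centers)      # warm-started inner loop
        history.append(float(np.sum((theta - target) ** 2)))
        alpha = alpha - outer_lr * hypergradient(theta, weights, centers, target)
    return softmax(alpha), history


if __name__ == "__main__":
    # Three hypothetical tasks; the held-out objective agrees with task 0.
    centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    weights, history = bilevel(centers, target=centers[0])
    print("learned weights:", weights)
    print("held-out loss, first vs last:", history[0], history[-1])
```

In this toy run the outer loop shifts weight toward the task whose optimum agrees with the held-out objective, with no hand-tuned weights. In a deep network the inner solution is not available in closed form, so practical variants typically approximate the hypergradient, for example by unrolling a few inner steps.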
For AI-generated face detection, we move beyond the "real vs. fake" dichotomy and instead frame the task as distinguishing photographic from synthetic content, relying on self-supervised learning (SSL) of camera-intrinsic and face-specific features purely from photographic face images. We first propose a self-supervised anomaly detection approach. A feature extractor is optimized using an equally weighted sum of fidelity losses across five pretext tasks: ranking four ordinal exchangeable image file format (EXIF) tags (aperture, exposure time, focal length, and ISO speed) and classifying artificially manipulated face images. The learned features then enable anomaly detection of AI-generated faces through a Gaussian mixture model. We further develop a bi-level optimization framework to bridge the gap between the SSL pretext tasks and the downstream detection objective. The inner loop trains a feature extractor using a linearly weighted combination of several pretext-task objectives. The outer loop prioritizes a surrogate detection task that approximates AI-generated face detection and, by optimizing the linear weightings to align the task relationships, steers the feature extractor toward detecting AI-generated faces.
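The two ingredients of the self-supervised anomaly detector can be sketched as follows: a pairwise fidelity loss for ranking one ordinal EXIF tag, and anomaly scoring of features with a Gaussian mixture model fitted only to photographic faces. This is a minimal sketch under stated assumptions. The fidelity loss uses one common form, `1 - sqrt(p*q) - sqrt((1-p)(1-q))`, with a Thurstone-style probability `q = Phi((s_a - s_b)/sqrt(2))`. The GMM is a hand-written diagonal-covariance EM, used instead of any particular library. The names `fidelity_rank_loss`, `fit_gmm`, and `anomaly_score` are hypothetical, and the feature extractor is omitted. Summing the five per-task losses with equal weights would give the pretext objective described above.

```python
import math

import numpy as np


def fidelity_rank_loss(score_a, score_b, tag_a, tag_b):
    """Pairwise fidelity loss for ranking one ordinal EXIF tag (e.g. ISO speed).

    p: ground-truth probability that image a ranks above image b.
    q: predicted probability, Phi((s_a - s_b) / sqrt(2)) = 0.5 * (1 + erf((s_a - s_b) / 2)).
    """
    p = 1.0 if tag_a > tag_b else 0.0 if tag_a < tag_b else 0.5
    q = 0.5 * (1.0 + math.erf((score_a - score_b) / 2.0))
    return 1.0 - math.sqrt(p * q) - math.sqrt((1.0 - p) * (1.0 - q))


def _log_gauss(X, means, variances):
    """Per-component diagonal Gaussian log-densities, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(-1)


def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def fit_gmm(X, k=2, iters=100, seed=0, eps=1e-6):
    """Fit a diagonal-covariance GMM to features of photographic faces via EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    log_w = np.full(k, -np.log(k))
    for _ in range(iters):
        log_p = log_w + _log_gauss(X, means, variances)            # E-step
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        nk = resp.sum(axis=0) + eps                                # M-step
        log_w = np.log(nk / n)
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X**2 / nk[:, None] - means**2, 0.0) + eps
    return log_w, means, variances


def anomaly_score(X, gmm):
    """Negative log-likelihood under the GMM; higher means less photographic."""
    log_w, means, variances = gmm
    return -_logsumexp(log_w + _log_gauss(X, means, variances), axis=1)
```

A detector built this way never sees synthetic faces during training. It flags a test face when its anomaly score exceeds a threshold chosen on photographic data, for example a high percentile of the training scores.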
Extensive experiments demonstrate that our methods outperform state-of-the-art detectors in both generalization and interpretability. This work offers a new perspective on face forensics by shifting from artifact-specific detection to fundamentally understanding the semantic and intrinsic properties of human face images.
| Date of Award | 20 Aug 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Kede MA (Supervisor) |