Deep Learning Based Image Translation for Face Verification and Makeup Transfer in the Wild


Student thesis: Doctoral Thesis



Award date: 20 Dec 2022


Face appearance is a crucial biometric feature that conveys identity, expression, race, emotion, and even attractiveness. Face-related subjects, especially face image translation and recognition, have long attracted scientific attention. With the development of deep learning, the performance of face image translation and recognition has improved significantly, and the related technologies have been applied in real-world scenarios, e.g., mobile authentication, face detection and tracking in surveillance systems, and face beautification in camera applications. However, many faces captured in the wild are of low quality owing to illumination, occlusion, capture distance, etc., and these low-quality face images significantly degrade the performance of face image translation and face recognition models. Although face-related subjects have been studied for a long time, research on low-quality faces remains challenging. In this thesis, we introduce deep-learning-based image translation approaches with identity preservation to enhance face verification accuracy and the performance of facial makeup transfer in the wild.

First, many high-performance face verification models are trained on high-resolution face images. However, because there is a large domain gap between high-resolution (source domain) and low-resolution (target domain) images in the wild, the performance of these well-trained verification models degrades significantly when they are applied to the target domain. To alleviate this problem, we propose a Dual Domain Adaptive Translation (DDAT) method. Specifically, we apply feature-level domain adaptation in the latent space to align the distributions of down-sampled images from the source domain and low-resolution images from the target domain. Meanwhile, we perform image-level domain adaptation between the generated images of the target domain and the high-resolution images of the source domain to preserve identity consistency and low-level attributes. Furthermore, we design an anti-perturbation classifier to improve verification accuracy and robustness. Experimental results verify that DDAT achieves improved results in both high-quality face generation and low-resolution face verification.
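The abstract does not specify which criterion DDAT uses to align the two latent distributions. As an illustrative sketch only, one common choice for feature-level alignment is a maximum mean discrepancy (MMD) loss with an RBF kernel, which is near zero when two feature batches come from the same distribution; the function names and bandwidth below are assumptions, not the thesis's actual loss.

```python
import numpy as np

def rbf_kernel(x, y, sigma=4.0):
    # Pairwise RBF kernel matrix between the rows of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(src_feats, tgt_feats, sigma=4.0):
    # Squared maximum mean discrepancy between two feature batches;
    # close to zero (up to sampling noise) when distributions match.
    k_ss = rbf_kernel(src_feats, src_feats, sigma).mean()
    k_tt = rbf_kernel(tgt_feats, tgt_feats, sigma).mean()
    k_st = rbf_kernel(src_feats, tgt_feats, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
# Matched latent batches vs. a batch shifted by a "domain gap".
same = mmd2(rng.normal(size=(256, 8)), rng.normal(size=(256, 8)))
shifted = mmd2(rng.normal(size=(256, 8)), rng.normal(3.0, 1.0, size=(256, 8)))
```

Minimizing such a discrepancy between down-sampled source features and target LR features is one standard way to realize the feature-level adaptation the paragraph describes.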

Second, mainstream low-resolution (LR) face verification models use super-resolution models to recover details in LR face images and thereby improve verification accuracy. Generic super-resolution models are trained on LR images obtained by down-sampling with a Gaussian kernel, paired with the corresponding high-resolution (HR) images from the source domain. However, performance on the target domain is unsatisfactory because of the large domain gap between the down-sampled LR images of the source domain and the LR images of the target domain. To overcome this limitation, we propose a Task-specific Image Degeneration and Enhancement (TIDE) module for LR face verification. TIDE degrades HR face images from the source domain to follow the distribution of LR images in the target domain and encourages identity preservation for the face verification task. We conduct extensive experiments on multiple benchmarks to verify the effectiveness of the proposed model.
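For concreteness, the generic degradation pipeline criticized above (Gaussian blur followed by subsampling) can be sketched as follows; kernel size, sigma, and scale factor are illustrative choices, and TIDE's learned degradation is not shown here.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Normalized 2-D Gaussian blur kernel.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(hr, scale=4, size=5, sigma=1.0):
    # Blur the HR image with a Gaussian kernel, then subsample by
    # `scale` -- the generic way to synthesize LR training images.
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(hr, pad, mode="edge")
    h, w = hr.shape
    blurred = np.zeros((h, w), dtype=float)
    for i in range(size):
        for j in range(size):
            blurred += k[i, j] * padded[i:i + h, j:j + w]
    return blurred[::scale, ::scale]

hr = np.random.default_rng(0).uniform(0, 255, size=(64, 64))
lr = degrade(hr, scale=4)  # 64x64 -> 16x16
```

Real wild LR images additionally suffer compression, noise, and motion blur, which is exactly why a fixed Gaussian pipeline leaves the domain gap TIDE is designed to close.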

Third, with the development of Deep Neural Networks (DNNs), face recognition accuracy has improved significantly. However, DNNs are vulnerable to adversarial attacks, which cause a significant reduction in classification performance. Existing adversarial defense approaches are generally used to enhance the robustness of classifiers. We introduce a fresh perspective: there is a positive correlation between cross-domain image enhancement and adversarial defense. To this end, we propose a COllaborative Face Enhancement Module (COFEM), a new attempt to utilize adversarial defense to extract domain-invariant features for cross-domain image translation. Extensive experiments demonstrate the effectiveness of our approach in improving both the quality of low-quality face images and verification accuracy.
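To make the attack side concrete, the Fast Gradient Sign Method (FGSM) is a standard adversarial attack: it perturbs the input along the sign of the loss gradient to push the classifier toward an error. The sketch below applies FGSM to a toy logistic-regression classifier as a stand-in for a face recognition DNN; it illustrates the vulnerability, not COFEM's defense.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.3):
    # FGSM on a logistic-regression classifier: step the input in the
    # direction that increases the cross-entropy loss for true label y.
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid probability of class 1
    grad_x = (p - y) * w           # d(cross-entropy)/d(input)
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = rng.normal(size=16)   # a "clean" input with true label y = 1
x_adv = fgsm_perturb(x, w, b, 1.0)

p_clean = 1.0 / (1.0 + np.exp(-(x @ w + b)))
p_adv = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
# The bounded perturbation lowers the probability of the true class.
```

Defenses counter such perturbations by learning features insensitive to them; COFEM's observation is that this insensitivity also helps bridge domain gaps in cross-domain image translation.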

Apart from face image translation for low-quality face verification, we also consider facial makeup transfer in the wild. Unlike generic image translation, makeup transfer aims to replicate the reference makeup style while preserving facial identity and the appearance of non-makeup regions, such as the background and hair. Although existing works have made great progress on this task, we find significant performance degradation when the input image quality is less satisfactory. We aim to improve the applicability and quality of facial makeup transfer in the wild. To achieve this, we propose a Quality-enhanced and Semantics-aware Makeup Transfer model (QS-MTran), which is based on a two-stage architecture that enhances source image quality and renders the reference makeup from early to late stages. Extensive experiments demonstrate that our design choices are necessary for effective makeup transfer and that the resulting tightly interlinked two-stage architecture delivers significant performance gains on multiple datasets across a wide range of makeup styles.