Abstract
With the advent of generative models and vision-language pretraining, significant improvement has been made in text-driven face manipulation. The text embedding can be used as target supervision for expression control. However, it is non-trivial to associate the text embedding with 3D attributes, i.e., pose and illumination. To address this issue, we propose a Text-conditional Attribute aLignment approach for 3D controllable face image synthesis, and our model is referred to as TcALign. Specifically, since the 3D rendered image can be precisely controlled with the 3D face representation, we first propose a Text-conditional 3D Editor to produce the target face representation and realize text-driven manipulation in the 3D space. An attribute embedding space spanned by the target-related attribute embeddings is also introduced to infer the disentangled task-specific direction. Next, we train a cross-modal latent mapping network conditioned on the derived difference of the 3D representation to infer a correction vector in the latent space of StyleGAN. This correction vector learning design can accurately transfer the attribute manipulation on 3D images to 2D images. We show that the proposed method delivers more precise text-driven multi-attribute manipulation for 3D controllable face image synthesis. Extensive qualitative and quantitative experiments verify the effectiveness and superiority of our method over competing methods.
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 |
Publisher | IEEE |
Pages | 9172-9181 |
DOIs | |
Publication status | Published - Jun 2024 |
Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) - Seattle Convention Center, Seattle, United States. Duration: 17 Jun 2024 → 21 Jun 2024. https://cvpr.thecvf.com/Conferences/2024 https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings |
Conference
Conference | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 17/06/24 → 21/06/24 |
Internet address |
Bibliographical note
Information for this record is supplemented by the author(s) concerned.
Funding
This work was supported in part by the National Natural Science Foundation of China (Project No. 62072189), in part by the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CityU 11206622), in part by the Guangdong Basic and Applied Basic Research Foundation (Project Nos. 2020A1515010484 and 2022A1515011160), and in part by the TCL Science and Technology Innovation Fund (Project No. 20231752).
Fingerprint
Dive into the research topics of 'Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis'. Together they form a unique fingerprint.
Projects
- 1 Active
- GRF: Beyond Data Augmentation: Generative Modeling of Close-to-real Training Examples in Machine Learning through Domain Knowledge Injection
  WONG, H. S. (Principal Investigator / Project Coordinator)
  1/01/23 → …
  Project: Research