Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis

Feifan Xu, Rui LI*, Si Wu*, Yong Xu, Hau San Wong

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

With the advent of generative models and vision-language pretraining, significant improvements have been made in text-driven face manipulation. The text embedding can be used as target supervision for expression control. However, it is non-trivial to associate the text with the face's 3D attributes, i.e., pose and illumination. To address this issue, we propose a Text-conditional Attribute aLignment approach for 3D controllable face image synthesis, and our model is referred to as TcALign. Specifically, since the 3D rendered image can be precisely controlled with the 3D face representation, we first propose a Text-conditional 3D Editor that produces the target face representation, realizing text-driven manipulation in the 3D space. An attribute embedding space spanned by the target-related attribute embeddings is also introduced to infer the disentangled task-specific direction. Next, we train a cross-modal latent mapping network, conditioned on the derived difference of 3D representations, to infer a correction vector in the latent space of StyleGAN. This correction vector learning design can accurately transfer the attribute manipulation on 3D images to 2D images. We show that the proposed method delivers more precise text-driven multi-attribute manipulation for 3D controllable face image synthesis. Extensive qualitative and quantitative experiments verify the effectiveness and superiority of our method over competing methods.
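The abstract does not specify the networks, so the following is only an illustrative sketch of two of its ideas under simplifying assumptions, not TcALign's implementation. First, it restricts an edit direction to the subspace spanned by a set of attribute embeddings using orthogonal projection (one plausible reading of "inferring a direction within an attribute embedding space"). Second, it applies a latent correction Δw = M(w, Δp), where Δp is the difference between source and target 3D face parameters (for example pose or illumination coefficients; an assumption), giving the edited StyleGAN code w' = w + Δw. All names (`project_onto_span`, `make_linear_mapping`, `edit_latent`) and the toy linear map standing in for the learned cross-modal mapping network are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_span(direction: Sequence[float],
                      basis: Sequence[Sequence[float]]) -> Vector:
    """Orthogonally project `direction` onto the span of `basis`.

    The basis vectors play the role of hypothetical attribute embeddings;
    they are orthonormalized with modified Gram-Schmidt first.
    """
    ortho: List[Vector] = []
    for v in basis:
        u = [float(x) for x in v]
        for q in ortho:
            c = dot(u, q)
            u = [ui - c * qi for ui, qi in zip(u, q)]
        norm = dot(u, u) ** 0.5
        if norm > 1e-12:  # drop embeddings that are linearly dependent
            ortho.append([ui / norm for ui in u])
    proj = [0.0] * len(direction)
    for q in ortho:
        c = dot(direction, q)
        proj = [pi + c * qi for pi, qi in zip(proj, q)]
    return proj


# A mapping takes (latent code w, 3D parameter difference delta_p) and
# returns a latent correction delta_w. Placeholder for a learned network.
Mapping = Callable[[Vector, Vector], Vector]


def make_linear_mapping(matrix: Sequence[Sequence[float]]) -> Mapping:
    """Hypothetical toy stand-in for the mapping network: delta_w = matrix @ delta_p.

    Unlike a learned, conditioned network, this toy ignores w entirely.
    """
    def mapping(w: Vector, delta_p: Vector) -> Vector:
        return [dot(row, delta_p) for row in matrix]
    return mapping


def edit_latent(w: Vector, p_src: Vector, p_tgt: Vector,
                mapping: Mapping) -> Vector:
    """Return w + mapping(w, p_tgt - p_src)."""
    delta_p = [t - s for t, s in zip(p_tgt, p_src)]
    delta_w = mapping(w, delta_p)
    if len(delta_w) != len(w):
        raise ValueError("correction must match latent dimensionality")
    return [wi + dwi for wi, dwi in zip(w, delta_w)]


if __name__ == "__main__":
    # Keep only the part of a 3-D direction lying in the plane of two embeddings.
    print(project_onto_span([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
    # -> [1.0, 2.0, 0.0]
    toy = make_linear_mapping([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    print(edit_latent([0.5, -0.5], [0.0, 0.0, 0.0], [0.1, 0.0, 0.2], toy))
```

The correction is expressed as an additive offset in the latent space because the abstract describes inferring a correction vector rather than regenerating the code from scratch; a real system would learn M from data and condition it on w as well as Δp.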
Original language: English
Title of host publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Publisher: IEEE
Pages: 9172-9181
Publication status: Published - Jun 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Seattle Convention Center, Seattle, United States
Duration: 17 Jun 2024 to 21 Jun 2024
https://cvpr.thecvf.com/Conferences/2024
https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Country/Territory: United States
City: Seattle
Period: 17/06/24 to 21/06/24

Bibliographical note

Information for this record is supplemented by the author(s) concerned.

Funding

This work was supported in part by the National Natural Science Foundation of China (Project No. 62072189), in part by the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CityU 11206622), in part by the Guangdong Basic and Applied Basic Research Foundation (Project Nos. 2020A1515010484 and 2022A1515011160), and in part by the TCL Science and Technology Innovation Fund (Project No. 20231752).
