Abstract
Great progress has been made in StyleGAN-based image editing. To associate edits with preset attributes, most existing approaches rely on supervised learning of semantically meaningful latent-space traversal directions, and each manipulation step is typically determined for an individual attribute. To address this limitation, we propose a Text-guided Unsupervised StyleGAN Latent Transformation (TUSLT) model, which adaptively infers a single transformation step in the latent space of StyleGAN to simultaneously manipulate multiple attributes of a given input image. Specifically, we adopt a two-stage architecture for a latent mapping network to break the transformation process down into two manageable steps. Our network first learns a diverse set of semantic directions tailored to an input image, and then nonlinearly fuses the ones associated with the target attributes to infer a residual vector. The resulting tightly interlinked two-stage architecture delivers the flexibility to handle diverse attribute combinations. By leveraging the cross-modal text-image representation of CLIP, we can perform pseudo annotation based on the semantic similarity between preset attribute text descriptions and training images, and jointly train an auxiliary attribute classifier with the latent mapping network to provide semantic guidance. We perform extensive experiments to demonstrate that the adopted strategies contribute to the superior performance of TUSLT. © 2023 IEEE.
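The pseudo-annotation idea mentioned in the abstract (labelling training images by their semantic similarity to attribute text descriptions) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for CLIP image and text embeddings, and the helper names `cosine_similarity` and `pseudo_annotate`, as well as the fixed `threshold`, are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_annotate(image_embedding, text_embeddings, threshold=0.5):
    """Assign a binary pseudo label per attribute.

    An attribute is marked present (1) when the image embedding is at
    least `threshold`-similar to that attribute's text embedding.
    The thresholding rule is an assumption, not taken from the paper.
    """
    return {
        name: int(cosine_similarity(image_embedding, emb) >= threshold)
        for name, emb in text_embeddings.items()
    }


# Toy stand-ins for embeddings from a shared text-image space (hypothetical values).
text_embeddings = {
    "smiling": [1.0, 0.0, 0.0],
    "eyeglasses": [0.0, 1.0, 0.0],
}
image_embedding = [0.9, 0.1, 0.2]

labels = pseudo_annotate(image_embedding, text_embeddings)
print(labels)
```

In a real pipeline the embeddings would come from a pretrained text-image encoder, and the resulting pseudo labels could serve as training targets for an auxiliary attribute classifier; how the paper actually derives and uses them may differ from this sketch.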
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
| Publisher | IEEE |
| Pages | 19285-19294 |
| ISBN (Electronic) | 9798350301298 |
| ISBN (Print) | 979-8-3503-0130-4 |
| Publication status | Published - 2023 |
| Event | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver Convention Center, Vancouver, Canada. Duration: 18 Jun 2023 → 22 Jun 2023. Links: https://cvpr2023.thecvf.com/Conferences/2023, https://openaccess.thecvf.com/menu, https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| ISSN (Print) | 1063-6919 |
| ISSN (Electronic) | 2575-7075 |
Conference
| Conference | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) |
|---|---|
| Abbreviated title | CVPR2023 |
| Place | Canada |
| City | Vancouver |
| Period | 18/06/23 → 22/06/23 |
Funding
This work was supported in part by the China Scholarship Council, in part by the National Natural Science Foundation of China (Project No. 62072189), in part by the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CityU 11206622), and in part by the Natural Science Foundation of Guangdong Province (Project No. 2022A1515011160).
Research Keywords
- Image and video synthesis and generation
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'Text-Guided Unsupervised Latent Transformation for Multi-attribute Image Manipulation'. Together they form a unique fingerprint.
Projects
- 1 Active
- GRF: Beyond Data Augmentation: Generative Modeling of Close-to-real Training Examples in Machine Learning through Domain Knowledge Injection
WONG, H. S. (Principal Investigator / Project Coordinator)
1/01/23 → …
Project: Research