
Text-Guided Unsupervised Latent Transformation for Multi-attribute Image Manipulation

Xiwen Wei, Zhen Xu, Cheng Liu, Si Wu*, Zhiwen Yu, Hau San Wong

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Great progress has been made in StyleGAN-based image editing. To associate with preset attributes, most existing approaches focus on supervised learning for semantically meaningful latent space traversal directions, and each manipulation step is typically determined for an individual attribute. To address this limitation, we propose a Text-guided Unsupervised StyleGAN Latent Transformation (TUSLT) model, which adaptively infers a single transformation step in the latent space of StyleGAN to simultaneously manipulate multiple attributes on a given input image. Specifically, we adopt a two-stage architecture for a latent mapping network to break down the transformation process into two manageable steps. Our network first learns a diverse set of semantic directions tailored to an input image, and later nonlinearly fuses the ones associated with the target attributes to infer a residual vector. The resulting tightly interlinked two-stage architecture delivers the flexibility to handle diverse attribute combinations. By leveraging the cross-modal text-image representation of CLIP, we can perform pseudo annotations based on the semantic similarity between preset attribute text descriptions and training images, and further jointly train an auxiliary attribute classifier with the latent mapping network to provide semantic guidance. We perform extensive experiments to demonstrate that the adopted strategies contribute to the superior performance of TUSLT. © 2023 IEEE.
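The pseudo-annotation idea described in the abstract can be illustrated with a minimal sketch. It assumes that each training image and each attribute description has already been embedded by CLIP, and that an attribute is marked present when the cosine similarity exceeds a fixed threshold. The abstract does not state the actual decision rule, so the threshold, the toy 3-dimensional embeddings, and the names `cosine` and `pseudo_annotate` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pseudo_annotate(image_emb, text_embs, threshold=0.5):
    """Assign a 0/1 pseudo label per attribute by thresholding similarity.

    image_emb: embedding of one training image (assumed to come from CLIP).
    text_embs: dict mapping attribute name -> embedding of its text description.
    threshold: hypothetical cut-off; the paper's rule is not given in the abstract.
    """
    return {
        name: int(cosine(image_emb, emb) >= threshold)
        for name, emb in text_embs.items()
    }


# Toy embeddings standing in for real CLIP outputs (illustrative values only).
text_embs = {"smiling": [1.0, 0.0, 0.0], "eyeglasses": [0.0, 1.0, 0.0]}
image_emb = [0.9, 0.1, 0.0]
labels = pseudo_annotate(image_emb, text_embs)
```

Labels produced this way could then supervise an auxiliary attribute classifier, which is the role the abstract assigns to the pseudo annotations.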
Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Publisher: IEEE
Pages: 19285-19294
ISBN (Electronic): 9798350301298
ISBN (Print): 979-8-3503-0130-4
Publication status: Published - 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) - Vancouver Convention Center, Vancouver, Canada
Duration: 18 Jun 2023 to 22 Jun 2023
https://cvpr2023.thecvf.com/Conferences/2023
https://openaccess.thecvf.com/menu
https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)
Abbreviated title: CVPR2023
Place: Canada
City: Vancouver
Period: 18/06/23 to 22/06/23

Funding

This work was supported in part by the China Scholarship Council, in part by the National Natural Science Foundation of China (Project No. 62072189), in part by the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CityU 11206622), and in part by the Natural Science Foundation of Guangdong Province (Project No. 2022A1515011160).

Research Keywords

  • Image and video synthesis and generation

RGC Funding Information

  • RGC-funded
