Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review


Author(s)

  • Le Jiang
  • Yan Huang
  • Lianxin Xie
  • Wen Xue
  • Si Wu

Related Research Unit(s)

Detail(s)

Original language: English
Title of host publication: MM ’24
Subtitle of host publication: Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 5102-5111
ISBN (print): 9798400706868
Publication status: Published - 2024

Publication series

Name: MM - Proceedings of the ACM International Conference on Multimedia

Conference

Title: 32nd ACM International Conference on Multimedia (MM 2024)
Place: Australia
City: Melbourne
Period: 28 October - 1 November 2024

Abstract

The prevalence of multimedia applications has led to growing interest in and demand for automatic face retouching. Face retouching aims to enhance portrait quality by removing blemishes. However, existing auto-retouching methods rely heavily on a large number of paired training samples and perform unsatisfactorily when handling complex and unusual blemishes. To address these issues, we propose a Language-guided Blemish Removal Transformer for automatically retouching face images, while at the same time reducing the model's dependency on paired training data. Our model, referred to as LangBRT, leverages vision-language pre-training for precise facial blemish removal. Specifically, we design a text-prompted blemish detection module that indicates the regions to be edited. These priors not only enable the transformer network to handle specific blemishes in certain areas, but also reduce the reliance on retouching training data. Further, we adopt a target-aware cross attention mechanism, such that blemish-like regions are edited accurately while the normal skin regions remain unchanged. Finally, we adopt a regularization approach to encourage semantic consistency between the synthesized image and the text description of the desired retouching outcome. Extensive experiments demonstrate the superior performance of LangBRT over competing auto-retouching methods in terms of dependency on training data, blemish detection accuracy and synthesis quality. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
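
As a hedged illustration only (not the authors' released code), the gating idea behind the target-aware cross attention described in the abstract could be sketched in PyTorch as follows; the class name, argument names, and the use of a per-token blemish probability map are assumptions made for this example.

```python
# Hypothetical sketch: text-conditioned cross attention whose residual update
# is gated by a per-token blemish probability, so blemish-like regions are
# edited while normal-skin regions pass through nearly unchanged.
# All names below are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class TargetAwareCrossAttention(nn.Module):
    def __init__(self, dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.to_kv = nn.Linear(text_dim, dim)  # project text tokens to image feature dim
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor,
                text_tokens: torch.Tensor,
                blemish_prob: torch.Tensor) -> torch.Tensor:
        """
        img_tokens:   (B, N, dim)      flattened image features
        text_tokens:  (B, T, text_dim) prompt embeddings from a text encoder
        blemish_prob: (B, N, 1)        per-token probability of being a blemish region
        """
        kv = self.to_kv(text_tokens)
        edited, _ = self.attn(self.norm(img_tokens), kv, kv)
        # Gate the residual update: high-probability (blemish) tokens receive the
        # full text-conditioned edit, low-probability (normal skin) tokens do not.
        return img_tokens + blemish_prob * edited
```

The residual gate mirrors the stated design goal: the detection prior decides where editing happens, and the cross attention with the text prompt decides how those regions are modified.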

Research Area(s)

  • blemish detection, face retouching, transformer, vision-language pre-training

Bibliographic Note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Citation Format(s)

Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. / Jiang, Le; Huang, Yan; Xie, Lianxin et al.
MM ’24: Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, 2024. p. 5102-5111 (MM - Proceedings of the ACM International Conference on Multimedia).
