Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | MM ’24 |
Subtitle of host publication | Proceedings of the 32nd ACM International Conference on Multimedia |
Publisher | Association for Computing Machinery |
Pages | 5102-5111 |
ISBN (print) | 9798400706868 |
Publication status | Published - 2024 |
Publication series
Name | MM - Proceedings of the ACM International Conference on Multimedia |
---|---|
Conference
Title | 32nd ACM International Conference on Multimedia (MM 2024) |
---|---|
Place | Australia |
City | Melbourne |
Period | 28 October - 1 November 2024 |
Abstract
The prevalence of multimedia applications has drawn growing attention to, and demand for, automatic face retouching. Face retouching aims to enhance portrait quality by removing blemishes. However, existing auto-retouching methods rely heavily on large numbers of paired training samples and perform less well on complex or unusual blemishes. To address this issue, we propose a Language-guided Blemish Removal Transformer, referred to as LangBRT, which retouches face images automatically while reducing the model's dependency on paired training data. LangBRT leverages vision-language pre-training for precise facial blemish removal. Specifically, we design a text-prompted blemish detection module that indicates the regions to be edited. These priors not only enable the transformer network to handle specific blemishes in certain areas, but also reduce the reliance on retouching training data. Further, we adopt a target-aware cross attention mechanism, so that blemish-like regions are edited accurately while normal skin regions remain unchanged. Finally, we adopt a regularization approach to encourage semantic consistency between the synthesized image and the text description of the desired retouching outcome. Extensive experiments demonstrate the superior performance of LangBRT over competing auto-retouching methods in terms of dependency on training data, blemish detection accuracy and synthesis quality.
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
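The abstract's stated goal of editing blemish-like regions while leaving normal skin unchanged can be illustrated with a small mask-gated cross-attention sketch. This is not the LangBRT implementation, whose details this record does not give. The function name `mask_gated_cross_attention`, the per-token `blemish_mask` gate, and the use of generic context tokens as keys and values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mask_gated_cross_attention(queries, keys, values, blemish_mask):
    """Hypothetical sketch: cross-attention whose update is gated per token.

    queries      : image tokens, one list of d floats per token
    keys, values : context tokens (values must also have d features)
    blemish_mask : one gate in [0, 1] per query token; 0 keeps the token
                   unchanged, 1 replaces it with the attended result
    """
    dim = len(queries[0])
    outputs = []
    for query, gate in zip(queries, blemish_mask):
        # Scaled dot-product scores between this query and every key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors.
        attended = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(dim)
        ]
        # Gate the update: tokens outside the mask pass through untouched.
        outputs.append([q + gate * (a - q) for q, a in zip(query, attended)])
    return outputs
```

With a gate of 0 a token is returned exactly as it came in, which mirrors the requirement that normal skin stays unchanged. A gate of 1 lets the attended features replace the token entirely.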
Research Area(s)
- blemish detection, face retouching, transformer, vision-language pre-training
Bibliographic Note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Citation Format(s)
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. / Jiang, Le; Huang, Yan; Xie, Lianxin et al.
MM ’24: Proceedings of the 32nd ACM International Conference on Multimedia. Association for Computing Machinery, 2024. p. 5102-5111 (MM - Proceedings of the ACM International Conference on Multimedia).