High-Fidelity Pluralistic Image Completion with Transformers

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

161 Scopus Citations

Detail(s)

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision
Subtitle of host publication: ICCV 2021
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 4672-4681
Number of pages: 10
ISBN (electronic): 9781665428125
ISBN (print): 978-1-6654-2813-2
Publication status: Published - Oct 2021

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499
ISSN (electronic): 2380-7504

Conference

Title: 18th IEEE/CVF International Conference on Computer Vision (ICCV 2021)
Location: Virtual
Place: Canada
City: Montreal
Period: 11 - 17 October 2021

Abstract

Image completion has made tremendous progress with convolutional neural networks (CNNs), because of their powerful texture modeling capacity. However, due to some inherent properties (e.g., local inductive prior, spatial-invariant kernels), CNNs do not perform well in understanding global structures or naturally support pluralistic completion. Recently, transformers demonstrate their power in modeling the long-term relationship and generating diverse results, but their computation complexity is quadratic to input length, thus hampering the application in processing high-resolution images. This paper brings the best of both worlds to pluralistic image completion: appearance prior reconstruction with transformer and texture replenishment with CNN. The former transformer recovers pluralistic coherent structures together with some coarse textures, while the latter CNN enhances the local texture details of coarse priors guided by the high-resolution masked images. The proposed method vastly outperforms state-of-the-art methods in terms of three aspects: 1) large performance boost on image fidelity even compared to deterministic completion methods; 2) better diversity and higher fidelity for pluralistic completion; 3) exceptional generalization ability on large masks and generic dataset, like ImageNet. Code and pre-trained models have been publicly released at https://github.com/raywzy/ICT.
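
The abstract's point about quadratic complexity can be made concrete with a small calculation. The sketch below is illustrative only: it assumes one token per pixel and full self-attention, and the resolutions (32, 64, 256) and the function names `attention_pairs` and `cost_ratio` are hypothetical choices for this example, not values or code taken from the paper.

```python
def attention_pairs(height: int, width: int) -> int:
    """Query-key pairs in full self-attention, assuming one token per pixel."""
    tokens = height * width
    return tokens * tokens  # every token attends to every token


def cost_ratio(low_side: int, high_side: int) -> int:
    """How many times more pairs a square high-res input needs than a low-res one."""
    return attention_pairs(high_side, high_side) // attention_pairs(low_side, low_side)


if __name__ == "__main__":
    # Illustrative square resolutions (assumed, not from the paper).
    for side in (32, 64, 256):
        pairs = attention_pairs(side, side)
        print(f"{side}x{side}: {side * side} tokens, {pairs} attention pairs")
    print("256x256 vs 32x32 cost ratio:", cost_ratio(32, 256))
```

Under these assumptions, an 8x increase in side length multiplies the number of attention pairs by 4096, which is why running the transformer on a low-resolution appearance prior and leaving high-resolution texture to a CNN is attractive.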

Bibliographic Note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Citation Format(s)

High-Fidelity Pluralistic Image Completion with Transformers. / Wan, Ziyu; Zhang, Jingbo; Chen, Dongdong et al.
Proceedings - 2021 IEEE/CVF International Conference on Computer Vision: ICCV 2021. Institute of Electrical and Electronics Engineers, Inc., 2021. p. 4672-4681 (Proceedings of the IEEE International Conference on Computer Vision).