Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model

Shiyuan Yang, Xiaodong Chen, Jing Liao*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

42 Citations (Scopus)

Abstract

Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have demonstrated impressive image generation capabilities and have also been successfully applied to image inpainting. In practice, however, users often require more control over the inpainting process beyond textual guidance, especially when they want to composite objects with customized appearance, color, shape, and layout. Unfortunately, existing diffusion-based inpainting methods are limited to single-modal guidance and require task-specific training, hindering their cross-modal scalability. To address these limitations, we propose Uni-paint, a unified framework for multimodal inpainting that offers various modes of guidance, including unconditional, text-driven, stroke-driven, and exemplar-driven inpainting, as well as combinations of these modes. Furthermore, Uni-paint is based on pretrained Stable Diffusion and does not require task-specific training on specific datasets, enabling few-shot generalizability to customized images. Extensive qualitative and quantitative evaluations show that our approach achieves results comparable to existing single-modal methods while offering multimodal inpainting capabilities not available in other methods. Code is available at https://github.com/ysy31415/unipaint. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
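For orientation, the sketch below illustrates the generic text-driven diffusion inpainting setting that Uni-paint builds on, using the Hugging Face diffusers library with a pretrained Stable Diffusion inpainting checkpoint. It is not the authors' Uni-paint implementation (see the GitHub link above for their release); the checkpoint ID, file names, and prompt are illustrative assumptions.

```python
# Minimal sketch of text-driven diffusion inpainting with a pretrained
# Stable Diffusion checkpoint. This is NOT the Uni-paint method itself;
# see https://github.com/ysy31415/unipaint for the authors' code.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Checkpoint ID is an assumption; any SD inpainting checkpoint works here.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder inputs: an RGB image, and a mask whose white pixels mark
# the region to be inpainted.
image = Image.open("scene.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="a red vintage car parked by the road",  # text guidance
    image=image,
    mask_image=mask,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("inpainted.png")
```

Uni-paint's contribution, per the abstract, is to unify this kind of text guidance with stroke and exemplar guidance in one framework via few-shot adaptation of the pretrained model, rather than training a separate task-specific inpainting model per modality.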
Original language: English
Title of host publication: MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 3190-3199
ISBN (Print): 9798400701085
DOIs
Publication status: Published - Oct 2023
Event: 31st ACM International Conference on Multimedia (MM 2023) - Westin Ottawa, Ottawa, Canada
Duration: 29 Oct 2023 - 3 Nov 2023
https://www.acmmm2023.org/accommodation/

Publication series

Name: MM - Proceedings of the ACM International Conference on Multimedia

Conference

Conference: 31st ACM International Conference on Multimedia (MM 2023)
Abbreviated title: MM '23
Country/Territory: Canada
City: Ottawa
Period: 29/10/23 - 3/11/23

Bibliographical note

The full text of this publication does not contain sufficient affiliation information. With the consent of the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

This work was supported by a GRF grant (Project No. CityU 11208123) from the Research Grants Council (RGC) of Hong Kong. We also thank Unsplash and the photographers for generously sharing the high-quality, free-to-use images used in this research.

Research Keywords

  • diffusion model
  • image inpainting
  • multimodal
