Abstract
Semantic image synthesis aims to generate high-fidelity images from a segmentation mask, and previous methods typically train a generator to associate a global random map with the conditioning mask. However, the lack of independent control over regional content limits their applicability. To address this issue, we propose an effective approach for Multi-modal conditioning-based Diverse Semantic Image Synthesis, referred to as McDSIS. The model incorporates a number of constituent generators that synthesize the content of semantic regions from independent random maps. Through our proposed conditioning mechanisms, the regional content can be determined by a style code that is associated with a random map, extracted from a reference image, or obtained by embedding a textual description. As a result, the generation process is spatially disentangled, which facilitates independent synthesis of diverse content in a semantic region while preserving the other content. Owing to this flexible architecture, McDSIS not only achieves superior performance over state-of-the-art semantic image generation models but is also capable of performing various visual tasks, such as face inpainting, swapping, and local editing. © 2024 Elsevier B.V.
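The idea of spatially disentangled synthesis described in the abstract can be illustrated with a toy sketch. This is not the McDSIS implementation: the names `sample_codes`, `toy_generator`, and `synthesize` are hypothetical, each "constituent generator" is reduced to a trivial function of a scalar code, and the mask is a small hand-written label grid. The sketch only shows the compositional property, namely that resampling the code of one semantic region changes that region and leaves the others untouched.

```python
import random

def sample_codes(labels, seed):
    """Draw one independent random code per semantic label
    (a scalar stand-in for a per-region random map; an assumption)."""
    rng = random.Random(seed)
    return {label: rng.random() for label in sorted(labels)}

def toy_generator(label, code):
    """Hypothetical constituent generator: maps a region's code to a pixel value."""
    return round(label + code, 4)

def synthesize(mask, codes):
    """Compose the image region by region: each pixel depends only on
    the code of its own semantic label."""
    return [[toy_generator(label, codes[label]) for label in row] for row in mask]

# Toy segmentation mask with three semantic regions (labels 0, 1, 2).
mask = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
labels = {label for row in mask for label in row}

codes = sample_codes(labels, seed=0)
image = synthesize(mask, codes)

# Resample the code of region 1 only; regions 0 and 2 keep their codes.
edited_codes = dict(codes)
edited_codes[1] = random.Random(123).random()
edited = synthesize(mask, edited_codes)

# Collect the pixel positions whose value changed after the edit.
changed = [
    (y, x)
    for y, row in enumerate(mask)
    for x, _ in enumerate(row)
    if image[y][x] != edited[y][x]
]
print(changed)
```

In this sketch the changed positions coincide with the pixels labelled 1 in the mask, which is the behaviour the abstract attributes to independent regional control. A real model would replace the scalar codes with style codes (sampled, extracted from a reference image, or derived from text) and the toy function with learned generators.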
| Original language | English |
|---|---|
| Article number | 112727 |
| Journal | Knowledge-Based Systems |
| Volume | 309 |
| Online published | 19 Nov 2024 |
| Publication status | Published - 30 Jan 2025 |
Research Keywords
- Constituent generators
- Multi-modal conditioning-based editing
- Semantic image synthesis
- Spatially disentangled synthesis
Fingerprint
Dive into the research topics of 'Diverse Semantic Image Synthesis with various conditioning modalities'. Together they form a unique fingerprint.