Diverse Semantic Image Synthesis with various conditioning modalities

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Detail(s)

Original language: English
Article number: 112727
Journal / Publication: Knowledge-Based Systems
Volume: 309
Online published: 19 Nov 2024
Publication status: Online published - 19 Nov 2024

Abstract

Semantic image synthesis aims to generate high-fidelity images from a segmentation mask. Previous methods typically train a generator that associates a single global random map with the conditioning mask, but the resulting lack of independent control over regional content limits their applicability. To address this issue, we propose an effective approach for Multi-modal conditioning-based Diverse Semantic Image Synthesis, referred to as McDSIS. In this model, a set of constituent generators synthesizes the content of each semantic region from an independent random map. The content of a region can be determined by the style code associated with its random map, extracted from a reference image, or derived from a textual description embedded via our proposed conditioning mechanisms. As a result, the generation process is spatially disentangled, which facilitates independent synthesis of diverse content within a semantic region while preserving the remaining content. Owing to this flexible architecture, McDSIS not only achieves superior performance over state-of-the-art semantic image generation models but can also perform various visual tasks, such as face inpainting, face swapping, and local editing. © 2024 Elsevier B.V.
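The spatial disentanglement described in the abstract can be illustrated with a toy sketch: each semantic region draws its content from its own random map, and the region masks composite the results, so resampling one region's map leaves all other regions untouched. This is a minimal illustrative sketch in NumPy, not the authors' actual McDSIS architecture; `toy_generator` is a hypothetical stand-in for a constituent generator.

```python
import numpy as np

# Hypothetical sketch of spatially disentangled composition: each
# semantic class k gets an independent random map z_k, a toy
# "constituent generator" turns it into per-pixel content, and binary
# region masks composite the regions into one image.

rng = np.random.default_rng(0)
H, W, K = 8, 8, 3  # image size and number of semantic classes

# Segmentation mask: one class label per pixel.
seg = rng.integers(0, K, size=(H, W))
masks = np.stack([(seg == k).astype(float) for k in range(K)])  # (K, H, W)

def toy_generator(z):
    # Placeholder for a constituent generator: maps a random map to
    # RGB content. A real model would be a convolutional network.
    return np.tanh(z)

def synthesize(zs):
    # Composite each region's content under its mask; pixels outside
    # a region are unaffected by that region's generator.
    out = np.zeros((H, W, 3))
    for k in range(K):
        out += masks[k][..., None] * toy_generator(zs[k])
    return out

zs = [rng.normal(size=(H, W, 3)) for _ in range(K)]
img = synthesize(zs)

# Resample only region 0's random map: all other regions' pixels are
# preserved exactly, which is the independent regional control the
# global-random-map formulation lacks.
zs2 = list(zs)
zs2[0] = rng.normal(size=(H, W, 3))
img2 = synthesize(zs2)
assert np.allclose(img[masks[0] == 0], img2[masks[0] == 0])
```

Under this view, conditioning a single region on a reference image or a text embedding amounts to replacing that region's random map (or its style code) while the composition step keeps the rest of the image fixed.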

Research Area(s)

  • Constituent generators, Multi-modal conditioning-based editing, Semantic image synthesis, Spatially disentangled synthesis