SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing

Zhiyuan Zhang, Dongdong Chen, Jing Liao*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Citation (Scopus)

Abstract

Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. They can serve as a natural interface for image editing, dramatically improving precision and flexibility. Leveraging this benefit, we introduce a new framework that integrates a large language model (LLM) with a Text2Image generative model for scene graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising overall image integrity. Our approach involves two primary stages: 1) Utilizing an LLM-driven scene parser, we construct an image's scene graph, capturing key objects and their interrelationships, as well as parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning with a fine-tuned diffusion model, representing each object with an optimized token and a detailed description prompt. 2) During the image editing phase, an LLM editing controller guides the edits towards specific areas. These edits are then implemented by an attention-modulated diffusion editor, which utilizes the fine-tuned model to perform object additions, deletions, replacements, and adjustments. Through extensive experiments, we demonstrate that our framework significantly outperforms existing image editing methods in terms of editing precision and scene aesthetics. Our code is available at https://bestzzhang.github.io/SGEdit. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
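To make the scene-graph interface concrete, the sketch below models a graph of objects (nodes) and relationship triples (edges) supporting the edit operations the abstract lists: addition, deletion, and replacement. All class and method names are hypothetical illustrations, not taken from the SGEdit implementation.

```python
# Minimal scene graph sketch: nodes are objects, edges are
# (subject, relation, object) triples. Names are illustrative only.

class SceneGraph:
    def __init__(self):
        self.nodes = {}     # object_id -> text description
        self.edges = set()  # (subject_id, relation, object_id)

    def add_object(self, obj_id, description):
        self.nodes[obj_id] = description

    def delete_object(self, obj_id):
        self.nodes.pop(obj_id, None)
        # prune every relationship touching the removed object
        self.edges = {e for e in self.edges if obj_id not in (e[0], e[2])}

    def replace_object(self, obj_id, new_description):
        # object-level replacement: keep the node and its edges,
        # swap only the description that drives generation
        if obj_id in self.nodes:
            self.nodes[obj_id] = new_description

    def relate(self, subj, relation, obj):
        self.edges.add((subj, relation, obj))


g = SceneGraph()
g.add_object("cat", "a sleeping cat")
g.add_object("sofa", "a blue sofa")
g.relate("cat", "on", "sofa")
g.replace_object("cat", "a sleeping dog")  # replacement keeps the "on sofa" edge
g.delete_object("sofa")                    # deletion prunes its edges too
```

In a full pipeline, each edit on this structure would then be realized in pixel space by the diffusion editor; the graph itself only records what should change and where.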
Original language: English
Article number: 195
Journal: ACM Transactions on Graphics
Volume: 43
Issue number: 6
Online published: 19 Nov 2024
DOIs
Publication status: Published - Dec 2024

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

We thank the anonymous reviewers for helping us to improve this paper. The work described in this paper was fully supported by a GRF grant from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region, China [Project No. CityU 11216122].

Research Keywords

  • diffusion model
  • image editing
  • scene graph

RGC Funding Information

  • RGC-funded
