Abstract
Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. This structure can serve as a natural interface for image editing, dramatically improving precision and flexibility. Leveraging this benefit, we introduce a new framework that integrates a large language model (LLM) with a Text2Image generative model for scene graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising overall image integrity. Our approach involves two primary stages: 1) Utilizing an LLM-driven scene parser, we construct an image's scene graph, capturing key objects and their interrelationships, as well as parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning with a fine-tuned diffusion model, representing each object with an optimized token and a detailed description prompt. 2) During the image editing phase, an LLM editing controller guides the edits towards specific areas. These edits are then implemented by an attention-modulated diffusion editor, utilizing the fine-tuned model to perform object additions, deletions, replacements, and adjustments. Through extensive experiments, we demonstrate that our framework significantly outperforms existing image editing methods in terms of editing precision and scene aesthetics. Our code is available at https://bestzzhang.github.io/SGEdit. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
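The graph-level bookkeeping behind the editing operations named in the abstract (addition, deletion, replacement) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SceneGraph` class, its method names, and the node/edge encoding are assumptions, and the actual pixel-level edits are carried out by the attention-modulated diffusion editor, which this sketch does not model.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Toy scene graph: nodes map object tokens to descriptions;
    edges are (subject, relation, object) triplets."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_object(self, token, description, edge=None):
        # Register a new object node, optionally with a relation edge.
        self.nodes[token] = description
        if edge:
            self.edges.append(edge)

    def delete_object(self, token):
        # Remove the node and every edge that touches it.
        self.nodes.pop(token, None)
        self.edges = [e for e in self.edges if token not in (e[0], e[2])]

    def replace_object(self, old, new, description):
        # Swap the node while retargeting its edges to the new token.
        self.nodes.pop(old, None)
        self.nodes[new] = description
        self.edges = [(new if s == old else s, r, new if o == old else o)
                      for (s, r, o) in self.edges]

# Usage: build a graph for "a cat sitting on a sofa", then replace the cat.
g = SceneGraph()
g.add_object("cat", "a tabby cat")
g.add_object("sofa", "a red sofa", edge=("cat", "sitting on", "sofa"))
g.replace_object("cat", "dog", "a golden retriever")
print(g.edges)  # [('dog', 'sitting on', 'sofa')]
```

In the full system, each token would further carry an object mask and an optimized embedding learned by the fine-tuned diffusion model; here they are omitted to keep the graph logic visible.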
| Original language | English |
|---|---|
| Article number | 195 |
| Journal | ACM Transactions on Graphics |
| Volume | 43 |
| Issue number | 6 |
| Online published | 19 Nov 2024 |
| DOIs | |
| Publication status | Published - Dec 2024 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Funding
We thank the anonymous reviewers for helping us to improve this paper. The work described in this paper was fully supported by a GRF grant from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region, China [Project No. CityU 11216122].
Research Keywords
- diffusion model
- image editing
- scene graph
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing'. Together they form a unique fingerprint.
Projects
- 1 Active
GRF: Towards Controllable and Efficient Generation of High-Quality Visual Content with Transformers
LIAO, J. (Principal Investigator / Project Coordinator)
1/01/23 → …
Project: Research