Abstract
This paper introduces a novel framework for multi-concept customization in text-to-image diffusion models. At its core is a Multiple Prompt Adapter (MP-Adapter) capable of processing multiple image prompts in parallel, extracting features from target concepts, and projecting them into the same latent space as the text prompt. This enables simultaneous handling of multiple concepts using just one reference image per concept. To address challenges in fusing multiple concepts with complex interactions, we propose a Region-based Denoising Framework (RDF) that dynamically generates concept-specific regions of interest during inference, allowing spatially decoupled injection of concept features. By integrating the MP-Adapter and RDF, our end-to-end pipeline enables multi-concept customization with intricate occlusions and interactions while preserving concept identities. This approach addresses the concept conflicts, identity degradation, and occlusion issues that limit current methods, allowing flexible customization without concept-specific retraining. Both qualitative and quantitative evaluations demonstrate that our framework outperforms state-of-the-art approaches in multi-concept customization tasks, while ablation studies validate the effectiveness of each proposed component. This work significantly advances text-to-image generation capabilities for complex, user-defined concept combinations. Code and models will be released at https://github.com/baojudezeze/RMP-Adapter.
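The paper's implementation is not reproduced on this page. As a rough illustration of the two components described in the abstract, the sketch below shows, in PyTorch, (a) projecting per-concept image-prompt features into the text-prompt latent space and (b) region-masked injection of concept features during denoising. The names `MPAdapter` and `region_masked_injection`, the token counts, and the feature dimensions are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn


class MPAdapter(nn.Module):
    """Hypothetical sketch: project features from several image prompts
    (one reference image per concept) into the text-prompt latent space."""

    def __init__(self, image_dim=1024, text_dim=768, tokens_per_concept=4):
        super().__init__()
        self.tokens_per_concept = tokens_per_concept
        # Per-concept projection from image-encoder features to text latent space.
        self.proj = nn.Linear(image_dim, text_dim * tokens_per_concept)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, image_feats):
        # image_feats: (batch, n_concepts, image_dim), one embedding per reference image.
        b, n, _ = image_feats.shape
        tokens = self.proj(image_feats).view(b, n * self.tokens_per_concept, -1)
        # Output: (batch, n_concepts * tokens_per_concept, text_dim),
        # ready to sit alongside the text-prompt embeddings.
        return self.norm(tokens)


def region_masked_injection(hidden, concept_feats, masks, cross_attn):
    """Hypothetical region-based injection: each concept's features only
    influence latent positions inside its region-of-interest mask.

    hidden:        (batch, h*w, dim) denoising latents.
    concept_feats: list of (batch, k, dim) per-concept token sets.
    masks:         list of (batch, h*w, 1) soft masks, one per concept.
    cross_attn:    any callable cross_attn(query, context=...) -> (batch, h*w, dim).
    """
    out = hidden
    for feats, mask in zip(concept_feats, masks):
        # Spatially decoupled: the mask gates where this concept is injected.
        out = out + mask * cross_attn(hidden, context=feats)
    return out


# Illustrative usage with random tensors:
adapter = MPAdapter()
feats = torch.randn(1, 2, 1024)      # two concepts, one reference embedding each
concept_tokens = adapter(feats)      # (1, 8, 768)
```

On this reading of the abstract, projecting image-prompt tokens into the text latent space lets a standard cross-attention layer attend to text and image prompts jointly, while the per-concept masks keep each concept's influence inside its own dynamically predicted region, which is how the framework avoids concept conflicts under occlusion.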
| Field | Value |
|---|---|
| Original language | English |
| Article number | 126936 |
| Journal | Expert Systems with Applications |
| Volume | 274 |
| Online published | 25 Feb 2025 |
| DOIs | |
| Publication status | Published - 15 May 2025 |
Research Keywords
- Adapter
- Image generation
- Multi-concept customization
- Text-to-image diffusion model
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/