Abstract
Generative 3D modeling has advanced significantly in recent years, but it remains constrained by its inherently ill-posed nature, which limits quality and controllability. Inspired by the real-world workflow in which designers refer to existing 3D models when creating new ones, we propose Phidias, a novel generative model for reference-augmented 3D generation using diffusion. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing generation quality, generalization ability, and controllability. Phidias integrates three key components: 1) meta-ControlNet, which dynamically modulates the conditioning strength; 2) dynamic reference routing, which mitigates misalignment between the input image and the 3D reference; and 3) self-reference augmentations, which enable self-supervised training with a progressive curriculum. Collectively, these designs yield significant generative improvements over existing methods. Phidias forms a unified framework for 3D generation conditioned on text, images, and 3D models, offering versatile applications. Project page: https://RAG-3D.github.io/.
| Original language | English |
|---|---|
| Title of host publication | International Conference on Representation Learning 2025 (ICLR 2025) |
| Editors | Y. Yue, A. Garg, N. Peng, F. Sha, R. Yu |
| Publisher | International Conference on Learning Representations, ICLR |
| Number of pages | 20 |
| ISBN (Electronic) | 9798331320850 |
| Publication status | Published - 2025 |
| Event | 13th International Conference on Learning Representations (ICLR 2025) - Singapore EXPO, Singapore; 24 Apr 2025 → 28 Apr 2025; https://iclr.cc/Conferences/2025 |
Conference
| Conference | 13th International Conference on Learning Representations (ICLR 2025) |
|---|---|
| Abbreviated title | ICLR 2025 |
| Place | Singapore |
| Period | 24/04/25 → 28/04/25 |
| Internet address | https://iclr.cc/Conferences/2025 |
Bibliographical note
Research Unit(s) information for this publication is provided by the author(s) concerned.

Funding
This work is partially supported by the National Key R&D Program of China (2022ZD0160201) and Shanghai Artificial Intelligence Laboratory. This work is also in part supported by a GRF grant from the Research Grants Council of Hong Kong (Ref. No.: 11205620).
Fingerprint

Dive into the research topics of 'Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion'. Together they form a unique fingerprint.

Projects
- 1 Finished
GRF: Learning to Predict Scene Contexts
LAU, R. W. H. (Principal Investigator / Project Coordinator), FU, H. (Co-Investigator) & FU, C. W. (Co-Investigator)
1/01/21 → 12/06/25
Project: Research
Student theses
- Multi-Modal Generative Frameworks for Controllable Visual Content Creation
  WANG, Z. (Author), HANCKE, G. P. (Supervisor) & LAU, R. W. H. (Co-supervisor), 16 Jul 2025
  Student thesis: Doctoral Thesis