PHIDIAS: A GENERATIVE MODEL FOR CREATING 3D CONTENT FROM TEXT, IMAGE, AND 3D CONDITIONS WITH REFERENCE-AUGMENTED DIFFUSION

Zhenwei Wang (Co-first Author), Tengfei Wang* (Co-first Author), Zexin He, Gerhard Hancke, Ziwei Liu, Rynson W.H. Lau*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

3 Citations (Scopus)

Abstract

Generative 3D modeling has made significant advances recently, but it remains constrained by its inherently ill-posed nature, leading to challenges in quality and controllability. Inspired by the real-world workflow that designers typically refer to existing 3D models when creating new ones, we propose Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing the generation quality, generalization ability, and controllability. Phidias integrates three key components: 1) meta-ControlNet to dynamically modulate the conditioning strength, 2) dynamic reference routing to mitigate misalignment between the input image and 3D reference, and 3) self-reference augmentations to enable self-supervised training with a progressive curriculum. Collectively, these designs result in significant generative improvements over existing methods. Phidias forms a unified framework for 3D generation using text, image, and 3D conditions, offering versatile applications. Project page: https://RAG-3D.github.io/.
Original language: English
Title of host publication: International Conference on Representation Learning 2025 (ICLR 2025)
Editors: Y. Yue, A. Garg, N. Peng, F. Sha, R. Yu
Publisher: International Conference on Learning Representations, ICLR
Number of pages: 20
ISBN (Electronic): 9798331320850
Publication status: Published - 2025
Event13th International Conference on Learning Representations (ICLR 2025) - Singapore EXPO, Singapore
Duration: 24 Apr 2025 – 28 Apr 2025
https://iclr.cc/Conferences/2025

Conference

Conference: 13th International Conference on Learning Representations (ICLR 2025)
Abbreviated title: ICLR 2025
Place: Singapore
Period: 24/04/25 – 28/04/25
Internet address: https://iclr.cc/Conferences/2025

Bibliographical note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Funding

This work is partially supported by the National Key R&D Program of China (2022ZD0160201) and Shanghai Artificial Intelligence Laboratory. This work is also in part supported by a GRF grant from the Research Grants Council of Hong Kong (Ref. No.: 11205620).

Project
  • GRF: Learning to Predict Scene Contexts

    LAU, R. W. H. (Principal Investigator / Project Coordinator), FU, H. (Co-Investigator) & FU, C. W. (Co-Investigator)

    1/01/21 – 12/06/25

    Project: Research
