Improving 3D Reconstruction through Data Prior Exploration

Student thesis: Doctoral Thesis

Abstract

Over the past few years, computer vision and graphics have seen a surge of interest in reconstructing 3D scenes from images. This technique holds immense potential for applications including augmented and virtual reality, metaverse development, 3D printing, film production, and immersive gaming. While numerous studies have explored 3D scene reconstruction from either multiple or single-view images, few have examined how data prior information can be used effectively to improve reconstruction quality. In this thesis, we systematically analyze the types of data priors available for different 3D reconstruction tasks, aiming to improve 3D scene reconstruction in four scenarios: dense multi-view reconstruction, sparse-view reconstruction, single-view reconstruction, and zero-shot reconstruction. By leveraging data priors tailored to each task, we strive to achieve significant improvements in overall reconstruction quality.

For dense multi-view reconstruction, we argue that scene-specific image data is often sufficient or even redundant; the challenge lies in leveraging this data fully and effectively to maximize reconstruction quality. However, unavoidable noise introduced during multi-view image capture and camera pose estimation leads to errors in both geometry and texture, resulting in artifacts. To address this, we propose an optimization method based on differentiable rendering that maximizes the utilization of scene-specific data priors. This method integrates the optimization of camera pose, geometry, and texture into a single framework, producing refined 3D models with both detailed geometry and high-quality texture.
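The joint refinement described above can be sketched, in heavily simplified form, as gradient descent through a differentiable renderer. The renderer below is a hypothetical one-pixel stand-in (one pose angle, one texture intensity), not the thesis's actual pipeline; it only illustrates how a photometric residual can drive pose and texture updates in a single loop.

```python
import math

# Toy stand-in for a differentiable renderer: one "pixel" whose value depends
# on a camera pose angle p (radians) and a texture intensity t. Both scalars
# are hypothetical simplifications of full pose/geometry/texture parameters.
def render(p, t):
    return t * math.cos(p)

# "Captured" pixel produced by the true pose and texture.
observed = render(0.3, 0.8)

p, t = 0.5, 0.5      # noisy initial estimates, as from imperfect pose estimation
lr = 0.05
for _ in range(2000):
    err = render(p, t) - observed          # photometric residual
    # Analytic gradients of the squared loss, updating pose and texture
    # jointly in one framework rather than in separate stages.
    p -= lr * 2 * err * (-t * math.sin(p))
    t -= lr * 2 * err * math.cos(p)
```

Note that the photometric loss alone has a family of minima (any pair with the same rendered value), which is why real systems add multiple views and regularizers; the sketch only shows the residual being driven to zero.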

In the context of sparse-view reconstruction, the limited input images are insufficient to support high-quality 3D reconstruction due to the lack of necessary information. To overcome this, we introduce category-specific data priors. For instance, we propose a few-shot dynamic neural radiance field (FDNeRF) for dynamic 3D face reconstruction from a minimal number of dynamic input images. We train our 3D reconstruction model at the image feature level on a large volume of face data, enabling the model to extract features from sparse input views and produce high-quality, view-consistent 3D faces.
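The idea of conditioning a radiance field on image features can be sketched as follows. This is a generic pixel-aligned-feature pattern (in the spirit of pixelNeRF-style conditioning), not FDNeRF itself; the encoder, projection, and MLP here are untrained random stand-ins that only demonstrate the data flow from sparse views to a per-point prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": a fixed random linear map from a 4x4 image patch to a feature.
W_enc = rng.standard_normal((16, 8))

def sample_feature(image, uv):
    """Look up a local patch at the projected pixel and encode it (stand-in
    for bilinear sampling from a CNN feature map)."""
    h, w = image.shape
    u = int(np.clip(uv[0], 0, w - 4)); v = int(np.clip(uv[1], 0, h - 4))
    return image[v:v + 4, u:u + 4].reshape(-1) @ W_enc

def project(point, pose):
    """Toy pinhole projection: rigid transform, then perspective divide."""
    p_cam = pose[:3, :3] @ point + pose[:3, 3]
    return p_cam[:2] / max(p_cam[2], 1e-6) * 32 + 32   # focal 32, centre (32, 32)

W_mlp = rng.standard_normal((8, 4))

def radiance(point, images, poses):
    """Pixel-aligned conditioning: project the 3D point into every sparse
    view, sample a feature there, average, and map to (r, g, b, sigma)."""
    feats = [sample_feature(img, project(point, pose))
             for img, pose in zip(images, poses)]
    return np.mean(feats, axis=0) @ W_mlp

images = [rng.random((64, 64)) for _ in range(3)]   # three sparse input views
poses = [np.eye(4) for _ in range(3)]               # placeholder camera poses
out = radiance(np.array([0.1, -0.2, 2.0]), images, poses)
```

Because the encoder is shared and trained across many faces, a model of this shape can generalize to new identities from only a handful of views, which is the role the category-specific prior plays.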

For single-view and zero-shot 3D reconstruction tasks, where input information is severely limited, we introduce generative priors from cutting-edge pre-trained diffusion models. We offer two methods to utilize these generative priors: one is data distillation, which incorporates the prior knowledge of 2D image generation models into the 3D reconstruction process; the other directly uses the outputs of the image generation models as image-level supervision for the 3D reconstruction process.
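The distillation route can be illustrated with a minimal score-distillation-style update. This is a one-parameter sketch, not the thesis method: the "3D model" is a single scalar theta, the renderer is the identity, and the frozen "2D diffusion prior" is an ideal denoiser for a point-mass distribution at mu. The update nudges theta until its rendering agrees with what the prior considers a plausible image.

```python
import math
import random

mu = 1.0               # what the hypothetical frozen 2D prior "likes"
alphabar = 0.5         # fixed diffusion noise level, for simplicity
theta, lr = 5.0, 0.1   # initial 3D parameter and step size
rng = random.Random(0)

for _ in range(300):
    eps = rng.gauss(0.0, 1.0)                                  # sampled noise
    # Noise the rendered image (identity render: the image is theta itself).
    x_t = math.sqrt(alphabar) * theta + math.sqrt(1 - alphabar) * eps
    # Ideal denoiser for a point-mass prior at mu: predict the noise that
    # would have produced x_t from mu.
    eps_hat = (x_t - math.sqrt(alphabar) * mu) / math.sqrt(1 - alphabar)
    # Distillation gradient: (eps_hat - eps) times the render Jacobian,
    # which is 1 for the identity renderer.
    theta -= lr * (eps_hat - eps)
```

The second route in the paragraph above skips this per-step gradient and instead treats whole generated images as direct supervision, trading the iterative distillation loop for image-level reconstruction losses.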

For each of the aforementioned reconstruction scenarios, we conduct comprehensive comparisons with numerous state-of-the-art baseline methods to showcase the effectiveness of the introduced data priors. Through extensive quantitative and qualitative experiments, we demonstrate that our solution, by properly utilizing data prior information, excels in completing the respective 3D reconstruction tasks and yields high-quality 3D assets. The results obtained from these experiments provide compelling evidence of the superiority of our approach over existing methods.
Date of Award: 8 Aug 2025
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Jing LIAO

Keywords

  • 3D Reconstruction
  • NeRF
  • 3D Generation
  • Diffusion Priors
