Neural Inverse Rendering: Physically-Based Approaches for Image-Based 3D Reconstruction

Student thesis: Doctoral Thesis

Abstract

The creation and understanding of realistic three-dimensional (3D) scenes are fundamental to numerous applications in computer graphics, virtual reality, augmented reality, and digital content creation. Inverse rendering, the process of estimating intrinsic scene properties such as geometry, materials, and lighting from image observations, lies at the heart of this field. While neural rendering techniques have demonstrated remarkable progress in synthesizing photorealistic novel views, neural inverse rendering remains a significant challenge, particularly in achieving robust and physically plausible results under complex real-world conditions. These challenges include handling degraded inputs like low-light or noisy images, accurately recovering disentangled material properties that respond correctly to novel lighting, and efficiently learning from limited or unstructured data. The inherent ill-posedness of neural inverse rendering, where multiple combinations of scene properties can explain the same rendering result, further complicates these tasks.

This thesis focuses on advancing the field of physically-based neural inverse rendering by developing novel methods that combine deep learning with an understanding of physical light transport. We aim to enhance the realism, robustness, and physical consistency of 3D scene representations recovered from 2D images. Our contributions span several key areas: unsupervised learning for neural radiance field enhancement from degraded, low-light, noisy input images; accurate modeling of global illumination and material interactions for the inverse rendering of glossy objects; and material-aware 3D reconstruction from single-view images using generative priors. These approaches employ unsupervised, self-supervised, and synthetic data-driven learning paradigms, together with principles of physically based rendering (PBR), to achieve promising results.

First, we address the challenge of reconstructing high-quality scene representations from low-light, low dynamic range (LDR) sRGB images, where traditional Neural Radiance Field (NeRF) methods often falter due to low signal-to-noise ratios, color distortion, and heavy noise. We introduce LLNeRF, an unsupervised framework that decomposes the learned radiance into illumination-related and illumination-independent components, making radiance field learning feasible under extremely low-light conditions. By jointly optimizing these components with carefully designed prior-based unsupervised enhancement loss functions, LLNeRF effectively enhances scene illumination, reduces noise, and corrects color distortions directly during NeRF optimization, enabling the synthesis of normal-light novel views without requiring ground-truth supervision.
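To make the decomposition concrete, the following is a minimal PyTorch sketch, assuming a multiplicative form in which an illumination-independent color is scaled by an illumination-related component per sample point; the class and parameter names (DecomposedRadianceHead, reflectance, illumination, gain) are illustrative and are not taken from the thesis.

    import torch
    import torch.nn as nn

    class DecomposedRadianceHead(nn.Module):
        """Toy radiance head that splits color into two factors."""
        def __init__(self, feat_dim=256):
            super().__init__()
            # Illumination-independent color branch (reflectance-like term).
            self.reflectance = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 3), nn.Sigmoid())
            # Illumination-related branch (per-point brightness of the light field).
            self.illumination = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Softplus())

        def forward(self, feat, gain=1.0):
            r = self.reflectance(feat)                    # [N, 3] illumination-independent
            v = self.illumination(feat)                   # [N, 1] illumination-related
            c_low = r * v                                 # reproduces the dark, noisy input
            c_enhanced = (r * v * gain).clamp(0.0, 1.0)   # brightened novel view
            return c_low, c_enhanced

Under this kind of factorization, the dark input constrains the product of the two branches, while unsupervised priors on each branch are what make a brightened, denoised rendering possible without ground-truth normal-light images.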

Second, to tackle the inverse rendering of glossy objects with complex specular reflections and local light interactions, which are often oversimplified by methods that rely on 2D environment maps for lighting, we propose NeP (Neural Plenoptic Function). This method extends NeRF principles by incorporating path tracing and a more expressive 5D representation of the global light field. We formulate separate radiance fields to represent the geometry of the target object and the environment lighting. NeP introduces a material-aware cone sampling strategy that efficiently integrates incident radiance over BRDF lobes by leveraging pre-filtered radiance fields. This two-stage approach first reconstructs the object geometry and the environment radiance field, and then estimates high-fidelity, physically-based rendering (PBR) materials, leading to more accurate reconstruction results.
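The following is a minimal sketch of the lobe-integration idea, assuming a fixed sample count, an isotropic cone whose width scales with roughness, and a callable query_env_radiance(points, dirs) standing in for a pre-filtered environment radiance field; it is not the thesis's implementation.

    import torch
    import torch.nn.functional as F

    def shade_specular(x, n, view_dir, roughness, query_env_radiance, num_samples=32):
        # Lobe axis: reflect the view direction about the surface normal.
        r = view_dir - 2.0 * (view_dir * n).sum(-1, keepdim=True) * n
        # Sample directions inside a cone around the lobe axis; wider for rough surfaces.
        noise = torch.randn(num_samples, 3, device=x.device) * roughness
        dirs = F.normalize(r + noise, dim=-1)
        # Query incident radiance from the environment field along each sample direction.
        incident = query_env_radiance(x.expand(num_samples, 3), dirs)     # [S, 3]
        cos_term = (dirs * n).sum(-1, keepdim=True).clamp_min(0.0)
        # Monte Carlo average as a stand-in for the BRDF-weighted integral.
        return (incident * cos_term).mean(dim=0)

Averaging cosine-weighted samples drawn around the reflection direction is only the coarsest stand-in for importance sampling a specular lobe; the point is to illustrate querying a full 5D radiance field for incident light rather than a 2D environment map, so near-field and occluded lighting can be represented.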

Third, we address the highly ill-posed problem of generating 3D models with disentangled, physically-based material properties from a single input image. We present MAGE (Material-Aware 3D via the Multi-View G-Buffer Estimation Model), a novel approach inspired by deferred rendering pipelines in traditional computer graphics. MAGE learns to predict multi-view G-buffers (XYZ coordinates, surface normals, albedo, roughness, and metallic properties) from a single RGB image. To address the inherent ambiguity and ensure consistency across these predicted G-buffers, we develop a deterministic network architecture derived from pre-trained diffusion models and introduce a physically-based lighting response loss. Guided by PBR principles, this loss enforces consistency between the estimated material attributes and the rendered appearance under known lighting conditions. Furthermore, we build a large-scale synthetic dataset with rich material diversity to facilitate robust model training.
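The following is a minimal sketch of such a lighting-response consistency term, assuming a single known directional light and a simplified Lambertian-plus-Blinn-Phong shading stand-in rather than the full PBR model used in the thesis; tensor shapes and names are illustrative.

    import torch.nn.functional as F

    def lighting_response_loss(albedo, normal, roughness, metallic,
                               light_dir, view_dir, reference_rgb):
        # G-buffer tensors are assumed to be [H, W, C]; directions broadcastable to them.
        n = F.normalize(normal, dim=-1)
        l = F.normalize(light_dir, dim=-1)
        v = F.normalize(view_dir, dim=-1)
        h = F.normalize(l + v, dim=-1)                          # half vector
        n_dot_l = (n * l).sum(-1, keepdim=True).clamp_min(0.0)
        n_dot_h = (n * h).sum(-1, keepdim=True).clamp_min(0.0)
        # Diffuse term: metallic surfaces contribute little diffuse reflection.
        diffuse = albedo * (1.0 - metallic) * n_dot_l
        # Specular term: rougher surfaces get a broader highlight.
        shininess = 2.0 / roughness.clamp_min(1e-3) ** 2
        specular = metallic * n_dot_h ** shininess * n_dot_l
        rendered = diffuse + specular
        # Penalize disagreement with the appearance observed under the known light.
        return F.l1_loss(rendered, reference_rgb)

Because the loss re-renders the predicted G-buffers and compares against the image observed under known lighting, a wrong split between albedo, roughness, and metallic produces a visible error even when each buffer looks plausible in isolation, which is what helps disentangle the materials.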

The methods proposed in this thesis are evaluated through extensive experiments, demonstrating state-of-the-art performance on their respective tasks and showcasing improvements in the visual quality of reconstructed 3D scenes. Collectively, these contributions push the boundaries of physically-informed inverse rendering, paving the way for more realistic, controllable, and readily usable 3D digital assets in a wide array of applications.
Date of Award: 5 Sept 2025
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Rynson W H LAU
