Advancing Flexibility in 3D Generation and Optimization via Continuous Representation


Student thesis: Doctoral Thesis


Award date: 19 Jul 2024

Abstract

A fundamental task in computer vision and computer graphics is to learn to represent the shape and appearance of a 3D scene from limited 2D observations. A major challenge in learning 3D perception from 2D images is avoiding multi-view inconsistency. Coordinate-based 3D generative models have emerged as a mainstream methodology, as their physically based neural rendering process provides better 3D consistency. Recent neural radiance fields have achieved state-of-the-art performance by using a neural network to map spatial coordinates and viewing directions to volume density and the light field. The crucial and enduring insight of replacing discrete representations with continuous ones enables differentiable optimization and makes it feasible to synthesize high-resolution results. This efficient and potent method, along with its numerous variants, has been successfully employed in applications including novel view synthesis, scene relighting, large-scale scene reconstruction, and simultaneous localization and mapping.
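To make this mapping concrete, below is a minimal sketch of a coordinate-based radiance field: a small MLP that maps a 3D position and viewing direction to volume density and view-dependent color. The architecture, layer sizes, and names are illustrative assumptions, not the models used in this thesis.

```python
# Hedged sketch of a coordinate-based radiance field F(x, d) -> (sigma, rgb).
# All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # volume density
        self.rgb_head = nn.Sequential(                  # view-dependent color
            nn.Linear(hidden + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        h = self.trunk(x)                               # x: (N, 3) positions
        sigma = torch.relu(self.sigma_head(h))          # (N, 1), non-negative
        rgb = self.rgb_head(torch.cat([h, d], dim=-1))  # d: (N, 3) view directions
        return sigma, rgb

sigma, rgb = TinyRadianceField()(torch.rand(8, 3), torch.rand(8, 3))
```

Note the standard design choice reflected here: density depends only on position, while color also depends on the viewing direction, so the field can model view-dependent effects without breaking geometric consistency.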

A remarkable manifestation of a profound understanding of 3D representations is the ability to create and manipulate them. 2D generative adversarial networks have yielded impressive results, especially in photorealistic image synthesis. To overcome the multi-view inconsistency that plagues 2D generation, 3D-aware generation relies on a combination of a 3D-structure-aware inductive bias in the generator architecture and a neural rendering engine that provides view-consistent results.
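The neural rendering engine is what ties generated images to an underlying 3D structure: every view of the same scene is produced by compositing the same field along camera rays, so consistency across viewpoints comes for free. Below is a hedged sketch of standard volume-rendering compositing in the style of NeRF-family methods, not necessarily the exact renderer used in this thesis.

```python
# Hedged sketch of volume-rendering compositing along camera rays.
import torch

def composite(sigma, rgb, deltas):
    """sigma: (R, S, 1) densities, rgb: (R, S, 3) colors,
    deltas: (R, S, 1) spacings between the S samples on each of R rays."""
    alpha = 1.0 - torch.exp(-sigma * deltas)              # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=1)     # transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    weights = alpha * trans                               # (R, S, 1)
    return (weights * rgb).sum(dim=1)                     # (R, 3) pixel colors

# Usage with random sample values, purely to illustrate the shapes:
pixels = composite(torch.rand(4, 16, 1), torch.rand(4, 16, 3),
                   torch.full((4, 16, 1), 0.1))
```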

Motivated by the success of continuous representations in 3D generation, the same powerful idea can also be applied to traditional optimization. Continuation optimization, also called homotopy optimization, is a general strategy for solving complicated, highly non-convex optimization problems that arise in many machine learning applications. The idea is to start from an easy, smoothed version of the target problem and gradually deform it into the original one, tracking a solution along the way.
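As a concrete illustration, here is a minimal sketch of classic homotopy optimization on a toy one-dimensional problem: a sequence of progressively less-smoothed subproblems is solved by gradient descent, each warm-started from the previous solution. The objective family, step size, and annealing schedule are all illustrative assumptions.

```python
# Hedged sketch of classic homotopy (continuation) optimization.
import numpy as np

def f_t(x, t):
    # Illustrative homotopy family (an assumption, not from the thesis):
    # at t = 1 the objective is convex; at t = 0 it is the hard target.
    return (1 - t) * np.cos(3 * x) + (x - 1) ** 2

def grad_f_t(x, t):
    return -3 * (1 - t) * np.sin(3 * x) + 2 * (x - 1)

x = 3.0                                  # arbitrary starting point
for t in np.linspace(1.0, 0.0, 11):      # anneal from easy to hard
    for _ in range(200):                 # solve each subproblem by gradient descent,
        x -= 0.05 * grad_f_t(x, t)       # warm-started from the previous level
print(x, f_t(x, 0.0))                    # approximate minimizer of the hard problem
```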

In this dissertation, we focus on four parts that advance flexibility in 3D generation and optimization via continuous representation. The first part discusses how to model appearance and geometry and simultaneously generate dense shape correspondence among objects of the same category from only multi-view posed images. Our method also accommodates annotation transfer in a one- or few-shot manner, given only one or a few instances of the category. The second part explores semantic disentanglement in the 3D-aware latent space. We propose a general framework and present two representative approaches for the 3D manipulation task, in both supervised and unsupervised manners; this enables direct 3D control by utilizing latent discovery methods. In the third part, we design a novel model-based approach that learns the whole continuation path for homotopy optimization, which differs significantly from existing methods that iteratively solve a finite sequence of subproblems (see the sketch after this paragraph). The last part leverages multiple approaches, including gradient-based meta-learning algorithms and adversarial optimization algorithms, to enhance the performance and generalization of continuous representations.
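The following is a hedged sketch of the model-based idea in the third part, under the assumption that the continuation path can be represented as a small network mapping a continuation level t to a solution x(t), trained by minimizing the homotopy objective over sampled levels. The network, objective family, and training loop are illustrative, not the dissertation's actual formulation.

```python
# Hedged sketch: learn the whole continuation path at once with a model t -> x(t).
import torch
import torch.nn as nn

path = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # t -> x(t)
opt = torch.optim.Adam(path.parameters(), lr=1e-2)

def f(x, t):
    # Same illustrative homotopy family as in the earlier sketch:
    # smooth and convex at t = 1, hard and non-convex at t = 0.
    return (1 - t) * torch.cos(3 * x) + (x - 1) ** 2

for _ in range(2000):
    t = torch.rand(64, 1)           # sample continuation levels
    loss = f(path(t), t).mean()     # minimize along the whole path at once
    opt.zero_grad()
    loss.backward()
    opt.step()

x_hard = path(torch.zeros(1, 1))    # query the learned path at t = 0
```

In contrast to the iterative loop shown earlier, a single trained model here yields a solution for any continuation level on demand, which is the flexibility the model-based view aims to provide.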