3D Shape Completion with Reduced Supervision: From Learning Mechanisms to Prior Guidance

Student thesis: Doctoral Thesis

Abstract

3D shapes, as digital representations of real-world objects, are important research subjects in fields such as computer vision, computer graphics, and artificial intelligence. Typical 3D shape representations include point clouds, voxels, and meshes. Thanks to the development of laser scanners, depth cameras, and other sensor devices, 3D shapes play an indispensable role in applications such as engineering design, medical imaging, virtual reality, and autonomous driving.
However, due to sensor limitations, occlusion, and other factors, captured 3D shapes are often incomplete. This incompleteness significantly affects subsequent tasks such as object recognition, shape analysis, and scene understanding. 3D shape completion aims to algorithmically infer and reconstruct the missing geometric information of an individual object, restoring the shape to its complete state. This not only improves the usability of 3D shapes but also provides higher-quality data support for downstream tasks.

Although significant progress has been made in research on 3D shape completion, current approaches predominantly rely on fully supervised learning, which necessitates large-scale, high-quality labeled datasets and limits generalization to real-world scenarios with incomplete observations. In this thesis, we address these challenges by proposing novel approaches that minimize the reliance on fully supervised learning, focusing on two key aspects: learning mechanisms and prior guidance. In terms of learning mechanisms, we introduce cross-modal learning and optimize supervision strategies to enhance model adaptability and performance with limited labeled data. For prior guidance, we develop and leverage multi-source data priors, integrating geometric and topological guidance to improve completion accuracy and robustness. The detailed research contents of this thesis are as follows:

We propose a cross-modal unsupervised 3D point cloud completion model. This model utilizes both RGB images and partial point clouds to complete objects without requiring any complete 3D point clouds for supervision. First, to take advantage of the complementary information in 2D images, we extract 2D features from a single-view RGB image and design a fusion module to combine them with the 3D features extracted from the partial point cloud. To guide the shape of the predicted point cloud, we project the predicted points onto the image plane and use the foreground pixels of the corresponding silhouette map to constrain the positions of the projected points. To reduce outliers in the predicted point cloud, we propose a view calibrator that, guided by the single-view silhouette image, moves points projected onto the background into the foreground. To the best of our knowledge, our approach is the first cross-modal point cloud completion method that does not require any 3D supervision.
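The silhouette-based projection constraint above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the function names, the pinhole intrinsics `K`, and the hard (non-differentiable) foreground test are all assumptions made for clarity; the actual model would use a differentiable formulation.

```python
import numpy as np

def project_points(points, K):
    """Project Nx3 camera-frame points onto the image plane (pinhole model)."""
    uv = (K @ points.T).T          # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

def silhouette_loss(points, K, silhouette):
    """Fraction of predicted points whose projection misses the foreground.

    `silhouette` is an HxW boolean mask (True = foreground). Points that
    project outside the image or onto background pixels are counted as
    violations; a view calibrator would move such points inward.
    """
    h, w = silhouette.shape
    uv = np.round(project_points(points, K)).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    on_fg = np.zeros(len(points), dtype=bool)
    on_fg[inside] = silhouette[uv[inside, 1], uv[inside, 0]]
    return 1.0 - on_fg.mean()
```

A loss of 0 means every predicted point projects into the object's silhouette; points landing on background raise the loss and would be pulled toward the foreground by the calibrator.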

We propose an unsupervised point cloud completion model based on multi-view adversarial learning. This model effectively utilizes the geometric similarity of local regions and category-specific common structures to complete missing parts without requiring any complete point clouds or multi-view images for supervision. We first introduce a Pattern Retrieval Network to retrieve similar positional and curvature patterns between the partial input and the predicted shape, then leverage these similarities to densify and refine the reconstructed results. Additionally, we render the reconstructed complete shape into multi-view depth maps and design an adversarial learning module that learns the geometry of the target shape from category-specific single-view depth images of the partial point clouds in the training set. To achieve anisotropic rendering, we design a density-aware radius estimation algorithm that improves the quality of the rendered images.

We propose a weakly supervised learning framework to improve the accuracy of completing partial shapes from unseen categories. We first propose an end-to-end prior-assisted shape learning network that leverages data from the seen categories to infer a coarse shape. Specifically, we construct a prior bank consisting of representative shapes from the seen categories. Then, we design a multi-scale pattern correlation module that learns the complete shape of the input by analyzing correlations between local patterns of the input and the priors at multiple scales. In addition, we propose a self-supervised shape refinement model to further refine the coarse shape. Considering the shape variability of 3D objects across categories, we construct a category-specific prior bank to facilitate shape refinement. Then, we devise a voxel-based partial matching loss and leverage the partial scans to drive the refinement process. Extensive experimental results show that our approach outperforms state-of-the-art methods by a large margin.
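The voxel-based partial matching loss can be sketched as a one-directional occupancy consistency check. This is an illustrative simplification, assuming shapes normalized to a fixed cube; the grid resolution and the hard occupancy test are assumptions, and the thesis's loss would be differentiable. The key property is the asymmetry: voxels occupied by the partial scan must be covered by the prediction, but extra predicted voxels are not penalized, since the prediction legitimately fills in missing regions.

```python
import numpy as np

def voxelize(points, res=32, bounds=1.0):
    """Boolean occupancy grid over [-bounds, bounds]^3 at resolution `res`."""
    idx = ((points + bounds) / (2 * bounds) * res).astype(int)
    idx = np.clip(idx, 0, res - 1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def partial_matching_loss(pred_points, partial_points, res=32):
    """Fraction of partial-scan voxels not covered by the predicted shape.

    One-directional by design: the partial scan is trusted where it is
    observed, while the prediction is free to add geometry elsewhere.
    """
    pred = voxelize(pred_points, res)
    part = voxelize(partial_points, res)
    uncovered = part & ~pred
    return uncovered.sum() / max(part.sum(), 1)
```

Driving refinement with this loss pulls the predicted shape to agree with the observed regions of the scan without suppressing the completed, previously unobserved parts.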
Date of Award: 18 Aug 2025
Original language: English
Awarding Institution: City University of Hong Kong
Supervisors: Junhui HOU (Supervisor) & Yong Xu (External Supervisor)
