Abstract
A 3D point cloud is an irregular and unordered set of discrete spatial points depicting the geometric information of 3D objects and scenes. With the rapid development of 3D sensing technology and the growing availability of 3D scanning equipment, point clouds have become one of the most common 3D representation modalities, widely applied in downstream applications such as digital preservation, autonomous driving, virtual/augmented reality, and robotics. Accordingly, there is a growing demand for efficient and effective techniques for 3D point cloud modeling, processing, and analysis.
In recent years, deep learning frameworks have achieved remarkable success in modeling regularly-structured 2D visual signals, such as images and videos, which are defined on uniformly-distributed pixel grids. However, the unstructured nature of 3D point clouds makes it difficult to design expressive learning operators for discriminative point feature extraction. Although numerous studies have constructed various types of deep set architectures, unified, efficient, and scalable paradigms for the deep modeling of 3D point clouds are still lacking. Additionally, since acquiring and annotating 3D point cloud data is much more costly, laborious, and time-consuming than for its 2D counterparts, large-scale, high-quality, and richly-labeled 3D point cloud datasets remain scarce, which further hinders the development of powerful deep learning frameworks for 3D point cloud processing.
In this thesis, we seek to strengthen the deep geometric modeling of 3D point clouds from the perspectives of data representation structures and network learning mechanisms through three lines of work. In terms of data structures, we first introduce Flattening-Net, which represents irregular 3D point clouds as regularly-structured 2D point geometry images (PGIs), and then extend it into a bi-directional cycle mapping framework dubbed the Flatten Anything Model (FAM), which achieves explicitly-constrained and geometrically-interpretable global free-boundary surface parameterization. In terms of learning mechanisms, we introduce PointMCD for cross-modal 2D-to-3D knowledge distillation and PointVST for self-supervised pre-training of point cloud backbone networks.
Flattening-Net is an unsupervised deep neural architecture designed to approximate a locally-smooth surface flattening process while effectively preserving neighborhood consistency. With it, an irregular 3D point cloud of arbitrary geometry and topology can be converted to a regular 2D PGI structure, in which the coordinates of spatial points are stored as the colors of image pixels. This approach aims to overcome the essential technical difficulty of learning from irregular geometric signals, and the resulting PGI representations make it possible to apply well-established 2D image processing tools and learning components to 3D point cloud applications. To pursue higher-quality surface parameterization, we further construct a bi-directional cycle mapping framework called FAM, which features global mapping and an adaptively-deformed free boundary.
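To make the PGI data structure concrete, the following minimal sketch packs a point cloud into an image-like grid whose three channels hold normalized (x, y, z) coordinates. It illustrates only the representation, not Flattening-Net itself: the function names are hypothetical, and the pixel placement here is plain row-major input order, whereas Flattening-Net learns a placement that keeps 3D neighbors close on the 2D grid.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def normalize(points: List[Point]) -> List[Point]:
    """Shift and uniformly scale points into the unit cube so that
    coordinates can be read as color values in [0, 1]."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    # Use one scale for all axes to preserve the shape's aspect ratio.
    scale = max(h - l for h, l in zip(hi, lo)) or 1.0
    return [tuple((p[i] - lo[i]) / scale for i in range(3)) for p in points]


def points_to_pgi(points: List[Point], m: int) -> List[List[Point]]:
    """Pack m*m points into an m x m grid; each pixel stores (x, y, z).

    ASSUMPTION: row-major placement in input order, used only as a
    placeholder for the learned, neighborhood-preserving placement.
    """
    if len(points) != m * m:
        raise ValueError("expected exactly m*m points")
    return [points[r * m:(r + 1) * m] for r in range(m)]


def pgi_to_points(pgi: List[List[Point]]) -> List[Point]:
    """Read the point set back out of the grid, pixel by pixel."""
    return [pixel for row in pgi for pixel in row]


cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 2.0)]
unit_cloud = normalize(cloud)
pgi = points_to_pgi(unit_cloud, m=2)
```

Because every pixel simply stores one point, the grid can be converted back to the point set without loss, while the regular layout is what allows 2D convolutions and other image operators to be applied.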
Moving from data-level to model-level explorations, we introduce PointMCD, which enhances the performance of 3D point cloud learning frameworks in a task-specific manner through image-to-point distillation, with the aim of harnessing the maturity of 2D image modeling architectures and the abundance of labeled 2D image data. The visual knowledge extracted by a 2D teacher model, which takes multi-view images as input, is transferred to guide the learning process of the 3D point cloud student model. In addition, to empower 3D point cloud backbone networks in a task-agnostic and generic manner, we propose PointVST for self-supervised pre-training of deep set architectures. We investigate view-specific point-to-image translation as the pretext task, which encourages backbone feature extractors to learn highly expressive and transferable 3D geometric representations from massive amounts of unlabeled point cloud data.
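As a rough illustration of the teacher-to-student transfer idea, the sketch below computes a generic temperature-softened distillation loss between teacher and student class scores. This is the classic soft-label formulation, swapped in purely for illustration; it is not PointMCD's actual objective, and the function names, the `temperature` parameter, and the example logits are all assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between temperature-softened distributions.

    ASSUMPTION: a generic soft-label loss; in an image-to-point setting the
    teacher scores would come from a 2D network fed with multi-view images
    and the student scores from a 3D point cloud network.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for a 3-class problem.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0])
```

The loss vanishes when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it alongside the usual task loss pulls the 3D student toward the 2D teacher's predictions.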
In essence, the ultimate goal of this thesis is to overcome the unstructured nature of point clouds and enhance the feature extraction capability of point cloud learning networks. We conduct in-depth and comprehensive studies on: 1) point cloud structurization and parameterization; 2) image-to-point knowledge distillation; 3) self-supervised point cloud backbone pre-training. Our proposed methods offer practical and insightful solutions to efficient and effective deep geometric modeling of 3D point clouds.
| Date of Award | 9 Aug 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Junhui HOU (Supervisor) |