Abstract
Pre-training across 3D vision and language remains underdeveloped because of limited training data. Recent works attempt to transfer vision-language (V-L) pre-training methods to 3D vision. However, the domain gap between 3D data and images remains unsolved, which limits how well V-L pre-trained models perform on 3D downstream tasks. To address this issue, we propose CLIP2Point, an image-depth pre-training method that uses contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. We introduce a new depth rendering setting that produces a better visual effect, and then render 52,460 pairs of images and depth maps from ShapeNet for pre-training. The pre-training scheme of CLIP2Point combines cross-modality learning, which encourages depth features to capture expressive visual and textual features, with intra-modality learning, which enhances the invariance of depth aggregation. Additionally, we propose a novel Gated Dual-Path Adapter (GDPA), i.e., a dual-path structure with global-view aggregators and gated fusion for downstream representation learning. It enables an ensemble of CLIP and CLIP2Point, efficiently adapting pre-trained knowledge to downstream tasks. Experimental results show that CLIP2Point is effective in transferring CLIP knowledge to 3D vision. CLIP2Point outperforms other 3D transfer learning and pre-training networks, achieving state-of-the-art results on zero-shot, few-shot, and fully-supervised classification. © 2023 IEEE
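The abstract states only that CLIP2Point pre-trains with contrastive learning between rendered images and depth maps. As a rough illustration of that general idea, and not the paper's actual loss, the sketch below implements a generic symmetric InfoNCE objective of the kind CLIP popularized. It assumes paired feature matrices in which row i of each matrix comes from the same rendered view; the function names, the temperature of 0.07, and the feature shapes are illustrative assumptions. The intra-modality term, the depth rendering, and GDPA are not modeled here.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(img_feats: np.ndarray,
                       depth_feats: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Generic symmetric InfoNCE (illustrative assumption, not CLIP2Point's exact loss).

    Row i of img_feats and row i of depth_feats form a positive pair;
    every other row in the batch serves as a negative.
    """
    img = l2_normalize(img_feats)
    dep = l2_normalize(depth_feats)
    logits = img @ dep.T / temperature  # (N, N) scaled cosine-similarity matrix
    idx = np.arange(logits.shape[0])
    # Image -> depth: each row should assign the highest probability to its diagonal entry.
    loss_i2d = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Depth -> image: each column should assign the highest probability to its diagonal entry.
    loss_d2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_i2d + loss_d2i))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 64))  # hypothetical batch of 8 image embeddings
    paired = img + 0.01 * rng.normal(size=(8, 64))  # well-aligned "depth" embeddings
    unrelated = rng.normal(size=(8, 64))  # embeddings with no correspondence
    print(symmetric_info_nce(img, paired), symmetric_info_nce(img, unrelated))
```

Well-aligned pairs should give a loss near zero, while unrelated features give a loss near log(N) for a batch of N. Averaging both directions makes the objective symmetric in the two modalities.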
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 IEEE/CVF International Conference on Computer Vision |
| Subtitle of host publication | ICCV 2023 |
| Publisher | IEEE |
| Pages | 22157-22167 |
| ISBN (Electronic) | 979-8-3503-0718-4 |
| ISBN (Print) | 979-8-3503-0719-1 |
| DOIs | |
| Publication status | Published - Oct 2023 |
| Event | 2023 IEEE International Conference on Computer Vision (ICCV 2023) - Paris Convention Center, Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023. https://iccv2023.thecvf.com/ |
Conference
| Conference | 2023 IEEE International Conference on Computer Vision (ICCV 2023) |
|---|---|
| Abbreviated title | ICCV23 |
| Place | France |
| City | Paris |
| Period | 2/10/23 → 6/10/23 |
| Internet address | https://iccv2023.thecvf.com/ |
Fingerprint
Dive into the research topics of 'CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training'. Together they form a unique fingerprint.