
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language (V-L) pre-training methods to 3D vision. However, the domain gap between 3D and images is unsolved, so that V-L pre-trained models are restricted in 3D downstream tasks. To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. We introduce a new depth rendering setting that forms a better visual effect, and then render 52,460 pairs of images and depth maps from ShapeNet for pre-training. The pre-training scheme of CLIP2Point combines cross-modality learning to enforce the depth features for capturing expressive visual and textual features and intra-modality learning to enhance the invariance of depth aggregation. Additionally, we propose a novel Gated Dual-Path Adapter (GDPA), i.e., a dual-path structure with global-view aggregators and gated fusion for downstream representative learning. It allows the ensemble of CLIP and CLIP2Point, tuning pre-training knowledge to downstream tasks in an efficient adaptation. Experimental results show that CLIP2Point is effective in transferring CLIP knowledge to 3D vision. CLIP2Point outperforms other 3D transfer learning and pre-training networks, achieving state-of-the-art results on zero-shot, few-shot, and fully-supervised classification. © 2023 IEEE
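
For readers unfamiliar with contrastive pre-training, the cross-modality idea mentioned in the abstract can be illustrated with a generic symmetric InfoNCE loss over paired depth and image embeddings. This is only a minimal sketch under stated assumptions, not the loss used in CLIP2Point: the function names, the toy 2-D embeddings, and the temperature of 0.07 are all hypothetical choices made for illustration, and the intra-modality term and the GDPA module are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys.

    The temperature default is an assumed value for this sketch.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine_similarity(query, key) / temperature for key in keys]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def symmetric_contrastive_loss(depth_feats, image_feats, temperature=0.07):
    """Average of the depth-to-image and image-to-depth InfoNCE terms."""
    depth_to_image = info_nce(depth_feats, image_feats, temperature)
    image_to_depth = info_nce(image_feats, depth_feats, temperature)
    return 0.5 * (depth_to_image + image_to_depth)


# Toy embeddings (hypothetical): row i of each list is assumed to come
# from the same rendered object.
depth = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[0.9, 0.1], [0.1, 0.9]]
image_swapped = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = symmetric_contrastive_loss(depth, image_aligned)
swapped_loss = symmetric_contrastive_loss(depth, image_swapped)
```

In this toy setup the loss is lower when each depth embedding points in nearly the same direction as its paired image embedding than when the pairs are swapped, which is the behaviour a contrastive objective of this kind rewards.
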
Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision
Subtitle of host publication: ICCV 2023
Publisher: IEEE
Pages: 22157-22167
ISBN (Electronic): 979-8-3503-0718-4
ISBN (Print): 979-8-3503-0719-1
Publication status: Published - Oct 2023
Event: 2023 IEEE International Conference on Computer Vision (ICCV 2023) - Paris Convention Center, Paris, France
Duration: 2 Oct 2023 to 6 Oct 2023
https://iccv2023.thecvf.com/

Conference

Conference: 2023 IEEE International Conference on Computer Vision (ICCV 2023)
Abbreviated title: ICCV23
Place: France
City: Paris
Period: 2/10/23 to 6/10/23
