Few-Shot Object Detection with Fully Cross-Transformer

Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang

Research output: Chapters, Conference Papers, Creative and Literary Works / RGC 32 - Refereed conference paper (with host publication) / peer-review

178 Citations (Scopus)

Abstract

Few-shot object detection (FSOD), which aims to detect novel objects from very few training examples, has recently attracted great research interest. Metric-learning based methods have proven effective for this task: they use a two-branch siamese network and compute the similarity between image regions and few-shot examples for detection. However, in previous works the interaction between the two branches is restricted to the detection head, leaving the remaining hundreds of layers for separate feature extraction. Inspired by recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD that incorporates cross-transformers into both the feature backbone and the detection head. An asymmetric-batched cross-attention is proposed to aggregate key information from the two branches, which have different batch sizes. By introducing these multi-level interactions, our model improves few-shot similarity learning between the two branches. Comprehensive experiments on both the PASCAL VOC and MSCOCO FSOD benchmarks demonstrate the effectiveness of our model. © 2022 IEEE.
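
The idea of cross-attention between two branches with different batch sizes can be illustrated with a small NumPy sketch. This is only one plausible reading of the abstract, not the authors' implementation: it assumes a query branch with batch size 1 and a support branch with batch size Bs, averages the support tokens over the batch before the query branch attends to them, and repeats the query tokens so that each support example can attend to them. The function names, the identity Q/K/V projections, and the single attention head are all hypothetical simplifications.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    """Scaled dot-product attention; q is (B, Nq, d), k and v are (B, Nk, d)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def asymmetric_batched_cross_attention(query_tokens, support_tokens):
    """Illustrative sketch only (hypothetical helper, not from the paper).

    query_tokens:   (1, Nq, d)  one query image
    support_tokens: (Bs, Ns, d) Bs few-shot support examples
    Assumption: Q, K and V are the tokens themselves (no learned projections).
    """
    bs = support_tokens.shape[0]

    # Query branch: keys/values are its own tokens plus the support tokens
    # averaged over the support batch, so the batch sizes match (both 1).
    support_mean = support_tokens.mean(axis=0, keepdims=True)      # (1, Ns, d)
    kv_query = np.concatenate([query_tokens, support_mean], axis=1)
    query_out = attend(query_tokens, kv_query, kv_query)           # (1, Nq, d)

    # Support branch: keys/values are its own tokens plus the query tokens
    # repeated Bs times, so every support example sees the query.
    query_rep = np.repeat(query_tokens, bs, axis=0)                # (Bs, Nq, d)
    kv_support = np.concatenate([support_tokens, query_rep], axis=1)
    support_out = attend(support_tokens, kv_support, kv_support)   # (Bs, Ns, d)

    return query_out, support_out
```

Each branch keeps its own batch size and token count at the output, so a block like this could in principle be stacked at several depths, which is what "multi-level interactions" suggests; how the actual model handles the batch mismatch should be checked against the paper itself.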
Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
Publisher: IEEE
Pages: 5311-5320
ISBN (Electronic): 978-1-6654-6946-3
Publication status: Published - 2022
Externally published: Yes
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) - Hybrid, New Orleans, United States
Duration: 19 Jun 2022 - 24 Jun 2022
https://cvpr2022.thecvf.com/

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
Place: United States
City: New Orleans
Period: 19/06/22 - 24/06/22
Research Keywords

  • categorization
  • Recognition: detection
  • retrieval
  • Transfer/low-shot/long-tail learning
