Learning Tracking Representations from Single Point Annotations

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-reviewed

1 Scopus Citations


Detail(s)

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
Subtitle of host publication: CVPRW 2024
Publisher: Institute of Electrical and Electronics Engineers, Inc.
Pages: 2606-2615
ISBN (electronic): 979-8-3503-6547-4
ISBN (print): 979-8-3503-6548-1
Publication status: Published - 2024

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (print): 2160-7508
ISSN (electronic): 2160-7516

Conference

Title: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2024)
Place: United States
City: Seattle
Period: 16 - 22 June 2024

Abstract

Existing deep trackers are typically trained on large-scale video datasets with annotated bounding boxes. However, these bounding boxes are expensive and time-consuming to annotate, in particular for large-scale datasets. In this paper, we propose to learn tracking representations from single point annotations (i.e., 4.5× faster to annotate than the traditional bounding box) in a weakly supervised manner. Specifically, we propose a soft contrastive learning (SoCL) framework that incorporates a target objectness prior into end-to-end contrastive learning. Our SoCL consists of adaptive positive and negative sample generation, which is memory-efficient and effective for learning tracking representations. We apply the learned representation of SoCL to visual tracking and show that our method can 1) achieve better performance than the fully supervised baseline trained with box annotations under the same annotation time cost; 2) achieve performance comparable to that of the fully supervised baseline by using the same number of training frames, while reducing annotation time cost by 78% and total fees by 85%; 3) be robust to annotation noise. © 2024 IEEE
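The abstract describes weighting contrastive samples by a target objectness prior rather than hard positive/negative labels. The sketch below is a hypothetical illustration of that general idea, not the paper's implementation: an InfoNCE-style loss in which each candidate embedding contributes as a soft positive in proportion to its objectness weight. The function name, argument shapes, and the uniform normalization of weights are all assumptions made for this example.

```python
import math

def _normalize(v):
    """Scale a vector to unit length for cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def soft_contrastive_loss(anchor, candidates, objectness, temperature=0.1):
    """InfoNCE-style loss where each candidate acts as a soft positive
    (or, with weight near zero, a negative) according to an objectness
    prior in [0, 1], instead of hard box-derived labels.

    anchor:      embedding of the point-annotated target region
    candidates:  list of embeddings of sampled regions
    objectness:  one prior weight per candidate
    """
    a = _normalize(anchor)
    sims = [sum(x * y for x, y in zip(a, _normalize(c))) / temperature
            for c in candidates]
    # Numerically stable log-softmax over all candidates.
    m = max(sims)
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    log_probs = [s - log_z for s in sims]
    # Normalize the objectness prior into soft positive weights.
    total = sum(objectness)
    weights = [w / total for w in objectness]
    # Cross-entropy between the soft weights and the softmax distribution.
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

With the weight concentrated on a candidate identical to the anchor, the loss is near zero; shifting the weight to an orthogonal candidate increases it, which is the behavior a soft positive/negative scheme needs.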

Research Area(s)

  • Representation Learning, Video Object Tracking

Citation Format(s)

Learning Tracking Representations from Single Point Annotations. / Wu, Qiangqiang; Chan, Antoni B.
Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops: CVPRW 2024. Institute of Electrical and Electronics Engineers, Inc., 2024. p. 2606-2615 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops).
