Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

1 Scopus Citations

Author(s)

  • Wei Feng
  • Feifan Wang
  • Ruize Han
  • Yiyang Gan
  • Zekun Qian
  • Song Wang

Detail(s)

Original language: English
Article number: 10684138
Pages (from-to): 351-368
Number of pages: 18
Journal / Publication: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 47
Issue number: 1
Online published: 19 Sept 2024
Publication status: Published - Jan 2025

Abstract

Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem in video surveillance of multi-person scenes. It aims to track a group of people over time in each view and, at the same time, to identify the same person across different views. This differs from previous multiple object tracking (MOT) and multi-camera MOT tasks, which consider only human tracking over time. As a result, videos for MvMHAT require more complex annotations but also contain more information for self-learning. In this work, we tackle this problem with an end-to-end neural network trained in a self-supervised manner. Specifically, we exploit the spatial-temporal self-consistency rationale through three properties: reflexivity, symmetry, and transitivity. The reflexivity property holds naturally; we design self-supervised learning losses based on symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate multiple humans over time and across views. Furthermore, to promote research on MvMHAT, we build two new large-scale benchmarks for training and testing different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.

© 2024 IEEE.
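
The symmetry and transitivity properties named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the softmax temperature, and the squared-error penalty are assumptions made for illustration, and the losses in the paper may take a different form. The idea shown is only that chaining soft assignment matrices around a cycle of two views (symmetry) or three views (transitivity) should return each person to themselves, so the product should be close to the identity matrix. The sketch also assumes every person is visible in every view.

```python
import numpy as np


def soft_assignment(feat_a, feat_b, temperature=0.1):
    """Row-stochastic soft assignment from people in view A to people in view B.

    feat_a, feat_b: (num_people, dim) appearance features (hypothetical inputs).
    temperature: softmax sharpness, an illustrative value.
    """
    sim = feat_a @ feat_b.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # subtract row max for stability
    exp = np.exp(sim)
    return exp / exp.sum(axis=1, keepdims=True)


def cycle_loss(*assignments):
    """Mean squared deviation of a chained assignment cycle from the identity."""
    cycle = assignments[0]
    for matrix in assignments[1:]:
        cycle = cycle @ matrix
    identity = np.eye(cycle.shape[0])
    return float(np.mean((cycle - identity) ** 2))


def symmetry_loss(feat_a, feat_b):
    """Two-step cycle A -> B -> A should map each person back to themselves."""
    return cycle_loss(
        soft_assignment(feat_a, feat_b),
        soft_assignment(feat_b, feat_a),
    )


def transitivity_loss(feat_a, feat_b, feat_c):
    """Three-step cycle A -> B -> C -> A should also be close to the identity."""
    return cycle_loss(
        soft_assignment(feat_a, feat_b),
        soft_assignment(feat_b, feat_c),
        soft_assignment(feat_c, feat_a),
    )


if __name__ == "__main__":
    # Three people with perfectly distinct features, listed in a different
    # order in each view (toy data, not from the benchmarks).
    view_a = np.eye(3)
    view_b = view_a[[2, 0, 1]]
    view_c = view_a[[1, 2, 0]]
    print("symmetry loss:", symmetry_loss(view_a, view_b))
    print("transitivity loss:", transitivity_loss(view_a, view_b, view_c))
```

Because both losses depend only on the features themselves, they need no identity labels, which is the sense in which the training signal is self-supervised. The same cycle construction applies when the "views" are frames of one camera at different times, covering association over time as well as across views.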

Research Area(s)

  • multiple object tracking, human association, multi-view cameras, self-supervised learning

Citation Format(s)

Unveiling the Power of Self-Supervision for Multi-View Multi-Human Association and Tracking. / Feng, Wei; Wang, Feifan; Han, Ruize et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 1, 10684138, 01.2025, p. 351-368.
