Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2024 |
Subtitle of host publication | 18th European Conference, Proceedings, Part LVII |
Editors | Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol |
Publisher | Springer, Cham |
Pages | 478-495 |
Edition | 1 |
ISBN (electronic) | 978-3-031-72998-0 |
ISBN (print) | 978-3-031-72997-3 |
Publication status | Published - 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 15115 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Conference
Title | 18th European Conference on Computer Vision (ECCV 2024) |
---|---|
Location | MiCo Milano |
Place | Italy |
City | Milan |
Period | 29 September - 4 October 2024 |
Abstract
Existing crowd counting models require extensive training data, which is time-consuming to annotate. To tackle this issue, we propose a simple yet effective crowd counting method that uses the Segment-Everything-Everywhere Model (SEEM), an adaptation of the Segment Anything Model (SAM), to generate pseudo-labels for training crowd counting models. However, our initial investigation reveals that SEEM's performance in dense crowd scenes is limited, primarily because it misses many people in high-density areas. To overcome this limitation, we propose an adaptive resolution SEEM to handle the scale variation, occlusion, and overlap of people within crowd scenes. Alongside this, we introduce a robust localization method, based on Gaussian Mixture Models, for predicting head positions within the predicted person masks. Given the mask and point pseudo-labels, we propose a robust loss function that excludes uncertain regions based on SEEM's predictions, thereby improving the training of the counting network. Finally, we propose an iterative method for generating pseudo-labels. This method improves the quality of the segmentation masks by identifying more tiny persons in high-density regions, which are often missed in the first pseudo-labeling iteration. Overall, our proposed method achieves the best unsupervised performance in crowd counting, while also being comparable to some classic fully supervised methods. This makes it a highly effective and versatile tool for crowd counting, especially in situations where labeled data is not available. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
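The abstract does not give the exact form of the robust loss. As a rough illustration only, and not the authors' formulation, the Python sketch below assumes that the counting network predicts a per-pixel density map, that the pseudo-label is a density map of the same size, and that "uncertain regions" are pixels whose SEEM mask score falls between two hypothetical thresholds (`low`, `high`). Such pixels are simply left out of a mean-squared-error loss. The function names `certainty_mask` and `masked_mse` are illustrative, not from the paper.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def certainty_mask(scores: Grid, low: float = 0.3, high: float = 0.7) -> List[List[int]]:
    """Mark pixels as certain (1) or uncertain (0).

    Hypothetical rule: a pixel is certain when its mask score is confidently
    background (<= low) or confidently person (>= high); in-between scores
    are treated as uncertain. The thresholds are illustrative assumptions.
    """
    return [[1 if s <= low or s >= high else 0 for s in row] for row in scores]


def masked_mse(pred: Grid, target: Grid, certain: Sequence[Sequence[int]]) -> float:
    """Mean squared error between two density maps, averaged over certain pixels only."""
    total = 0.0
    count = 0
    for pred_row, tgt_row, mask_row in zip(pred, target, certain):
        for p, t, m in zip(pred_row, tgt_row, mask_row):
            if m:  # uncertain pixels (m == 0) contribute nothing to the loss
                total += (p - t) ** 2
                count += 1
    # If every pixel is uncertain there is nothing to supervise, so return 0.
    return total / count if count else 0.0
```

Averaging over the certain pixels only, rather than over the whole image, keeps the loss scale independent of how much of the image is excluded.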
Research Area(s)
- Crowd Counting, Crowd Localization, Segment Anything
Citation Format(s)
Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM. / Wan, Jia; Wu, Qiangqiang; Lin, Wei et al.
Computer Vision – ECCV 2024: 18th European Conference, Proceedings, Part LVII. ed. / Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol. 1. ed. Springer, Cham, 2024. p. 478-495 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15115 LNCS).