3D Crowd Counting via Geometric Attention-Guided Multi-view Fusion

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

5 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 3123–3139
Journal / Publication: International Journal of Computer Vision
Volume: 130
Issue number: 12
Online published: 29 Sept 2022
Publication status: Published - Dec 2022

Abstract

Recently, multi-view crowd counting using deep neural networks has been proposed to enable counting in large, wide scenes covered by multiple cameras. Current methods project the camera-view features onto the average-height plane of the 3D world, and then fuse the projected multi-view features to predict a 2D scene-level density map on the ground (i.e., a bird's-eye view). Unlike previous research, we consider the variable height of people in the 3D world and propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of a 2D density map on the ground plane. Compared to 2D fusion, 3D fusion extracts more information about people along the z-dimension (height), which helps to address scale variations across multiple views. The 3D density maps still preserve the property of 2D density maps that the sum is the count, while also providing 3D information about the crowd density. Furthermore, instead of the standard method of copying features along the view ray in the 2D-to-3D projection, we propose an attention module based on a height estimation network, which forces each 2D pixel to be projected to one 3D voxel along the view ray. We also explore the projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance counting performance. The proposed method is tested on synthetic and real-world multi-view counting datasets and achieves better or comparable counting performance compared with the state of the art.
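The abstract contrasts two ways of lifting a 2D camera-view feature into a 3D voxel grid and notes that a 3D density map keeps the sum-equals-count property. The Python sketch below illustrates both ideas on a toy grid. It is not the authors' implementation: the ray representation (a list of voxels, one per height bin), the function names, and the use of a softmax over height bins as the "attention" are all assumptions made for illustration. A low softmax temperature approximates the one-voxel-per-pixel projection described in the abstract.

```python
import math

# Toy sketch (not the paper's code). A voxel is an (x, y, z) tuple; z indexes
# a discrete height bin. ASSUMPTION: a pixel's view ray is given as a list of
# voxels, one per height bin, ordered by z.


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; lower temperature gives a sharper distribution."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lift_copy(feature, ray):
    """Standard lifting: copy the pixel feature to every voxel on its view ray."""
    return {voxel: feature for voxel in ray}


def lift_height_attention(feature, ray, height_logits, temperature=1.0):
    """Hypothetical height-attention lifting: weight each voxel on the ray by a
    softmax over per-height-bin logits (standing in for a height estimation
    network). The weights sum to 1, so the lifted feature mass equals the
    pixel feature and concentrates at the estimated height."""
    weights = softmax(height_logits, temperature)
    return {voxel: w * feature for voxel, w in zip(ray, weights)}


def total_count(density_3d):
    """The crowd count is the sum of the 3D density map over all voxels."""
    return sum(density_3d.values())


def collapse_to_ground(density_3d):
    """Sum a 3D density map over z to obtain a 2D ground-plane density map."""
    ground = {}
    for (x, y, _z), value in density_3d.items():
        ground[(x, y)] = ground.get((x, y), 0.0) + value
    return ground


# Example: a ray through four height bins above ground cell (2, 5). A real
# oblique camera ray would cross different (x, y) cells at different heights.
ray = [(2, 5, z) for z in range(4)]
height_logits = [0.1, 0.3, 4.0, 0.2]  # made-up network output, peaked at z = 2

copied = lift_copy(1.0, ray)
attended = lift_height_attention(1.0, ray, height_logits, temperature=0.1)
print(sum(copied.values()))             # 4.0: copying replicates the feature 4 times
print(max(attended, key=attended.get))  # (2, 5, 2)

# A made-up 3D density map; values are dyadic so the sums are exact.
density = {(0, 0, 0): 0.25, (0, 0, 1): 0.5, (1, 0, 1): 0.25, (1, 1, 2): 1.0}
ground = collapse_to_ground(density)
print(total_count(density), sum(ground.values()))  # 2.0 2.0
```

Copying spreads a single pixel's feature across every height bin on its ray, whereas the attention weights sum to one, so at low temperature nearly all of the feature lands on one voxel. A hard argmax would place it on exactly one voxel but is not differentiable, which is one reason a soft weighting is a plausible stand-in for a trainable module. Collapsing the 3D map over z gives a 2D ground-plane map with the same total, matching the sum-equals-count property stated in the abstract.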

Research Area(s)

  • 2D-3D projection, 3D fusion, 3D projection, Crowd counting, geometric attention, height estimation