Wide-area Crowd Counting on Camera Networks using Multi-view Fusion

Project: Research



Automatic crowd counting in images has important real-world applications in security, crowd management, urban planning, and retail. For example, in retail, shopper counts could be used to plan staffing schedules that maximize sales and minimize staffing cost, while counts on an MTR subway platform can help managers improve customer experience and safety. One of the most effective approaches to crowd counting is based on crowd-density maps, where summing over a region in the density map yields the count for that region. Current methods focus on counting from a single camera view of a scene using deep CNNs, and have shown good generalization across scenes. However, because of the limited field-of-view of a single camera, single-view counting cannot be applied to wide-area or irregularly-shaped venues (e.g., public parks, event spaces, subway platforms), which are increasingly important sites for security control and crowd management in Hong Kong.

In this project, we research crowd counting using multiple camera views (i.e., a camera network) to improve its accuracy and applicability to wide-area venues. We formulate our research problem within the context of the inherent challenges of collecting and processing image streams from real-world camera networks. First, we assume that bandwidth and computational resources allow only images (not video) to be transmitted from the camera network and analyzed. Second, we assume that the cameras are loosely synchronized in time, i.e., images are captured roughly at the same time within some temporal threshold (e.g., 3 seconds), which accommodates situations where precise time synchronization is not possible (e.g., panning cameras). Third, to be deployable in the real world, the model should generalize across scenes without re-training.

Under this multi-view, cross-scene setting, we propose two deep learning architectures that fuse multiple camera views to predict a single density map for the whole scene.
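The density-map counting idea above can be illustrated with a minimal sketch; the map values here are toy numbers standing in for the output of a density-estimation CNN:

```python
import numpy as np

# Hypothetical density map for a small 6x8-pixel region of a scene
# (in practice, the output of a crowd-density CNN).
density_map = np.zeros((6, 8))
density_map[1, 2] = 1.0       # one person concentrated at a single pixel
density_map[3, 4:6] = 0.5     # one person spread across two pixels

# Summing over a region of the density map yields the crowd count
# for that region; summing the whole map gives the total count.
region_count = density_map[0:4, :].sum()
total_count = density_map.sum()

print(total_count)  # 2.0
```

Because counts are additive over regions, the same map supports both whole-scene totals and per-region queries without re-running the model.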
We extend our models to handle loosely synchronized images, as well as fine-grained counting, where the crowd count is further divided by attributes such as facing direction and movement type. To train and test our models, we will collect a large-scale multi-view crowd-counting dataset. The proposed research enables large-scale, wide-area crowd counting in public areas (e.g., subway platforms, public squares, streets, public events), which is of significant interest to Hong Kong given its large and dense population. The project will also advance research on multi-camera fusion methods with deep learning, while the collected dataset will serve as a benchmark for future work.
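As a rough illustration of the multi-view fusion idea (not the project's learned architectures), each view's density map could be warped onto a common ground-plane grid via a per-camera homography and the overlapping views averaged, so a person seen by two cameras is counted once. The homographies `H1`, `H2` and all map values below are hypothetical:

```python
import numpy as np

def warp_to_ground_plane(view_density, H, out_shape):
    """Nearest-neighbour warp of a camera-view density map onto a
    common ground-plane grid using a 3x3 homography H (assumed to
    come from camera calibration)."""
    out = np.zeros(out_shape)
    h, w = view_density.shape
    for y in range(h):
        for x in range(w):
            p = H @ np.array([x, y, 1.0])
            gx, gy = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= gy < out_shape[0] and 0 <= gx < out_shape[1]:
                out[gy, gx] += view_density[y, x]
    return out

# Two toy views observing the same 10x10 ground plane.
H1 = np.eye(3)                    # view 1 is already aligned
H2 = np.array([[1., 0., 2.],      # view 2 is shifted 2 cells right
               [0., 1., 0.],
               [0., 0., 1.]])
v1 = np.zeros((10, 10)); v1[4, 4] = 1.0  # same person, seen by both
v2 = np.zeros((10, 10)); v2[4, 2] = 1.0  # views at different positions

g1 = warp_to_ground_plane(v1, H1, (10, 10))
g2 = warp_to_ground_plane(v2, H2, (10, 10))

# Naive fusion: average the warped maps so the shared person is
# counted once rather than twice.
fused = (g1 + g2) / 2.0
print(fused.sum())  # 1.0
```

Averaging is only sensible here because both cameras cover the whole grid; handling partial overlap, occlusion, and loose synchronization is precisely what the proposed deep fusion architectures must learn end-to-end.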


Project number: 9042659
Grant type: GRF
Effective start/end date: 1/09/18 – 27/02/23