Understanding Crowded Scenes: Line Counting and Small Instance Detection
Chinese title (translated): Crowded Scene Understanding: Pedestrian Line-Crossing Counting and Small Object Detection
Student thesis: Doctoral Thesis
Award date: 21 Apr 2016
Permanent Link: https://scholars.cityu.edu.hk/en/theses/theses(5f9caa71-2be1-4b90-9c74-cee504ff60d0).html
Abstract
Crowd counting and detection are key technologies for understanding crowded scenes. Crowd counting estimates the number of people inside a region-of-interest (ROI) or crossing a line-of-interest (LOI) in a video, while detection finds the location and bounding box of each person. These technologies have many potential applications, including security, surveillance, resource management and advertising. However, many factors in crowded scenes, such as low image resolution and quality, background clutter, illumination changes, viewpoint variation, varied object poses, and occlusion between objects, mean that the problem is far from solved.
For counting people in an ROI, we present a local histogram-of-oriented-gradients (LHOG) feature for people counting in crowded scenes that does not require perspective normalization. First, the mixture-of-dynamic-textures motion segmentation algorithm is applied to segment the crowd into groups moving in two directions. Second, LHOG features are extracted from image patches densely sampled over each crowd segment. Third, a global descriptor of the crowd is obtained by applying a bag-of-words model to the set of extracted LHOG features. Finally, the relationship between this bag-of-words descriptor and the number of people per frame is learned with Bayesian Poisson regression. Our framework is validated on the challenging UCSD pedestrian database, which contains over 49,800 pedestrian instances.
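The descriptor and regression stages can be illustrated with a minimal Python sketch, assuming the LHOG descriptors of each frame and a codebook (for example, one learned with k-means) have already been computed. The function names `bow_histogram`, `fit_poisson_map` and `predict_count` are hypothetical. As a simplification, the sketch fits only the MAP (posterior-mode) weights of a Poisson regression with a Gaussian prior instead of the full Bayesian posterior, which would additionally provide a predictive variance for each count.

```python
import numpy as np


def bow_histogram(features, codebook):
    """Bag-of-words descriptor for one frame: assign each local descriptor
    (one row of `features`) to its nearest codeword and count the hits."""
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    # Unnormalized counts, so the descriptor grows with the crowd segment size.
    return np.bincount(words, minlength=len(codebook)).astype(float)


def fit_poisson_map(X, y, alpha=1.0, iters=100, tol=1e-10):
    """MAP weights for Poisson regression, log E[y] = [X, 1] @ w, with a
    zero-mean Gaussian prior of precision `alpha` on the non-bias weights.
    Uses Newton's method with backtracking on the convex negative log posterior."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    prec = np.full(Xb.shape[1], float(alpha))
    prec[-1] = 0.0  # leave the bias term unregularized

    def neg_log_post(w):
        eta = Xb @ w
        return np.exp(eta).sum() - y @ eta + 0.5 * (prec * w * w).sum()

    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        mu = np.exp(Xb @ w)  # current predicted means
        grad = Xb.T @ (mu - y) + prec * w
        hess = Xb.T @ (Xb * mu[:, None]) + np.diag(prec)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, neg_log_post(w)
        while neg_log_post(w - t * step) > f0 and t > 1e-8:
            t *= 0.5  # halve the step until the objective does not increase
        w = w - t * step
        if np.abs(t * step).max() < tol:
            break
    return w


def predict_count(X, w):
    """Predicted mean count per row under the MAP weights: exp([X, 1] @ w)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.exp(Xb @ w)
```

In use, one `bow_histogram` row per training frame is stacked into `X` with the annotated counts as `y`; setting `alpha=0` reduces the fit to ordinary maximum-likelihood Poisson regression. Leaving the histogram unnormalized is a choice of this sketch, made so that larger crowd segments produce larger descriptors.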
Next, we propose an integer programming method for estimating the instantaneous count of pedestrians crossing an LOI in a video sequence. First, the video is converted into a temporal slice image by a line sampling process. Second, the number of people is estimated in each of a set of overlapping sliding windows on the temporal slice image, using a regression function that maps local features to a count. Because the count in a sliding window is the sum of the instantaneous counts over the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the LOI in each frame. Summing these instantaneous counts over a time interval yields the cumulative count of pedestrians crossing the line. Compared with current line counting methods, our approach achieves state-of-the-art performance on several challenging crowd video datasets.
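The recovery of per-frame counts from window counts can be posed as a small mixed-integer program. The following is a hedged illustration rather than the thesis formulation: it assumes window counts `c` (for example, regression outputs) and a 0/1 matrix `A` whose row w marks the frames covered by window w, and finds non-negative integer per-frame counts x minimizing the L1 error sum_w |A_w x - c_w|. It relies on `scipy.optimize.milp` (SciPy 1.9 or later); the helper names `window_matrix` and `recover_instantaneous_counts` are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def window_matrix(n_frames, width, stride=1):
    """0/1 matrix whose row w covers the frames of sliding window w, so that
    the window counts equal A @ x for per-frame counts x."""
    starts = list(range(0, n_frames - width + 1, stride))
    A = np.zeros((len(starts), n_frames))
    for row, s in enumerate(starts):
        A[row, s:s + width] = 1.0
    return A


def recover_instantaneous_counts(window_counts, A):
    """Non-negative integer per-frame counts x minimizing sum_w |A[w] @ x - c[w]|.

    Decision variables are z = [x, e]: x (one per frame) is integer, e (one
    slack per window) is continuous and bounds the absolute error.
    """
    c = np.asarray(window_counts, dtype=float)
    n_win, n_frames = A.shape
    eye = np.eye(n_win)
    objective = np.concatenate([np.zeros(n_frames), np.ones(n_win)])  # sum of slacks
    constraints = [
        LinearConstraint(np.hstack([A, -eye]), -np.inf, c),  # A x - e <= c
        LinearConstraint(np.hstack([A, eye]), c, np.inf),    # A x + e >= c
    ]
    integrality = np.concatenate([np.ones(n_frames), np.zeros(n_win)])
    res = milp(objective, constraints=constraints,
               integrality=integrality, bounds=Bounds(0, np.inf))
    if not res.success:
        raise RuntimeError(res.message)
    return np.round(res.x[:n_frames]).astype(int)
```

When windows overlap, several integer sequences can fit the window counts equally well; this sketch simply returns whichever optimum the solver finds. A cumulative line count over an interval then follows from `np.cumsum` of the recovered per-frame counts.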
Finally, we address the problem of localizing partially occluded small instances in crowded scenes and estimating their bounding boxes, by leveraging the object density maps typically used for object counting. Small instances, such as pedestrians in low-resolution surveillance video, cells under a microscope, flocks of small animals, tiny insects and vehicles in satellite images, are very challenging for traditional detection algorithms because of overlapping clutter and their small size in the image. Our approach avoids learning an individual-centric detector. Instead, we use measurements from the object density map of an image to recover the positions of the objects. In particular, we pass a sliding window over the density map and calculate the instance count within each window. 2D integer programming is used to recover the locations of object instances from the set of sliding-window counts, and the group-level count estimated from the density map is used as a constraint to regularize the detections. Finally, the bounding box of each instance is estimated from the local density distribution. Compared with current small-instance detection methods, our approach achieves state-of-the-art performance on several challenging datasets, including fluorescence microscopy cell images, pedestrians, small animals, insects and satellite images.
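A minimal sketch of the 2D localization step, assuming a density map has already been estimated, is shown below. It is one possible reading of the description above, not the thesis implementation: integer counts are placed on pixels so that their win x win sliding-window sums match those of the density map in the L1 sense, and the rounded total of the density map is imposed as a hard equality constraint standing in for the group-level count. It uses `scipy.optimize.milp` (SciPy 1.9 or later); the function name `locate_instances` and its parameters are hypothetical, and bounding-box estimation is not sketched.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def locate_instances(density, win=3, stride=1):
    """Place integer instance counts on pixels so that their win x win
    sliding-window sums match those of `density` (L1 error), subject to the
    total instance count equalling the rounded sum of the density map."""
    H, W = density.shape
    n_pix = H * W
    rows, targets = [], []
    for r in range(0, H - win + 1, stride):
        for c in range(0, W - win + 1, stride):
            mask = np.zeros((H, W))
            mask[r:r + win, c:c + win] = 1.0
            rows.append(mask.ravel())
            targets.append(density[r:r + win, c:c + win].sum())  # window count
    A, s = np.array(rows), np.array(targets)
    n_win = len(s)
    eye = np.eye(n_win)
    total = round(float(density.sum()))  # group-level count from the density map
    objective = np.concatenate([np.zeros(n_pix), np.ones(n_win)])  # sum of slacks
    count_row = np.concatenate([np.ones(n_pix), np.zeros(n_win)])[None, :]
    constraints = [
        LinearConstraint(np.hstack([A, -eye]), -np.inf, s),  # A x - e <= s
        LinearConstraint(np.hstack([A, eye]), s, np.inf),    # A x + e >= s
        LinearConstraint(count_row, total, total),           # sum(x) == total
    ]
    integrality = np.concatenate([np.ones(n_pix), np.zeros(n_win)])
    res = milp(objective, constraints=constraints,
               integrality=integrality, bounds=Bounds(0, np.inf))
    if not res.success:
        raise RuntimeError(res.message)
    counts = np.round(res.x[:n_pix]).astype(int).reshape(H, W)
    # One (row, col) entry per recovered instance; a pixel with count 2 appears twice.
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(counts))
            for _ in range(counts[r, c])]
```

Because the program has one integer variable per pixel, its size grows with the image; solving it on a coarser grid or per image tile would be a natural adjustment, though that is an assumption of this sketch rather than something stated above.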