Crowd Counting via Perspective-Guided Fractional-Dilation Convolution

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

8 Scopus Citations

Original language: English
Pages (from-to): 2633-2647
Journal / Publication: IEEE Transactions on Multimedia
Online published: 30 Jun 2021
Publication status: Published - 2022


Crowd counting is critical for numerous video surveillance scenarios. One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect. To address this issue, this paper proposes a novel convolutional neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet). By modeling continuous scale variations, the proposed PFDNet is able to select the proper fractional-dilation kernels for different spatial locations. This makes it significantly more flexible than most state-of-the-art methods, which consider only a discrete set of representative scales. In addition, by avoiding the multi-scale or multi-column architectures used in other methods, it is computationally more efficient. In practice, the proposed PFDNet is constructed by stacking multiple Perspective-guided Fractional-Dilation Convolutions (PFC) on a VGG16-BN backbone. By introducing a novel generalized dilation convolution operation, the PFC can handle fractional dilation ratios in the spatial domain under the guidance of perspective annotations, achieving continuous scale modeling of pedestrians. To deal with cases in which perspective information is unavailable, we further introduce an effective perspective estimation branch to the proposed PFDNet, which can be trained in either a supervised or a weakly-supervised setting once the branch has been pre-trained. Extensive experiments show that the proposed PFDNet outperforms state-of-the-art methods on the ShanghaiTech A, ShanghaiTech B, WorldExpo10, UCF-QNRF, UCF_CC_50 and TRANCOS datasets, achieving MAEs of 53.8, 6.5, 6.8, 84.3, 205.8, and 3.06, respectively. The pre-trained models and source code are available online at
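The idea of a dilation rate that is fractional and varies by spatial location can be illustrated with a small NumPy sketch. This is not the authors' PFC implementation: it assumes, purely for illustration, that a fractional rate is realized by bilinearly sampling the input at non-integer tap positions, and that a per-pixel rate map (here the hypothetical `rates` array, growing from the top of the image to the bottom) stands in for the perspective guidance. The function names `bilinear_sample` and `fractional_dilation_conv` are likewise made up for this example.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample a 2-D array at a fractional (y, x); out-of-bounds reads as 0."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    total = 0.0
    # Blend the four integer neighbours around (y, x).
    for dy, dx, wgt in ((0, 0, (1 - wy) * (1 - wx)), (0, 1, (1 - wy) * wx),
                        (1, 0, wy * (1 - wx)), (1, 1, wy * wx)):
        yy, xx = y0 + dy, x0 + dx
        if wgt > 0 and 0 <= yy < h and 0 <= xx < w:
            total += wgt * img[yy, xx]
    return total

def fractional_dilation_conv(img, kernel, rates):
    """3x3 convolution whose dilation rate is per-pixel and may be fractional.

    Assumption: taps are placed at offsets (ki * r, kj * r) from the centre
    and read with bilinear interpolation.
    """
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            r = rates[i, j]
            acc = 0.0
            for ki in (-1, 0, 1):
                for kj in (-1, 0, 1):
                    acc += kernel[ki + 1, kj + 1] * bilinear_sample(
                        img, i + ki * r, j + kj * r)
            out[i, j] = acc
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.full((3, 3), 1.0 / 9.0)  # simple averaging kernel
# Hypothetical perspective-derived rates: 1.0 at the top row, 2.0 at the bottom.
rates = np.linspace(1.0, 2.0, 6)[:, None] * np.ones((1, 6))
out = fractional_dilation_conv(img, kernel, rates)
```

With an all-ones rate map the function reduces to an ordinary 3x3 convolution, so the fractional case can be read as a continuous extension of the usual integer-dilation family rather than a choice among a few fixed scales.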

Research Area(s)

  • Annotations, Computer architecture, Computer science, Convolution, Estimation, Feature extraction, Kernel

Citation Format(s)

Crowd Counting via Perspective-Guided Fractional-Dilation Convolution. / Yan, Zhaoyi; Zhang, Ruimao; Zhang, Hongzhi et al.

In: IEEE Transactions on Multimedia, Vol. 24, 2022, p. 2633-2647.
