Adaptive momentum weight averaging reduces initialization noise

Jia Wan, Ziquan Liu, Junyu Gao*, Xia Wu, Antoni B. Chan

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

This paper investigates the training process of crowd counting networks, which is often disrupted by noise. First, training is sensitive to noisy initialization, making it difficult to evaluate the effectiveness of a novel model. Second, the learning curve fluctuates significantly because of inherent noise in the gradients and loss values, which increases the risk of overfitting to the validation set while degrading performance on the test set. To address these two issues, we propose Adaptive Momentum Weight Averaging (AMWA) to smooth the loss surface and stabilize the training process. The network is updated by weight averaging with an adaptive momentum that is dynamically determined by the validation error and the variations in the learning process. Our theoretical analysis shows that the proposed method decreases the variance of the network parameters during training and improves robustness to initialization. In experiments on crowd counting, we observe that AMWA generalizes better to the test set across a wide variety of architectures: STEERER, VGG, ResNet, Dilated Network (CSRNet), and WideResNet. We further evaluate the proposed method on other tasks, including image aesthetic assessment, blind image quality analysis, and image classification. © 2025 Elsevier Ltd.
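The abstract gives no formulas, so the following is only a rough sketch of the general idea of momentum weight averaging with a momentum chosen from validation error. The names `blend`, `adaptive_update`, `toy_val_error`, and the candidate momenta are hypothetical, and picking the candidate with the lowest validation error is an assumed stand-in rather than the rule used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]


def blend(avg: Weights, current: Weights, momentum: float) -> Weights:
    """Momentum weight averaging: momentum * avg + (1 - momentum) * current."""
    return [momentum * a + (1.0 - momentum) * c for a, c in zip(avg, current)]


def adaptive_update(
    avg: Weights,
    current: Weights,
    val_error: Callable[[Weights], float],
    candidates: Sequence[float] = (0.0, 0.5, 0.9),
) -> Tuple[Weights, float]:
    """Assumed selection rule (not taken from the paper): try each candidate
    momentum and keep the blend with the lowest validation error."""
    best_m = min(candidates, key=lambda m: val_error(blend(avg, current, m)))
    return blend(avg, current, best_m), best_m


def toy_val_error(w: Weights) -> float:
    """Toy validation error: squared distance from the all-ones vector."""
    return sum((x - 1.0) ** 2 for x in w)


avg = [0.0, 0.0]       # running averaged weights
current = [2.0, 2.0]   # weights after the latest training step
new_avg, m = adaptive_update(avg, current, toy_val_error)
print(new_avg, m)
```

In this toy setting the blend with momentum 0.5 lands exactly on the minimizer of the validation error, so it is preferred over both the raw weights (momentum 0.0) and the heavily smoothed ones (momentum 0.9).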
Original language: English
Article number: 112297
Number of pages: 11
Journal: Pattern Recognition
Volume: 171
Issue number: Part B
Online published: 14 Aug 2025
Publication status: Online published - 14 Aug 2025

Funding

This work was supported by the National Natural Science Foundation of China under Grants 62406090 and 62306241.

Research Keywords

  • Crowd counting
  • Image aesthetic assessment
  • Stable training
