
Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

Ziquan Liu*, Yufei Cui, Antoni B. Chan

*Corresponding author for this work

Research output: Conference Papers › RGC 32 - Refereed conference paper (without host publication) › peer-review

Abstract

Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. As a result of this homogeneity, the functions specified by such networks are invariant to shifting weight scales between layers. The ineffective regularizers are sensitive to such shifting and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is also an upper bound on the input gradient of the network, so minimizing it benefits adversarial robustness as well. We demonstrate the efficacy of the proposed regularizer at improving generalization and adversarial robustness across various datasets and neural network architectures.
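The scale-shift invariance described in the abstract can be checked numerically: for a ReLU network, scaling one layer's weights by c > 0 and dividing the next layer's by c leaves the function unchanged but changes the weight-decay penalty arbitrarily. The sketch below uses a product-of-norms penalty as a scale-shift-invariant surrogate in the spirit of the paper's regularizer; it is an illustrative assumption, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))
x = rng.standard_normal(8)

def relu(z):
    return np.maximum(z, 0.0)

def f(W1, W2, x):
    return W2 @ relu(W1 @ x)

# Shift weight scale between layers. ReLU is positively homogeneous,
# i.e. relu(c * z) = c * relu(z) for c > 0, so the function is unchanged.
c = 10.0
W1s, W2s = c * W1, W2 / c
assert np.allclose(f(W1, W2, x), f(W1s, W2s, x))

# Weight decay (sum of squared L2 norms) is NOT invariant to the shift:
wd = np.sum(W1**2) + np.sum(W2**2)
wd_shifted = np.sum(W1s**2) + np.sum(W2s**2)
print(wd, wd_shifted)  # different penalties for the same function

# A product-of-norms penalty (an illustrative scale-shift-invariant
# surrogate, not the paper's exact regularizer) is unchanged:
inv = np.linalg.norm(W1) * np.linalg.norm(W2)
inv_shifted = np.linalg.norm(W1s) * np.linalg.norm(W2s)
assert np.allclose(inv, inv_shifted)
```

This makes the abstract's point concrete: an optimizer can shrink the weight-decay penalty by shifting scale between layers without changing the learned function at all, so the penalty fails to track the network's intrinsic norm, while a scale-shift-invariant penalty cannot be gamed this way.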
Original language: English
Number of pages: 14
Publication status: Published - Jul 2021
Event: 2021 Workshop on Adversarial Machine Learning (ICML 2021)
Duration: 24 Jul 2021 - 24 Jul 2021
https://advml-workshop.github.io/icml2021/

Conference

Conference: 2021 Workshop on Adversarial Machine Learning (ICML 2021)
Period: 24/07/21 - 24/07/21
Internet address: https://advml-workshop.github.io/icml2021/

Research Keywords

  • Generalization
  • Adversarial Robustness
  • Regularization
