A Limitation of Gradient Descent Learning

John Sum*, Chi-Sing Leung, Kevin Ho

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-reviewed


Abstract

Over the decades, gradient descent has been applied to develop learning algorithms to train a neural network (NN). In this brief, a limitation of applying such algorithms to train an NN with persistent weight noise is revealed. Let V(w) be the performance measure of an ideal NN; V(w) is applied to develop the gradient descent learning (GDL). With weight noise, the desired performance measure, denoted J(w), is E[V(w̃) | w], where w̃ is the noisy weight vector. When GDL is applied to train an NN with weight noise, the actual learning objective is clearly not J(w) but another scalar function L(w). For decades, there has been a misconception that L(w) = J(w), and hence that the actual model attained by the GDL is the desired model. However, we show that it might not be: 1) with persistent additive weight noise, the actual model attained is the desired model, as L(w) = J(w); and 2) with persistent multiplicative weight noise, the actual model attained is unlikely to be the desired model, as L(w) ≠ J(w). Accordingly, the properties of the models attained, as compared with the desired models, are analyzed and the learning curves are sketched. Simulation results on 1) a simple regression problem and 2) the MNIST handwritten digit recognition are presented to support our claims.
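
To make the distinction between L(w) and J(w) concrete, the following toy sketch (Python/NumPy; not the paper's actual experiments) runs GDL on a one-dimensional quadratic performance measure V(w) = (w - a)^2 with persistent weight noise injected at every update. The names grad_V and run_gdl and the settings a, sigma, mu, T are illustrative assumptions only. Under additive noise the iterate settles (in mean) at the minimizer of the desired objective J(w) = E[V(w̃) | w]; under multiplicative noise it settles at the minimizer of V(w) instead, which differs from the minimizer of J(w).

import numpy as np

rng = np.random.default_rng(0)
a, sigma, mu, T = 1.0, 0.5, 0.01, 200_000   # illustrative settings only

def grad_V(w):
    # Gradient of the ideal performance measure V(w) = (w - a)^2.
    return 2.0 * (w - a)

def run_gdl(noise):
    # GDL with persistent weight noise: the gradient is evaluated at the
    # noisy weight w~, as the noisy forward/backward pass would see it.
    w, tail = 0.0, []
    for t in range(T):
        b = sigma * rng.standard_normal()
        w_noisy = w + b if noise == "additive" else w * (1.0 + b)
        w -= mu * grad_V(w_noisy)            # actual GDL update
        if t >= T // 2:
            tail.append(w)                   # average out the noise floor
    return float(np.mean(tail))

# Minimizers of the desired objective J(w) = E[V(w~) | w]:
#   additive:       J(w) = (w - a)^2 + sigma^2          -> argmin = a
#   multiplicative: J(w) = (w - a)^2 + sigma^2 * w^2    -> argmin = a / (1 + sigma^2)
print("additive:       GDL ->", run_gdl("additive"), "  desired ->", a)
print("multiplicative: GDL ->", run_gdl("multiplicative"), "  desired ->", a / (1 + sigma**2))

With these settings the additive run should end near w = 1.0, matching the desired minimizer, while the multiplicative run should also end near w = 1.0 even though the desired minimizer of J is a/(1 + sigma^2) = 0.8; that gap is the point of the example and mirrors the L(w) ≠ J(w) claim above. Averaging the second half of the trajectory is only there to smooth the noise floor.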
Original language: English
Article number: 8789696
Pages (from-to): 2227-2232
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 31
Issue number: 6
Online published: 6 Aug 2019
DOIs
Publication status: Published - Jun 2020

Research Keywords

  • Additive weight noise
  • gradient descent algorithms
  • MNIST
  • multiplicative weight noise
