Abstract
Deep neural networks (DNNs) trained with the logistic loss have achieved impressive success in various binary classification tasks. However, generalization analyses accounting for this success remain scarce: existing analyses either yield only slow rates of convergence or hold only under restrictive assumptions. The main obstacle to deriving satisfactory generalization bounds is the unboundedness of the target function of the logistic loss in binary classification, because common techniques, such as those based on Bernstein's inequality, may not be applicable to unbounded cases.

This work aims to fill this gap by providing a novel theoretical framework for generalization analysis, built on a new oracle inequality that enables us to deal with the boundedness restriction on the target function. Using this new oracle inequality, this work establishes sharp generalization bounds for fully connected ReLU DNN classifiers f̂_n^FNN trained with the logistic loss, which lead to rates of convergence as n → ∞ that are optimal up to logarithmic factors, under a compositional assumption requiring the conditional class probability function η of the data distribution P to be the composition of several functions that are essentially defined on low-dimensional spaces. It is also shown that if η is further assumed to take values near some finite number of isolated points in [0,1], then f̂_n^FNN can achieve fast convergence rates of the form O((log n)^θ / n), again optimal up to logarithmic factors. Moreover, using the same new oracle inequality, this work establishes convergence rates, optimal up to logarithmic factors, for the misclassification error of fully connected ReLU DNN classifiers trained with the hinge loss, under the Tsybakov noise condition together with the compositional assumption on η mentioned above. All the derived rates are free of the input dimension, meaning that these results can explain why DNN classifiers can overcome the curse of dimensionality and perform very well in high-dimensional classification problems in practice.

All the claims of optimality of the convergence rates in this work are justified by corresponding minimax lower bounds, which are new to the literature and constitute a main result of this work. Another contribution of this work lies in the careful treatment of measurability: possibly non-measurable cases are handled by using outer measures and outer integrals, and a minimax lower bound theory for non-measurable maps is developed, which broadens the applicability of the classical minimax lower bound theory.
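To make the central obstacle concrete, the following is a minimal sketch in standard notation (the thesis's own symbols may differ): for the logistic loss, the population risk is minimized pointwise by the log-odds of η, which blows up wherever η approaches 0 or 1.

```latex
% A minimal sketch, in standard (not necessarily the thesis's) notation,
% of why the target function of the logistic loss is unbounded.
% Logistic loss and population phi-risk over measurable f, with labels Y in {-1, +1}:
\[
  \phi(t) = \log\bigl(1 + e^{-t}\bigr), \qquad
  \mathcal{R}_\phi(f) = \mathbb{E}_{(X,Y)\sim P}\bigl[\phi\bigl(Y f(X)\bigr)\bigr].
\]
% Minimizing E[phi(Y f(X)) | X = x] = eta(x) phi(f(x)) + (1 - eta(x)) phi(-f(x))
% pointwise over the value f(x) yields the target function
\[
  f^{*}_{\phi}(x) = \log\frac{\eta(x)}{1 - \eta(x)},
  \qquad \eta(x) = P(Y = 1 \mid X = x),
\]
% which satisfies |f^{*}_{\phi}(x)| -> infinity as eta(x) -> 0 or 1.
% Hence, unless eta is bounded away from 0 and 1, the target is unbounded, and
% Bernstein-type concentration arguments, which require boundedness, do not apply directly.
```

This is precisely where the new oracle inequality of the thesis enters: it controls the estimation error without requiring the target function to be bounded.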
The significance of this work resides primarily in two aspects. First, the derived convergence rates of DNN classifiers deepen our theoretical understanding of deep learning and may provide insights for the practical application of DNNs. Second, the theoretical framework through which these convergence rates are derived is of independent interest: it is very general and can be applied in other settings to obtain tight generalization bounds and minimax lower bounds. Building on this novel and powerful framework, several future studies can be conducted, e.g., deriving tight generalization bounds for binary or multi-class classification with other loss functions, over other hypothesis spaces, or on other (finite-dimensional or infinite-dimensional) input spaces.
| Date of Award | 10 Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Xiang ZHOU (Supervisor), Dingxuan ZHOU (Supervisor) & Lei Shi (External Supervisor) |