Functional Neural Networks
函數型神經網絡
Student thesis: Doctoral Thesis
Award date: 28 Dec 2023
Permanent Link: https://scholars.cityu.edu.hk/en/theses/theses(044efc96553147f882e185e5a869c486).html
Abstract
Learning functionals whose domain is an infinite-dimensional function space is a significant task arising in a broad range of applications. In the prediction problem of functional data analysis, the task is to find a relation between a functional covariate and a scalar response, and this relation can be viewed as a nonlinear functional. In solving partial differential equations (PDEs), the map from the initial or boundary condition to the value of the solution at some fixed point is intrinsically a nonlinear functional. In system identification, one must model the hidden states of a nonlinear system from observations of functional input signals and scalar output signals; the model is thus an unknown nonlinear functional. Beyond these, further instances of functional learning arise in phase retrieval, reduced-order modeling, image processing, inverse problems, and so on.
As a powerful tool for nonparametric estimation, deep learning based on deep neural networks has achieved remarkable success in diverse fields including science, business, and industry. It is well-known that deep neural networks excel at learning nonlinear mappings between finite-dimensional spaces. Given this, it is natural to ask whether neural networks can also be employed to learn nonlinear functionals defined on an infinite-dimensional space. In this thesis, we aim to answer this question through theoretical analysis within the framework of learning theory. The details are as follows:
(1) We propose a network structure called the "functional pre-discretizing network" to approximate general continuous functionals characterized by their moduli of continuity. Using this structure, we give an upper bound on the approximation error in terms of the total number of weights in the neural network; an upper bound in terms of the depth and width of the network is also presented. Moreover, we establish nearly optimal rates when the approximation error is measured on the unit ball of a Hölder space, and nearly polynomial rates (i.e., rates of the form exp(a(log M)^b) with a > 0 and 0 < b < 1) when it is measured on a space of analytic functions.
(2) We design a novel network structure named the "functional re-discretizing network" to approximate smooth functionals possessing high-order Fréchet derivatives. The motivation for designing this new structure is to exploit the smoothness property. Quantitative rates of approximation in terms of the depth, width, and total number of weights of the networks are derived for this structure. Furthermore, we establish improved rates for approximating smooth functionals on certain specific compact sets, compared with those for general continuous functionals. These approximation results show that both the smoothness of the target functional and that of the functions in its domain play important roles in improving approximation ability.
(3) We carry out a generalization analysis for functional learning with the two network structures above. Specifically, we propose a learning algorithm based on empirical risk minimization and prove its universal consistency. We also establish generalization error bounds that exhibit a trade-off between the approximation ability and the capacity, measured by covering numbers, of the approximants. With these error bounds, we further derive learning rates under diverse assumptions on the target functional and the input function space, which provide guidance for implementing functional pre-discretizing networks and functional re-discretizing networks in different situations.
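For readers unfamiliar with the quantities mentioned in (1), the two standard notions can be stated as follows (the notation here is our own choice, not necessarily the thesis's). For a functional F on a compact set K of a normed function space, the modulus of continuity is

\[
\omega(F, r) \;=\; \sup\bigl\{\, |F(f) - F(g)| \;:\; f, g \in K,\ \|f - g\| \le r \,\bigr\},
\]

and the approximation error of a class of networks \(\mathcal{H}\) is measured uniformly over K:

\[
\operatorname{dist}(F, \mathcal{H}) \;=\; \inf_{H \in \mathcal{H}} \, \sup_{f \in K} \, |F(f) - H(f)|.
\]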
Our work has four major novelties. First, we propose two functional neural network structures that adopt pre-specified discretization maps which need not be learned from data. This makes them easier to implement and cheaper to compute than the structures with learnable weight functions used in the literature. Second, we study neural networks with the rectified linear unit (ReLU) activation function, the most commonly used activation in practice owing to its ease of computation and resistance to vanishing gradients. Third, we are the first to derive improved approximation rates for functionals possessing smoothness, rather than merely a good modulus of continuity. Finally, many generalization analyses in the literature require the network weights, or the networks themselves, to be bounded, which is not in line with practical applications; in this thesis, we impose no restrictions on the magnitudes of the parameters or of the networks.
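The pre-discretizing idea and the empirical-risk-minimization step can be illustrated with a minimal numerical sketch. Everything here — the grid size, the random hidden layer, the toy target functional, and the least-squares training of the output layer — is an illustrative assumption for this example, not the thesis's actual construction:

```python
import numpy as np

# Toy "functional pre-discretizing network" trained by empirical risk
# minimization.  A function covariate is first mapped through a FIXED
# discretization (never learned from data), then fed to a ReLU network.

rng = np.random.default_rng(0)
m, width = 32, 64                        # discretization size, hidden width
grid = np.linspace(0.0, 1.0, m)          # pre-specified grid on [0, 1]

def discretize(f):
    """Pre-specified discretization map: f -> (f(t_1), ..., f(t_m))."""
    return f(grid)

# Random first layer of a ReLU network acting on the discretized vector.
W1 = rng.normal(scale=1.0 / np.sqrt(m), size=(width, m))
b1 = rng.normal(scale=0.1, size=width)

def features(v):
    return np.maximum(W1 @ v + b1, 0.0)  # ReLU hidden layer

# Example target functional: F(f) ~ integral of f(t)^2 over [0, 1],
# approximated by a grid average.
def F(f):
    return float(np.mean(f(grid) ** 2))

# Training sample: covariate functions f_a(t) = sin(a t), a ~ Uniform(0, pi).
n = 200
a_train = rng.uniform(0.0, np.pi, size=n)
H = np.stack([features(discretize(lambda t, a=a: np.sin(a * t))) for a in a_train])
y = np.array([F(lambda t, a=a: np.sin(a * t)) for a in a_train])

# Empirical risk minimization over the output layer only (least squares);
# the thesis's algorithm optimizes over much richer network classes.
design = np.column_stack([H, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(f):
    h = features(discretize(f))
    return float(h @ coef[:-1] + coef[-1])
```

On a held-out covariate such as f(t) = sin(1.3 t), `predict(f)` should closely match F(f), illustrating how a fixed discretization reduces functional learning to a finite-dimensional ReLU regression problem.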
Keywords: Nonlinear functionals, deep learning theory, ReLU, functional neural networks, generalization analysis