TY - JOUR
T1 - Improved analysis of supervised learning in the RKHS with random features
T2 - Beyond least squares
AU - Liu, Jiamin
AU - Wang, Lei
AU - Lian, Heng
PY - 2025/4
Y1 - 2025/4
N2 - We consider kernel-based supervised learning using random Fourier features, focusing on its statistical error bounds and generalization properties with general loss functions. Beyond the least squares loss, existing results only provide a worst-case analysis with rate n^(-1/2) and a number of features at least comparable to n, and a refined analysis that achieves an almost n^(-1) rate when the kernel's eigenvalues decay exponentially and the number of features is again at least comparable to n. For the least squares loss, the results are much richer, and optimal rates can be achieved under the source and capacity assumptions with a number of features smaller than n. In this paper, for both losses with Lipschitz derivatives and Lipschitz losses, we establish faster rates with a number of features much smaller than n, matching the rates and the number of features known for the least squares loss. More specifically, in the attainable case (the true function lies in the RKHS), we obtain the rate n^(-2ξ/(2ξ+γ)) using o(n) features, which is the same rate as the standard method without approximation, where ξ characterizes the smoothness of the true function and γ characterizes the decay rate of the eigenvalues of the integral operator. Thus our results answer an important open question regarding random features. © 2025 Elsevier Ltd
KW - Logistic regression
KW - Quantile regression
KW - Regression and classification
KW - Reproducing kernel Hilbert space
KW - Source and capacity conditions
UR - http://www.scopus.com/inward/record.url?scp=85214677585&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85214677585&origin=recordpage
U2 - 10.1016/j.neunet.2024.107091
DO - 10.1016/j.neunet.2024.107091
M3 - RGC 21 - Publication in refereed journal
SN - 0893-6080
VL - 184
JO - Neural Networks
JF - Neural Networks
M1 - 107091
ER -