Kernel-based Variable Selection

Student thesis: Doctoral Thesis

Abstract

With the development of modern technology, it has become much easier to collect data of unprecedented size and complexity at relatively low cost. High-dimensional data analysis has attracted tremendous interest from both researchers and practitioners because of its appearance in many real applications, including microarrays, proteomics, imaging data analysis, functional data, and high-frequency financial data. In such cases, the number of collected variables may grow even at an exponential rate of the sample size. Many classical statistical methods lose their power when dealing with high-dimensional data. For example, the traditional subset selection method is computationally infeasible for this kind of dataset, and the "curse of dimensionality" arises, which can seriously degrade the performance of distance-based methods such as k-nearest neighbors. Many attempts have been made to overcome these difficulties by assuming sparse modeling, in which it is generally believed that only a small number of the collected variables are truly informative and useful for analysis. An efficient variable selection procedure that correctly identifies the truly informative variables therefore plays a crucial role in the subsequent statistical analysis. In this thesis, we first give a detailed review of some existing variable selection frameworks in the literature, including the regularization framework with sparsity-inducing penalty terms, the sure screening framework, the measurement error framework, and the knockoff filter framework. We also discuss the potential drawbacks and difficulties of each of these methods.
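To make the sparse modeling assumption concrete, the following minimal Python sketch (not taken from the thesis; the dimensions, the linear model, and the lasso penalty are illustrative assumptions) shows how a sparsity-inducing penalty can recover a handful of informative variables when the dimension far exceeds the sample size:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5           # hypothetical: n samples, p >> n variables, s informative
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                  # only the first s variables are truly informative
y = X @ beta + 0.5 * rng.standard_normal(n)

# The L1 penalty drives most coefficients exactly to zero,
# so the nonzero coefficients form the selected variable set.
fit = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print(selected)                 # ideally close to {0, ..., s-1}
```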

After the literature review, we present our contributions in this research area. A model-free gradient learning method is proposed to exploit the conditional distribution by fitting composite quantile regressions. The proposed method is able to identify all the informative variables acting on the conditional distribution without any prior model specification. It is motivated by the observation that a non-informative variable can be identified via the conditional independence between the response and the variable given all the other variables. More importantly, this conditional dependence can be characterized by estimating the profiles of the conditional distribution via the composite quantile function. This observation leads to a framework of learning sparse gradient functions in a reproducing kernel Hilbert space (RKHS) and opens the door to relaxing restrictive model assumptions. The proposed method is implemented via an efficient computing algorithm, which couples the majorization-minimization (MM) algorithm with the proximal gradient descent algorithm. Its asymptotic estimation and variable selection consistencies are established without any explicit model assumption, assuring that the truly informative variables are correctly identified with probability tending to 1. The effectiveness of the proposed method is also supported by a variety of simulated and real-life examples.
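As a rough illustration of two computational building blocks of such a procedure, the composite quantile check loss and the proximal step for a group-sparsity penalty, consider the following Python sketch. The function names and the simplified finite-dimensional setup are assumptions for exposition, not the thesis's exact RKHS formulation:

```python
import numpy as np

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0))

def composite_quantile_loss(residuals, taus=(0.25, 0.5, 0.75)):
    """Composite quantile loss: average the check loss over several quantile
    levels, so the fit profiles the conditional distribution rather than a
    single quantile."""
    return np.mean([check_loss(residuals, t).mean() for t in taus])

def group_soft_threshold(G, lam):
    """Proximal operator of the group penalty lam * sum_j ||G[:, j]||_2.
    Each column collects one variable's gradient coefficients; columns
    shrunk exactly to zero correspond to non-informative variables."""
    norms = np.linalg.norm(G, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return G * scale
```

In a proximal gradient scheme, a descent step on the (majorized) composite quantile loss would alternate with `group_soft_threshold`, which is what produces sparsity across the gradient functions.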

It is worth pointing out that the proposed gradient learning method has several potential drawbacks: it is computationally demanding, since a local pairwise learning framework is applied to learn multiple quantile functions simultaneously, and it does not allow the dimension to diverge, because of the complexity of nonparametric learning. Ideally, a good variable selection method should be efficient, flexible, and scalable, with theoretical guarantees; however, most existing methods in the literature cannot achieve all of these properties simultaneously. Our second contribution is motivated by the derivative reproducing property, which states that the gradient of a function in a proper RKHS can be bounded by the function's K-norm up to some constant. In other words, to estimate the gradient function within an RKHS, it suffices to estimate the regression function itself, without any loss of information. This observation is crucial, as it allows us to avoid the computationally demanding step of estimating the gradient functions directly; instead, only a good estimator of the regression function is required. Therefore, a three-step variable selection method is proposed, involving the estimation of the regression function in an RKHS, the computation of the corresponding gradient functions, and a hard-thresholding step. A key advantage of this scalable kernel-based method is that it requires no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains the desired asymptotic sparsistency. Special cases and extensions of the scalable kernel-based method are also provided, such as interaction selection and the special case of the linear model. Theoretical results, including the estimation and selection consistencies of all the proposed methods, are provided in this thesis, together with their numerical performance.
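A minimal Python sketch of the three-step idea under a Gaussian kernel is given below; the fixed bandwidth, ridge parameter, and threshold are hypothetical choices, and the thesis's actual estimator and tuning are more refined. The point is that once the kernel ridge fit f(x) = sum_i alpha_i K(x, x_i) is obtained, its gradients are available in closed form, so no separate gradient estimation is needed:

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian kernel matrix K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def select_variables(X, y, sigma=1.0, lam=1e-2, threshold=0.1):
    """Three-step sketch: (1) kernel ridge regression in the RKHS of a
    Gaussian kernel, (2) closed-form gradients of the fitted function at the
    sample points, (3) hard thresholding of per-variable gradient norms."""
    n, p = X.shape
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)   # step 1: fit f
    # step 2: df(x)/dx_j = sum_i alpha_i K(x, x_i) (x_ij - x_j) / sigma^2
    grads = np.empty((n, p))
    for k in range(n):
        diff = X - X[k]                                   # x_i - x^(k), shape (n, p)
        grads[k] = (alpha * K[k]) @ diff / sigma ** 2
    grad_norms = np.sqrt((grads ** 2).mean(0))            # per-variable norm
    return np.flatnonzero(grad_norms > threshold)         # step 3: threshold
```

Variables whose average gradient norm falls below the threshold are declared non-informative, mirroring the hard-thresholding step described above; since only one regression fit is required, the computation scales far better than pairwise gradient learning.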

Finally, a conclusion with a short summary and a brief introduction to future work is provided. For future work, we introduce extensions of the gradient learning method to nonparametric interaction selection and to the identification of structure in a partially linear model. We also introduce an extension of the general kernel-based variable selection method that allows a family of loss functions, including the most popular ones in the literature.
Date of Award: 6 Aug 2018
Original language: English
Awarding Institution:
  • City University of Hong Kong
Supervisor: Junhui WANG (Supervisor)
