Quantile Regression Analysis with Missing and Length-biased Data

Project: Research



In statistical modelling, data are frequently deficient in a variety of ways. One major deficiency arises when some observations are entirely unavailable. Another, commonly encountered in practice and especially with survival data, is length bias: the probability of selecting an observation from the population is proportional to the time from the initiating event to the failure event. The main aim of this project is to combine quantile regression (QR) analysis with these two forms of deficient data, topics that are not well developed in the existing literature.
Although an arsenal of tools is available for mitigating the effects of missing data, most of these tools target conditional mean estimation, and few studies have examined the case in which the goal is to estimate conditional quantiles. This gap is unfortunate because QR is now an indispensable tool in statistical research. Moreover, the handful of methods available for handling missing data in QR share a common weakness: none can treat missing responses and missing covariates simultaneously. Most of these methods also rely on the strong assumption of identically distributed errors, and many lack sound theoretical underpinnings. These shortcomings severely limit the methods’ usefulness. A key reason for the absence of a “quantile counterpart” to the many techniques developed for conditional mean modelling with missing data is that QR is based on non-smooth criterion functions. This non-smoothness poses significant technical challenges when one attempts to establish optimality properties of the methods in the quantile setting. The first focus of the proposed study is on three methods for handling missing data in a QR setting.
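A minimal sketch can make two of the points above concrete. The non-smooth criterion in QR is the check loss ρ_τ(u) = u(τ − 1{u < 0}), which is convex but has a kink at u = 0. Inverse probability weighting (IPW) reweights each complete case by the inverse of its probability of being observed. The example is our own illustration, not the project's estimator. It covers only the simplest, intercept-only case: a marginal τ-quantile of Y, where Y is missing at random given an always-observed covariate X. It also treats the selection probabilities π(X) as known, whereas in practice they would have to be estimated. The simulated design, the seed and all function names are hypothetical.

```python
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0}); convex but kinked at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_objective(q, y, w, tau):
    """Weighted QR criterion sum_i w_i * rho_tau(y_i - q) for an intercept-only model."""
    return sum(wi * check_loss(yi - q, tau) for yi, wi in zip(y, w))


def weighted_quantile(y, w, tau):
    """Minimiser of weighted_objective: the smallest y_(k) whose cumulative
    weight reaches tau times the total weight."""
    pairs = sorted(zip(y, w))
    target = tau * sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= target:
            return yi
    return pairs[-1][0]  # guards against floating-point shortfall when tau == 1


# Hypothetical MAR design: X ~ Bernoulli(0.5), Y = 2X + N(0, 1), and Y is
# observed with known probability pi(X) = 0.9 if X == 0, else 0.3.
rng = random.Random(2014)
complete_y, complete_pi = [], []
for _ in range(20_000):
    x = 1 if rng.random() < 0.5 else 0
    y = 2.0 * x + rng.gauss(0.0, 1.0)
    pi = 0.9 if x == 0 else 0.3
    if rng.random() < pi:  # response observed
        complete_y.append(y)
        complete_pi.append(pi)

tau = 0.5
naive = weighted_quantile(complete_y, [1.0] * len(complete_y), tau)
ipw = weighted_quantile(complete_y, [1.0 / p for p in complete_pi], tau)
print(f"complete-case median: {naive:.3f}")  # biased towards the X = 0 group
print(f"IPW median:           {ipw:.3f}")    # close to the true median of 1
```

Responses are more often missing when X = 1, so the complete cases over-represent the X = 0 group and the unweighted median is pulled downward. Weighting by 1/π(X) restores the population balance. The true median is 1, because the mixture 0.5 N(0, 1) + 0.5 N(2, 1) is symmetric about 1.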
The three methods are the inverse probability weighting (IPW) method, the estimating equation projection (EEP) method and a combination of the two. Some of these approaches have been developed for conditional mean modelling only very recently, and extending them to QR is by no means trivial because of the technical challenges noted above. We have spent considerable time and effort exploring these technical issues and have found that they may be overcome using techniques that draw on empirical process theory. Moreover, our proposed methods can handle missing data in the response and/or the covariates, and they possess optimality properties even when the error terms are not identically distributed. We believe these results represent significant advances, and we are therefore applying for funding to complete the work. A resampling method for estimating the asymptotic variances of the estimators also remains to be developed. The requested amount is mainly for recruiting a research assistant to carry out the simulations and real data analysis.
The second focus of this study is the treatment of length-biased data in QR modelling. Studies on modelling conditional quantiles of survival times have proliferated in recent years, but few of them also account for length bias, and those that do all assume a linear functional relationship between the survival time and the covariates. We propose a varying-coefficient QR approach. Besides the usual flexibility offered by varying coefficients, an important strength of our approach is that we need not assume that the censoring variable is distributed independently of the covariates or failure times. We have already worked out some of the key theoretical properties of the method but have yet to complete the numerical analysis.
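A small simulation shows the effect of length bias on quantiles. Under length-biased sampling, the observed durations have density t f(t) / E[T] rather than f(t), so every quantile is pulled upward. For uncensored data without covariates, weighting each observation by 1/T recovers the population distribution. This is only a minimal sketch under those simplifying assumptions, with a hypothetical shifted-exponential population and seed. It is a classical inverse-length reweighting, not the varying-coefficient QR approach proposed in the project, which must also cope with censoring and covariates.

```python
import random


def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches tau * total weight,
    i.e. a minimiser of the weighted check loss sum_i w_i * rho_tau(v_i - q)."""
    pairs = sorted(zip(values, weights))
    target = tau * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]


rng = random.Random(1410)

# Hypothetical population of durations (initiating event to failure):
# T = 0.5 + Exponential(1), bounded below so that the weights 1/T stay bounded.
population = [0.5 + rng.expovariate(1.0) for _ in range(100_000)]
pop_median = sorted(population)[len(population) // 2]

# Length-biased sampling: each unit is drawn with probability proportional to T.
sample = rng.choices(population, weights=population, k=20_000)

naive = weighted_quantile(sample, [1.0] * len(sample), 0.5)
# The sampled density is t f(t) / E[T], so weighting by 1/T undoes the bias.
corrected = weighted_quantile(sample, [1.0 / t for t in sample], 0.5)

print(f"population median:          {pop_median:.3f}")
print(f"naive length-biased median: {naive:.3f}")  # pulled towards long durations
print(f"1/T-weighted median:        {corrected:.3f}")
```

The naive median overstates the typical duration because long durations are more likely to be sampled. The 1/T weights remove that tilt, provided the data are uncensored and the sampling is exactly proportional to length.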
We combine this second topic with the first, on missing data, in a single project because the two form a cohesive whole: the properties of the QR estimators in both settings can be derived using similar analytic and computational techniques.


Project number: 9042086
Grant type: GRF
Effective start/end date: 1/10/14 to 27/03/17

    Research areas

  • length-biased observations
  • missing at random
  • nonsmooth estimating functions
  • quantile regression