In statistical modelling, data are frequently deficient in a variety of ways. One major deficiency arises
when some of the observations are missing. Another deficiency commonly encountered in practice,
especially with survival data, is length bias: the probability of selecting an observation from the
population is proportional to the time from the initiating event to the failure event. The main aim of this
project is to combine quantile regression (QR) analysis with these two forms of deficient data, topics that
are not well developed in the existing literature.
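The length-biased sampling mechanism described above can be stated compactly. The display below is the standard textbook form; the notation (T for the time from the initiating event to failure, f for its population density given covariates x, and the mean μ(x)) is introduced here for illustration and is not taken from the project itself.

```latex
f_{\mathrm{LB}}(t \mid x) = \frac{t\, f(t \mid x)}{\mu(x)},
\qquad
\mu(x) = \int_0^\infty u\, f(u \mid x)\, du < \infty
```

Because the observed density is tilted by the factor t, long survival times are over-represented in the sample, so quantiles estimated naively from length-biased data overstate the population quantiles.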
Although an arsenal of tools is available for mitigating the effects of missing data, most of these tools
focus on conditional mean estimation; few studies have examined the case where the goal is to estimate
conditional quantiles. This is unfortunate because QR is now an indispensable tool for statistical research.
Remarkably, the handful of methods available for handling missing data in QR share a common weakness:
none can treat missing responses and missing covariates simultaneously. Most of these methods also rely
on the strong assumption of identically distributed errors, and many lack sound theoretical underpinnings.
These shortcomings severely limit the methods’ usefulness. A key reason for the absence of a “quantile
counterpart” to the many techniques developed for conditional mean modelling with missing data is that
QR is based on non-smooth criterion functions. This non-smoothness poses significant technical
challenges when one attempts to establish optimality properties of the methods in the quantile setting.
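The source of the non-smoothness can be seen from the standard QR criterion. In the generic linear form below (notation assumed for illustration: responses Y_i, covariate vectors X_i, quantile level τ), the check function ρ_τ has a kink at zero, so the objective is not differentiable and its estimating function is a step function.

```latex
\rho_\tau(u) = u\,\bigl\{\tau - I(u < 0)\bigr\}, \qquad \tau \in (0, 1),
\qquad
\hat\beta(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(Y_i - X_i^{\top}\beta\bigr),
\qquad
\psi_\tau(u) = \tau - I(u < 0)
```

Since ψ_τ jumps at zero, the Taylor-expansion arguments routinely used for smooth conditional mean criteria do not apply directly, which is why empirical process arguments are typically needed.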
The first focus of the proposed study is three methods for handling missing data in a QR setting: the
inverse probability weighting (IPW) method, the estimating equation projection (EEP) method, and a
combination of the two. Some of these approaches have been developed for conditional mean modelling
only very recently, and extending them to QR is by no means trivial because of the technical challenges
noted above. We have spent considerable time and effort exploring these technical issues and found that
they can be overcome using techniques that draw on results from empirical process theory. Moreover, our
proposed methods can handle missing data in the response and/or the covariates, and they retain their
optimality properties even when the error terms are not identically distributed. We believe these results
represent substantial advances, and we are seeking funding to complete the work. We also need to
develop a resampling method for estimating the asymptotic variances of the estimators. The requested
amount is mainly for recruiting a research assistant to carry out simulations and real data analysis.
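To make the IPW idea concrete, the following is a minimal sketch of its simplest instance: estimating an unconditional τ-th quantile when some responses are missing at random and the probability π_i of observing each response is assumed known. Each observed response is weighted by 1/π_i and the weighted check loss is minimised. The function name `ipw_quantile` and the known-propensity assumption are hypothetical and for illustration only; in practice the propensities are estimated, and the project's estimators concern conditional quantile regression with missing responses and/or covariates, which this sketch does not attempt.

```python
from typing import Optional, Sequence


def ipw_quantile(y: Sequence[Optional[float]],
                 pi: Sequence[float],
                 tau: float) -> float:
    """Hypothetical illustration: IPW estimate of the tau-th quantile.

    Minimises sum_i (delta_i / pi_i) * rho_tau(y_i - q), where delta_i = 1
    if y_i is observed (not None) and pi_i is the (assumed known)
    probability that y_i is observed.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    # Keep only observed responses, each weighted by 1 / pi_i.
    pairs = sorted((yi, 1.0 / p) for yi, p in zip(y, pi) if yi is not None)
    if not pairs:
        raise ValueError("no observed responses")
    total = sum(w for _, w in pairs)
    # A minimiser of the weighted check loss is the smallest observed value
    # whose cumulative weight reaches tau times the total weight.
    cum = 0.0
    for yi, w in pairs:
        cum += w
        if cum >= tau * total:
            return yi
    return pairs[-1][0]  # guard against floating-point round-off


if __name__ == "__main__":
    y = [1.0, None, 3.0, 2.0, None, 5.0]
    pi = [0.5, 0.5, 1.0, 0.8, 0.5, 0.25]
    print(ipw_quantile(y, pi, 0.5))  # prints 3.0
```

In this toy example the unweighted median of the observed values (by the same smallest-minimiser rule) is 2.0, whereas the IPW estimate is 3.0: the large value 5.0 is rarely observed (π = 0.25), so it receives a large weight, correcting for its under-representation among the complete cases.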
The second focus of this study is the treatment of length-biased data in QR modelling. Studies on
modelling conditional quantiles of survival times have proliferated in recent years, but few also account
for length bias, and those that do all assume a linear functional relationship between the survival time and
the covariates. We propose a varying-coefficient QR approach. One important strength of our approach,
besides the usual flexibility offered by varying coefficients, is that we need not assume that the censoring
variable is distributed independently of the covariates or failure times. We have already worked out some
of the key theoretical properties of the method but have yet to complete the numerical analysis. We
combine this second topic with the first in a single project because the two form a cohesive whole: the
properties of the QR estimators in both settings can be derived using similar analytic and computational
techniques.
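As a point of reference only, a generic varying-coefficient quantile regression model for a survival time T can be written as follows. The index variable U, the log transformation and the coefficient function β_τ(·) are assumptions chosen for illustration and need not match the model developed in the project.

```latex
Q_{\tau}\bigl(\log T \mid X = x,\; U = u\bigr) = x^{\top} \beta_{\tau}(u),
\qquad \tau \in (0, 1)
```

Setting β_τ(u) ≡ β_τ for all u recovers a linear QR model of the kind assumed in the existing studies of length-biased data; letting the coefficients vary smoothly with u relaxes that restriction.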