Advances in Censored Quantile Regression: Time-dependent Covariates and Big Data Setup

Project: Research


Description

Quantile regression (QR) originated in econometrics with the seminal work of Koenker and Bassett (1978). Whereas linear regression models the conditional mean, QR and its variant for right-censored time-to-event data, known as censored quantile regression (CQR), model the conditional quantile of the response given the covariates. QR has been broadly adopted in econometric and statistical applications, including labor economics and credit analysis, for two reasons: it is robust to error heterogeneity because it allows covariate effects to vary with the quantile level, and it is versatile enough to capture the entire conditional distribution as the quantile level of interest varies.
In duration analysis, also known as survival or time-to-event analysis, applying CQR techniques poses two challenges. First, time-dependent covariates are ubiquitous in follow-up studies because data are collected over time, for example debt repayment records and interest rate histories in a study of time to default. Existing literature is built mostly on time-independent covariates, and extending it to time-dependent covariates is complicated because the failure-time counting process is timed in terms of the individual quantile process. Second, massive data are increasingly prevalent in business contexts in the big data era, which imposes enormous computational burdens on classical methods designed for small data. Indeed, CQR is computationally more demanding than QR: QR is a convex optimization problem that can be solved by linear programming, whereas its censored counterpart is not.
We aim to extend CQR techniques in two directions in this proposal. The first is a continuation of our previous work (Chu et al., 2023), in which we studied the treatment of time-dependent covariates in CQR. We now introduce a QR model that incorporates time-dependent covariates under the competing risks framework, in which subjects are vulnerable to several dependent causes of failure.
We propose a recursive estimator of the regression parameter, viewed as a process in the quantile level. The second direction is to study optimal subsampling for CQR so that massive data can be handled with limited computing resources. We will derive the optimal subsampling scheme under which the asymptotic mean squared error of the subsample estimator is minimized. This offers a practical solution for big data analysis by extracting information optimally from the large full data set through subsampling. For both projects, we shall study numerical procedures that are feasible in practice. Software packages will be developed so that business analysts, econometricians, and statisticians can access and implement the proposed methods.
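To make the computational points above concrete, the following Python sketch (standard library only) illustrates two ingredients in the simplest possible setting, uncensored intercept-only QR: the check-loss objective of Koenker and Bassett (1978), and estimation from an inverse-probability-weighted subsample. It is not the proposal's CQR estimator, and it does not solve the linear program that QR with covariates requires. The function names, the simulated exponential data, and the pilot-based sampling rule in `pilot_probs` are hypothetical choices made for illustration; the optimal CQR subsampling probabilities that the project sets out to derive are not reproduced here.

```python
import random

def check_loss(u, tau):
    # Koenker-Bassett check loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_quantile_fit(y, w, tau):
    # Minimize sum_i w_i * rho_tau(y_i - b) over a scalar b (intercept-only QR).
    # The objective is convex and piecewise linear in b, so the smallest y_(k)
    # whose cumulative weight reaches tau * (total weight) is a minimizer.
    pairs = sorted(zip(y, w))
    target = tau * sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= target:
            return yi
    return pairs[-1][0]  # guards against floating-point round-off near tau = 1

def subsample_estimate(y, probs, r, tau, seed=0):
    # Draw r indices with replacement, P(pick i) = probs[i], then minimize the
    # inverse-probability-weighted check loss with weights 1 / (n * probs[i]).
    rng = random.Random(seed)
    n = len(y)
    idx = rng.choices(range(n), weights=probs, k=r)
    ys = [y[i] for i in idx]
    ws = [1.0 / (n * probs[i]) for i in idx]
    return weighted_quantile_fit(ys, ws, tau)

def pilot_probs(y, pilot, tau):
    # HYPOTHETICAL sampling rule (not the proposal's optimal scheme):
    # p_i proportional to |tau - 1{y_i - pilot < 0}|, computed from a pilot fit.
    # All probabilities are positive whenever 0 < tau < 1.
    raw = [abs(tau - (1.0 if yi - pilot < 0 else 0.0)) for yi in y]
    total = sum(raw)
    return [v / total for v in raw]

# Simulated, uncensored data standing in for a "massive" full data set.
data_rng = random.Random(42)
y = [data_rng.expovariate(1.0) for _ in range(100_000)]
tau, n = 0.9, len(y)

full = weighted_quantile_fit(y, [1.0] * n, tau)                 # full-data fit
pilot = subsample_estimate(y, [1.0 / n] * n, 500, tau, seed=1)  # uniform pilot
sub = subsample_estimate(y, pilot_probs(y, pilot, tau), 2000, tau, seed=2)
print(f"full: {full:.3f}  pilot: {pilot:.3f}  subsample: {sub:.3f}")
```

At `tau = 0.5` the hypothetical rule reduces to uniform sampling, because |0.5 - 1| = 0.5; the nonuniform weighting matters only away from the median. With censoring and covariates, both the objective and any optimal probabilities change, and those are the parts the proposal addresses.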

Detail(s)

Project number: 9048313
Grant type: ECS
Status: Not started
Effective start/end date: 1/01/25 → …