
Statistical Inference and Applications of the Semiparametric Models under Biased-Sampling Data

  • Wenhua WEI

    Student thesis: Doctoral Thesis

    Abstract

    Over the past century, rapid developments in probability theory and computer science have greatly advanced modern statistics, both theoretically and practically. Statistical concepts, methods and models are now widely used to analyse data and draw inferences in fields such as clinical trials, finance, insurance, engineering and social research. A complete statistical analysis generally comprises three steps: collecting the data, analysing them with appropriate statistical tools, and interpreting and applying the results. This thesis concentrates on semiparametric estimation and inference for right-censored and biased-sampling data, which are frequently encountered in survival analysis, under the frameworks of the density ratio model, the linear transformation model and the partially linear transformation model. We further illustrate these methods with real data examples.
    Chapter 1 introduces the research background and describes in detail the data types examined in this thesis, namely right-censored data, length-biased data and general biased-sampling data, of which length-biased data are a special case. Each data type is discussed with reference to the relevant literature and typical real data sets. Chapter 1 also introduces the three models studied in the thesis, namely the density ratio, linear transformation and partially linear transformation models, and reviews existing research results based on these models.
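As a minimal illustration of how length-biased data fit into the general biased-sampling framework, the Python sketch below assumes uncensored observations and a known selection weight w(t). A biased sample then has density g(t) = w(t) f(t) / ∫ w(u) f(u) du, and giving each observation mass proportional to 1 / w(t_i) recovers the population distribution F; length-biased sampling corresponds to w(t) = t. The function name `reweighted_cdf`, the Exp(1) population and the sample size are illustrative assumptions only, and the sketch ignores censoring, so it is not the estimator developed in this thesis.

```python
import math
import random

def reweighted_cdf(sample, weight, t):
    """Estimate the population CDF F(t) from an uncensored biased sample.

    The observed density is g(x) = w(x) f(x) / mu_w, so giving each
    observation mass proportional to 1 / w(x_i) recovers F.
    """
    inv_w = [1.0 / weight(x) for x in sample]   # inverse selection weights
    total = sum(inv_w)
    return sum(v for x, v in zip(sample, inv_w) if x <= t) / total

random.seed(1)
# Hypothetical population F = Exp(1). Under length-biased sampling (w(x) = x)
# the observed lifetimes follow Gamma(2, 1), i.e. the sum of two Exp(1) draws.
lb_sample = [random.expovariate(1.0) + random.expovariate(1.0) for _ in range(20000)]

for t in (0.5, 1.0, 2.0):
    est = reweighted_cdf(lb_sample, lambda x: x, t)
    naive = sum(x <= t for x in lb_sample) / len(lb_sample)   # ignores the bias
    print(f"t={t}: reweighted={est:.3f}, naive={naive:.3f}, true={1 - math.exp(-t):.3f}")
```

Because long lifetimes are over-represented in a length-biased sample, the naive empirical CDF understates F(t), while the reweighted estimate should be close to 1 - exp(-t).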
    The main body of this thesis comprises three studies, presented in Chapters 2 to 4. In clinical drug trials, researchers often compare the therapeutic effects of two or more drugs. Motivated by this practical problem, Chapter 2 investigates a semiparametric two-sample density ratio model for two groups of right-censored data, which commonly arise in clinical trials because of loss to follow-up. We propose a semiparametric maximum likelihood estimator for the unknown parameters and compute it using an EM algorithm. The uniform consistency and asymptotic normality of the proposed estimator are established using empirical process theory. In addition, we use a Kolmogorov-Smirnov-type test statistic to assess the validity of the model and a likelihood ratio test statistic to test for a treatment effect between the two groups. We evaluate the finite sample performance of the proposed estimator via simulation studies and compare it with alternative methods. We also apply the proposed estimator to compare the therapeutic efficacy of two drugs for primary biliary cirrhosis of the liver.
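For intuition about the two-sample density ratio model, the sketch below assumes the common exponential-tilt form g(x) = exp(α + βx) f(x), where f and g are the densities of the two groups, and fully observed (uncensored) data. In that uncensored case the maximum semiparametric likelihood estimate of β is known to coincide with the slope of a logistic regression of the group label on x, whose intercept equals α + log(n1/n0); the code fits that logistic regression by Newton-Raphson. This is a simpler, swapped-in technique: it does not handle right censoring, which this thesis addresses through an EM algorithm, and the name `fit_drm_uncensored` and the simulated normal data are hypothetical.

```python
import math
import random

def fit_drm_uncensored(x0, x1, iters=25):
    """Fit g(x) = exp(alpha + beta * x) * f(x) from uncensored samples x0 ~ f, x1 ~ g.

    Fits logit P(label = 1 | x) = a + beta * x on the pooled data by Newton-Raphson,
    then recovers alpha = a - log(n1 / n0).
    """
    data = [(x, 0) for x in x0] + [(x, 1) for x in x1]
    a, b = 0.0, 0.0  # logistic intercept and slope
    for _ in range(iters):
        g_a = g_b = 0.0               # score (gradient of the log-likelihood)
        i_aa = i_ab = i_bb = 0.0      # observed information (negative Hessian)
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += I^{-1} * score, with the 2x2 inverse written out
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    return a - math.log(len(x1) / len(x0)), b

random.seed(2)
x0 = [random.gauss(0.0, 1.0) for _ in range(3000)]   # group 0: f = N(0, 1)
x1 = [random.gauss(1.0, 1.0) for _ in range(3000)]   # group 1: g = N(1, 1)
alpha_hat, beta_hat = fit_drm_uncensored(x0, x1)
print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```

With f = N(0, 1) and g = N(1, 1) the true density ratio is exp(x - 1/2), so the fitted values should be close to α = -0.5 and β = 1.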
    In statistics, regression models are commonly used to explore the relationship between a response variable and a set of covariates. Chapter 3 focuses on covariate effects when the sample is both biased and right-censored. The semiparametric linear transformation model is a useful alternative to the classical proportional hazards and proportional odds models for studying how survival time depends on covariates, so we adopt it as the basis of our analysis. We develop an unbiased estimating equation approach based on counting processes that estimates the unknown regression coefficients while simultaneously correcting for the sampling bias. Under mild conditions, we establish the consistency and asymptotic normality of the proposed estimator and derive a closed-form expression for its covariance matrix, which can be consistently estimated by a plug-in method. Simulation studies demonstrate the good finite sample properties of the proposed estimator and compare it with an existing method that does not adjust for sampling bias. Furthermore, we illustrate the proposed method by analysing two real clinical data sets subject to biased sampling and right censoring.
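To see why the linear transformation model generalises the proportional hazards and proportional odds models, the sketch below assumes one common parameterisation, H(T) = -βZ + ε, with a single covariate Z, an increasing transformation H and a fully specified error distribution, so that S(t | z) = 1 - F_ε(H(t) + βz). Taking H(t) = log t is an illustrative assumption (in the semiparametric model H is unknown), and the function `survival` is hypothetical. The code only illustrates the model; it does not implement the counting-process estimating equations or the bias correction described above.

```python
import math

def survival(t, z, beta, error="extreme_value", H=math.log):
    """S(t | z) = P(T > t | z) when H(T) = -beta * z + eps, for a scalar covariate z.

    P(T > t | z) = P(eps > H(t) + beta * z) = 1 - F_eps(H(t) + beta * z).
    """
    s = H(t) + beta * z
    if error == "extreme_value":   # F_eps(s) = 1 - exp(-exp(s)): proportional hazards
        return math.exp(-math.exp(s))
    if error == "logistic":        # F_eps(s) = exp(s) / (1 + exp(s)): proportional odds
        return 1.0 / (1.0 + math.exp(s))
    raise ValueError(f"unknown error distribution: {error}")

beta = 0.7
for t in (0.5, 1.0, 4.0):
    s0, s1 = (survival(t, z, beta) for z in (0, 1))
    hazard_ratio = math.log(s1) / math.log(s0)          # ratio of cumulative hazards
    q0, q1 = (survival(t, z, beta, "logistic") for z in (0, 1))
    odds_ratio = ((1 - q1) / q1) / ((1 - q0) / q0)      # ratio of odds of failure by t
    print(f"t={t}: cum. hazard ratio={hazard_ratio:.4f}, "
          f"failure odds ratio={odds_ratio:.4f}, exp(beta)={math.exp(beta):.4f}")
```

Both ratios equal exp(β) at every t: extreme-value errors give a constant cumulative-hazard ratio (proportional hazards), and logistic errors give a constant odds ratio of failure (proportional odds). Under this sign convention a positive β shortens survival.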
    Length-biased data, a special case of the general biased-sampling data considered in Chapter 3, frequently arise in practice and have been studied extensively. However, research on length-biased data has focused mainly on nonparametric estimation of the unbiased distribution function or on estimating linear covariate effects under various regression models. In contrast, Chapter 4 uses a partially linear transformation model for length-biased and right-censored data to account for both linear and nonlinear covariate effects on survival time. We adopt the local linear fitting technique and develop global and local unbiased estimating equations for the simultaneous estimation of the unknown covariate effects, which are solved by an iterative algorithm. Under certain regularity conditions and with appropriate choices of the bandwidth parameters, the estimator of the parametric component is shown to be root-n consistent and asymptotically normal, and the estimator of the nonparametric component attains the usual convergence rates of nonparametric regression. We suggest estimating the standard deviation of the proposed estimator by bootstrap resampling. Simulation studies and an application to the Oscar nomination data set demonstrate the performance of the proposed estimator in finite samples and in practice.
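The local linear fitting technique can be illustrated in its simplest setting, ordinary nonparametric regression: the fit at a point x0 solves a kernel-weighted least-squares problem in (a, b) and returns a as the estimate of m(x0) and b as the estimate of m'(x0). The Python sketch below assumes an Epanechnikov kernel and a user-chosen bandwidth h, and the names `epanechnikov` and `local_linear` are hypothetical. The thesis instead embeds local linear fitting in local unbiased estimating equations for censored, length-biased data, so this is only a sketch of the smoothing idea, not of that estimator.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| < 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_linear(x0, xs, ys, h):
    """Return (m_hat(x0), m_hat'(x0)) from a local linear fit with bandwidth h.

    Minimises sum_i K((x_i - x0) / h) * (y_i - a - b * (x_i - x0))^2 over (a, b).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1        # needs at least two distinct x_i in the window
    a = (s2 * t0 - s1 * t1) / det  # solution of the 2x2 weighted normal equations
    b = (s0 * t1 - s1 * t0) / det
    return a, b

xs = [i / 100 for i in range(101)]
lin = [1.0 + 2.0 * x for x in xs]                 # m(x) = 1 + 2x
wave = [math.sin(2 * math.pi * x) for x in xs]    # m(x) = sin(2 pi x)
print(local_linear(0.0, xs, lin, h=0.1))    # boundary point, linear truth
print(local_linear(0.25, xs, wave, h=0.1))  # interior point, curved truth
```

Local linear fitting reproduces a linear function exactly, even at the boundary of the design, which is one reason it is usually preferred to local constant (Nadaraya-Watson) smoothing; for curved functions it carries a bias of order h².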
    Finally, Chapter 5 concludes the thesis by summarising the contributions and limitations of our research and, based on this discussion, suggesting directions for future research.
    Date of Award: 26 Jul 2016
    Original language: English
    Awarding Institution: City University of Hong Kong
    Supervisor: Tze-Kin Alan WAN
