Statistical Inference and Applications of Semiparametric Models with Auxiliary Information


Student thesis: Doctoral Thesis

Award date: 12 Jul 2019


As large data sets become increasingly available for research, there is a growing need for statistical frameworks that synthesize information from different sources. Estimation can be made more efficient by taking into account auxiliary information that frequently arises in survival analysis, or by incorporating published summary information from external data sources. Meanwhile, the analysis of complex data (such as right-censored, left-truncated, missing and longitudinal data) is at the forefront of modern statistics. In this thesis, we focus on semiparametric inference that takes auxiliary information into account for right-censored data, which are frequently encountered in survival analysis, under three models: the varying-coefficient partially linear transformation model, the additive hazard model and the general additive-multiplicative hazard model.

In chapter 1, we first state the background of our research. We then describe in detail the data types involved in this thesis: right-censored data and biased-sampling data, of which length-biased data are a special case. We also introduce the three models studied later, namely the varying-coefficient partially linear transformation model, the additive hazard model and the general additive-multiplicative hazard model, and review current research on these models. The chapter closes with a brief discussion of the auxiliary information used in our estimating procedures.

The main body of this thesis consists of three works, arranged in chapters 2-4. Prevalent cohort studies in medical research often give rise to length-biased survival data that require special treatment. The recently proposed varying-coefficient partially linear transformation (VCPLT) model, an extension of the partially linear transformation (PLT) model, gives a more dynamic picture of the effects of the covariates on survival times than the PLT model by allowing flexible interactions between the covariates. However, existing analyses of the VCPLT model do not account for length-biased sampling, which restricts its application. In chapter 2, we consider the VCPLT model when the data are length-biased and right-censored, thereby extending the reach of this flexible and powerful tool. By taking into account auxiliary information that frequently arises in survival analysis, we propose a martingale estimating function-based approach to estimate this model, provide theoretical underpinnings, evaluate finite-sample performance via simulations, and showcase its practical appeal via an empirical application using data from two HIV vaccine clinical trials conducted by the U.S. National Institute of Allergy and Infectious Diseases.

Chapter 3 presents an approach to improving efficiency in estimating the survival time distribution by incorporating published t-year survival probabilities from external sources under the additive hazard model. To incorporate the published information, we summarize the external survival information via a system of nonlinear population moments and then estimate the survival time model using the empirical likelihood method. The proposed method is flexible in that it allows for a different survival time model in the aggregate data. The resulting regression coefficient estimates are shown to be consistent and asymptotically normal, with an easily estimated variance-covariance matrix. Simulation studies show that the proposed approach performs very well. An application to breast cancer data from the SEER Research Data is also given to illustrate the methodology.
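The idea of turning a published t-year survival probability into a moment constraint can be illustrated with a minimal sketch. It is not the estimator developed in the thesis: it assumes a single time-fixed scalar covariate, treats the regression coefficient and the cumulative baseline hazard at t* as known, and only computes the empirical likelihood weights for one scalar moment. The function names `survival_moment` and `el_weights`, and all numerical values, are hypothetical.

```python
import numpy as np


def survival_moment(z, beta, cum_base_hazard, t_star, phi):
    """Moment function g_i = S(t* | z_i) - phi under an additive hazard model.

    Assumes hazard lambda0(t) + beta * z with a time-fixed scalar covariate,
    so that S(t | z) = exp(-Lambda0(t) - beta * z * t). `phi` plays the role
    of the externally published t*-year survival probability.
    """
    z = np.asarray(z, dtype=float)
    return np.exp(-cum_base_hazard - beta * z * t_star) - phi


def el_weights(g, tol=1e-12, max_iter=200):
    """Empirical likelihood weights for one scalar constraint sum_i p_i g_i = 0.

    The weights have the form p_i = 1 / (n * (1 + lam * g_i)), where the
    Lagrange multiplier lam solves sum_i g_i / (1 + lam * g_i) = 0.
    """
    g = np.asarray(g, dtype=float)
    n = g.size
    if not (g.min() < 0.0 < g.max()):
        raise ValueError("0 must lie strictly inside the range of g")
    # lam must keep 1 + lam * g_i > 0 for every i.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    # The multiplier equation is strictly decreasing in lam, so bisect.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * g)), lam


# Illustrative values only (not taken from the thesis or from SEER).
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=200)
g = survival_moment(z, beta=0.5, cum_base_hazard=0.4, t_star=2.0, phi=0.45)
p, lam = el_weights(g)
print(p.sum(), np.sum(p * g))  # weights sum to 1; weighted moment is close to 0
```

In a full analysis the unknown parameters would be estimated jointly with the weights rather than fixed in advance; the sketch only shows how the external probability enters as a constraint on the weights.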

The general additive-multiplicative hazard model is a natural extension of the additive-multiplicative hazard model in survival analysis. In chapter 4, we study the general additive-multiplicative hazard model while incorporating auxiliary quantile information from subgroups. We formulate the known auxiliary information as estimating equations and then combine them with the conventional score-type estimating equations via the maximum empirical likelihood method. The resulting estimators are shown to be consistent and asymptotically normal. The large-sample properties of the Breslow estimator for the baseline cumulative hazard function are also established. Simulation studies show that our proposal is more efficient than the conventional one in terms of standard error, and that the proposed estimator becomes more competitive as more informative auxiliary information is synthesized. Finally, a breast cancer data example is analyzed to show the practical utility of our estimating procedure.
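A combination of score-type and auxiliary estimating equations is commonly written in the empirical likelihood literature along the following lines. This is a sketch under assumed notation, not the formulation used in the thesis: the symbols $U_i$, $h_i$, $\theta$, $\Omega_k$, $q_k$ and $\tau$ are introduced here for illustration only, and the form of the quantile component is one possible choice.

```latex
% Stacked estimating function: score-type part U_i and auxiliary part h_i
g_i(\theta) =
\begin{pmatrix} U_i(\theta) \\ h_i(\theta) \end{pmatrix},
\qquad
\max_{p_1,\dots,p_n} \prod_{i=1}^{n} n p_i
\quad \text{subject to} \quad
p_i \ge 0, \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i \, g_i(\theta) = 0 .

% One possible auxiliary component (assumption): the \tau-th quantile q_k of the
% survival time in subgroup \Omega_k is known from an external source, so that
% P(T \le q_k \mid Z \in \Omega_k) = \tau.
h_{ik}(\theta) = \mathbf{1}(Z_i \in \Omega_k)
\bigl\{ 1 - S(q_k \mid Z_i; \theta) - \tau \bigr\}
```

Here $S(\cdot \mid Z_i; \theta)$ stands for the model-based conditional survival function, with $\theta$ collecting all unknown quantities it depends on. Because the stacked function has more components than there are parameters, the auxiliary equations act as extra constraints, which is the usual source of the efficiency gain.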

Finally, in chapter 5, we conclude the thesis by summarizing both the contributions and the limitations of our work. Based on this discussion, we point out some directions that may be worth pursuing in future work.