Multilevel models for categorical data in generalized linear mixed models


Student thesis: Doctoral Thesis


  • Moon Tong CHAN

Award date: 15 Jul 2014


This thesis considers multilevel models for categorical data within the framework of generalized linear mixed models (GLMM). Its principal purpose is methodological: to establish new methods of model formulation together with suitable parameter estimation. Individuals from the same cluster tend to be more alike in certain characteristics than individuals chosen from the entire population, so observations obtained from the same cluster are usually correlated. Multilevel GLMM can be used to analyze data collected from such clustered subjects. Ignoring these within-cluster variations can lead to false associations and misleading inferences. To account for the inherent dependencies among observations, random effects need to be incorporated into the linear predictor to explain within-cluster variation. Categorical data appear in multilevel structures in many situations, including the two datasets considered in this thesis. In the first dataset, respondents with ordinal responses are nested within districts. In the second, respondents with nominal responses are at level 1, regions at level 2, and countries at level 3, with regions nested within countries. Two models are considered and developed: a multilevel cumulative logistic regression model with random effects, and a multilevel multinomial logit model with random effects. Each of these models requires two sets of random effects to explain the within-cluster variation. The former model is illustrated by an empirical study of an opinion dataset with ordinal responses. For the latter model, a dataset with nominal responses is used to illustrate model building and parameter estimation. Various parameter estimation methods exist for multilevel GLMM. This thesis uses the penalized likelihood approach.
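As a rough orientation only, models of the two kinds named above are commonly written in the following textbook form. The notation is an assumption introduced here for illustration and is not taken from the thesis: $Y$ is the response with categories $1,\dots,C$, $x$ and $z$ are covariate vectors, $\beta$ and $\beta_c$ are fixed effects, $\theta_c$ are cut-points, $u$ collects the random effects with covariance matrix $D$, and $f(y \mid u;\beta)$ is the conditional response probability. The sign convention in the first line is one of several in use.

```latex
\begin{align}
% cumulative logit with cluster-level random effects (respondent i in cluster j)
\operatorname{logit} P(Y_{ij} \le c \mid u_j)
  &= \theta_c - \bigl(x_{ij}^{\top}\beta + z_{ij}^{\top}u_j\bigr),
  & c &= 1,\dots,C-1, \\
% multinomial (baseline-category) logit with random effects
\log\frac{P(Y_{ij} = c \mid u_j)}{P(Y_{ij} = C \mid u_j)}
  &= x_{ij}^{\top}\beta_c + z_{ij}^{\top}u_{c,j},
  & c &= 1,\dots,C-1, \\
% penalized (BLUP-type) log-likelihood, up to an additive constant, for u ~ N(0, D)
\ell_p(\beta, u)
  &= \sum_{i,j} \log f\bigl(y_{ij} \mid u;\beta\bigr) - \tfrac{1}{2}\,u^{\top}D^{-1}u .
\end{align}
```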
For both models, parameter estimation proceeds in two stages: first, a best linear unbiased prediction (BLUP)-type log-likelihood function is used, and this is then extended to obtain restricted maximum likelihood (REML) estimates of the variance component parameters. The estimation algorithm uses Newton-Raphson iterative equations. The results from the two models generally agree well with those of similar studies previously performed on the same datasets. Monte Carlo simulation studies are conducted to evaluate the performance of the estimators under known and controlled conditions. Generally, unbiased estimates of the fixed effects and variance component parameters are obtained. Furthermore, the performance of the standard error estimates is satisfactory, indicating that the proposed estimation methods work very well. Further discussion and concluding remarks on the postulated models are given, together with recommendations for future research. Other modeling and parameter estimation methods for multilevel GLMM with categorical data are available in the literature. The major contribution of this thesis is an alternative modeling approach and estimation method, based on the conditional penalized likelihood, for analyzing categorical data in a multilevel context.
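The first estimation stage described above, Newton-Raphson iteration on a BLUP-type penalized log-likelihood, can be sketched on a deliberately simplified case. The sketch below is not the thesis's algorithm: it assumes a binary response, a single random intercept per cluster, and a variance component `sigma2` held fixed (the REML step for the variance components is omitted). The function name `fit_penalized_logit` and all argument names are hypothetical.

```python
import numpy as np

def fit_penalized_logit(X, cluster, y, sigma2, n_iter=50, tol=1e-8):
    """Newton-Raphson for a random-intercept logistic model (illustrative only).

    Maximizes  sum_i [y_i*eta_i - log(1 + exp(eta_i))] - u'u / (2*sigma2)
    jointly over fixed effects beta and random intercepts u,
    where eta = X @ beta + Z @ u and sigma2 is treated as known.
    """
    n, p = X.shape
    q = int(cluster.max()) + 1
    Z = np.zeros((n, q))
    Z[np.arange(n), cluster] = 1.0      # cluster indicator matrix
    W = np.hstack([X, Z])               # joint design for (beta, u)
    # Penalty precision: none on fixed effects, 1/sigma2 on random effects.
    P = np.diag(np.r_[np.zeros(p), np.full(q, 1.0 / sigma2)])
    theta = np.zeros(p + q)
    for _ in range(n_iter):
        eta = W @ theta
        mu = 1.0 / (1.0 + np.exp(-eta))                 # fitted probabilities
        score = W.T @ (y - mu) - P @ theta              # penalized score
        info = W.T @ (W * (mu * (1.0 - mu))[:, None]) + P  # penalized information
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta[:p], theta[p:]

# Small simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(20), 20)                  # 20 clusters of 20 subjects
X = np.column_stack([np.ones(400), rng.normal(size=400)])
u_true = rng.normal(scale=1.0, size=20)
eta_true = X @ np.array([-0.5, 1.0]) + u_true[cluster]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

beta_hat, u_hat = fit_penalized_logit(X, cluster, y, sigma2=1.0)
print(beta_hat, u_hat[:3])
```

Stacking the fixed and random effects into one vector keeps the update a single linear solve per iteration; the penalty term is what makes the system solvable even though the intercept column and the cluster indicators are collinear.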

    Research areas

  • Multilevel models (Statistics), Multivariate analysis, Linear models (Statistics)