Statistical Methods for Effective Personal Health Monitoring and Public Health Surveillance


Student thesis: Doctoral Thesis

Awarding Institution
  • Kwok Leung TSUI (Co-supervisor)
  • Xinyue LI (Supervisor)
Award date: 4 May 2022


Health surveillance is crucial for both populations and individuals because it supports better control and prevention of disease. Public health surveillance and individual health monitoring help detect early signs of adverse health outcomes and facilitate early intervention. Through epidemiological data collection and the integration of multiple data sources, health professionals can set priorities and make targeted decisions to reverse disease epidemics. A surveillance system with extensive health records can better monitor individual health, predict possible adverse events and disease outcomes, and allow timely intervention. This dissertation presents effective statistical methods for public and personal health surveillance, with the aim of improving human health. The proposed public health surveillance system integrates timely meteorological and Internet-based data and enables accurate detection of trends, smoothness, and magnitude during influenza seasons. The proposed statistical modeling approaches for personal health surveillance make full use of individual health histories to achieve accurate health status prediction, estimation, and statistical inference.

In public health surveillance, influenza surveillance aims to establish the ongoing, systematic collection, analysis, and interpretation of data needed to prevent and control influenza outbreaks. Individual learning approaches play essential roles in forecasting influenza epidemics; these include time series models, such as the autoregressive integrated moving average (ARIMA) model, and regression methods, such as principal component linear regression, the least absolute shrinkage and selection operator (LASSO), and the elastic net. It is well known that no single method performs best on all types of data, because each has its own advantages and limitations. Consequently, there has been increasing interest in ensemble learning approaches to data fusion and model assimilation that achieve better forecasting performance, typically through Bayesian model averaging (BMA). This work focuses on the statistical properties and theoretical issues of BMA in the forecasting problem. Additionally, we conduct a comparative study of various forecasting methods using influenza-like illness in general outpatient clinics (ILI-GOPC) data from countrywide and citywide data sources.
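As a rough illustration of the model-averaging idea described above, the following sketch combines point forecasts from several candidate models using weights derived from the Bayesian information criterion (BIC). The BIC approximation to posterior model probabilities, the equal prior over models, the function names, and all numbers are assumptions made for this example; they are not taken from the thesis, whose BMA formulation may differ.

```python
import numpy as np

def bma_weights(bic):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model (an assumption of
    this sketch), so that weight_k is proportional to exp(-BIC_k / 2).
    """
    bic = np.asarray(bic, dtype=float)
    delta = bic - bic.min()        # shift by the minimum for numerical stability
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def bma_forecast(forecasts, weights):
    """Weighted average of per-model point forecasts.

    forecasts has shape (n_models, horizon); weights has shape (n_models,).
    """
    return np.asarray(weights, dtype=float) @ np.asarray(forecasts, dtype=float)

# Made-up BIC values and two-step-ahead forecasts for three hypothetical
# models (for example an ARIMA, a LASSO, and an elastic-net fit).
bic = [210.0, 212.0, 220.0]
forecasts = [[5.0, 6.0],
             [5.5, 6.5],
             [4.0, 5.0]]

weights = bma_weights(bic)
combined = bma_forecast(forecasts, weights)
print(weights, combined)
```

Models with lower BIC receive larger weights, and the combined forecast always lies between the smallest and largest individual forecasts at each horizon.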

With the widespread adoption of electronic health records (EHRs) in surveillance systems over the past two decades, public health agencies have gained access to clinical data for personal health monitoring. Data collection and statistical modeling based on integrating multiple data sources in the surveillance system have gradually become research hotspots. EHRs record individual clinical health histories, which typically include survival data, recurrent health care events, and longitudinal health outcomes from long-term follow-up studies. We applied visualization techniques to align longitudinal health outcomes chronologically with individual EHRs. We then fitted linear mixed-effects (LME) regression models to identify prognostic factors of longitudinal health outcomes among socio-demographics, disease conditions, and features extracted from EHRs. The proposed visualization approach and LME model estimation can help trace changes in older adults' functional status and identify the influencing factors. The resulting long-term surveillance system provides reference data for clinical practice and helps healthcare providers manage time, cost, data, and human resources in community-dwelling settings.

To further investigate the complicated effects and exploit the complex data structure among the different components of individual EHRs, we establish a hierarchical health surveillance framework for personal health monitoring. It adopts dedicated statistical models for the survival, longitudinal, and event processes. We depict the progression of health events in EHRs and the trajectory of functional status, and simultaneously reveal their direct and indirect effects on mortality risk. The modeling challenge comes from the different time resolutions of the data, which lead to drill-up and drill-down problems in statistical modeling: each individual has daily records of health event occurrences but only annual assessments of functional status, which complicates integrating the models. To address these problems, we propose a hierarchical joint modeling framework for individual EHRs that combines the Cox proportional hazards regression model, the linear mixed-effects (LME) model, and the non-homogeneous Poisson process (NHPP) model with Weibull intensity. The estimated power parameter of the NHPP model indicates a growing intensity of the health event process among community-dwelling older adults. The mixed-effects coefficients in the LME models and the fixed effects in all sub-models depict the direct and indirect effects among the health event processes in EHRs, the longitudinal observations of functional status, and survival times.
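To make the role of the NHPP power parameter concrete, the sketch below simulates event times for one hypothetical individual from an NHPP with a Weibull (power-law) intensity and recovers the parameters by maximum likelihood. The parameterization lambda(t) = (beta / theta) * (t / theta)^(beta - 1), the parameter values, and the function names are assumptions of this example. It ignores covariates and the joint survival and longitudinal sub-models, so it is only a simplified stand-in for the framework described above.

```python
import numpy as np

def weibull_intensity(t, beta, theta):
    """Power-law intensity lambda(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    t = np.asarray(t, dtype=float)
    return (beta / theta) * (t / theta) ** (beta - 1.0)

def simulate_nhpp(beta, theta, T, rng):
    """Simulate event times on (0, T] by time transformation.

    The cumulative intensity is Lambda(t) = (t/theta)**beta. Unit-rate
    Poisson arrivals on [0, Lambda(T)] are mapped back through the
    inverse of Lambda.
    """
    total = (T / theta) ** beta
    n = rng.poisson(total)
    s = np.sort(rng.uniform(0.0, total, size=n))
    return theta * s ** (1.0 / beta)

def fit_power_law(times, T):
    """Closed-form MLE for a single process observed up to time T."""
    times = np.asarray(times, dtype=float)
    n = times.size
    beta_hat = n / np.sum(np.log(T / times))
    theta_hat = T / n ** (1.0 / beta_hat)
    return beta_hat, theta_hat

# Hypothetical values: one year of daily follow-up, true beta > 1.
rng = np.random.default_rng(0)
T = 365.0
events = simulate_nhpp(beta=1.5, theta=10.0, T=T, rng=rng)
beta_hat, theta_hat = fit_power_law(events, T)
print(len(events), beta_hat, theta_hat)
```

Under this parameterization, a power parameter greater than 1 means the event intensity increases over time, a value of 1 reduces to a homogeneous Poisson process, and a value below 1 means the intensity decreases.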