Forecasting Methods in Health Applications under Big Data Environment


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 21 Nov 2017


Health forecasting is the prediction of health situations or disease episodes and the forewarning of future events. Its applications extend to a wide range of areas, such as disease risk factors, the effectiveness of therapies, the duration of surgeries, emergency department attendance, and outbreaks of infectious diseases. The aim of this thesis is to develop credible forecasts to support policy decisions that improve population health and reduce health disparities.

Overcrowding in emergency departments (EDs) is a worldwide problem that negatively affects both patients and health care providers. Accurately forecasting ED patient visits is one of the key issues in addressing it. A hybrid autoregressive integrated moving average and linear regression (ARIMA-LR) approach, which combines ARIMA and LR in a sequential manner, is developed. A smoothing process is introduced to reduce interference from outliers. Data from two EDs in Dalian, China, are used to compare the forecasting performance of the hybrid model with that of existing models. The results show that the proposed hybrid model outperforms the generalized linear model (GLM), ARIMA, ARIMA with explanatory variables (ARIMAX), and an ARIMA-artificial neural network (ANN) hybrid model.
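
The sequential idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: an AR(1) model fitted by least squares stands in for ARIMA, a running median stands in for the smoothing process, the first-stage residuals are regressed on a single explanatory variable, and all function names (fit_line, median_smooth, fit_hybrid, forecast_next) are hypothetical. The abstract does not state the order of the two stages, so the time-series-then-regression ordering used here is an assumption.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def median_smooth(series, window=3):
    """Running median (assumed stand-in for the smoothing step).

    The window shrinks at both ends of the series.
    """
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1])
            for i in range(len(series))]


def fit_hybrid(visits, covariate):
    """Two sequential stages (the ordering is an assumption).

    Stage 1: AR(1) on the smoothed visits, a simplified stand-in for ARIMA.
    Stage 2: linear regression of the stage-1 residuals on a covariate.
    """
    smooth = median_smooth(visits)
    a, b = fit_line(smooth[:-1], smooth[1:])        # y_t = a + b * y_{t-1}
    resid = [y - (a + b * prev)
             for prev, y in zip(smooth[:-1], smooth[1:])]
    c, d = fit_line(covariate[1:], resid)           # resid_t = c + d * x_t
    return a, b, c, d


def forecast_next(params, last_visits, next_covariate):
    """One-step-ahead forecast: time-series part plus regression correction."""
    a, b, c, d = params
    return (a + b * last_visits) + (c + d * next_covariate)
```

Given a list of daily visit counts and a matching covariate such as a weekend indicator (both hypothetical inputs), `forecast_next(fit_hybrid(visits, covariate), visits[-1], next_covariate)` returns the combined one-step forecast.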

Infectious diseases cause millions of cases of severe morbidity and death annually, placing a great burden on society. Accurate forecasts of epidemics and pandemics could leave enough time to create strategies and implement interventions. In this thesis, we investigate the predictive utility of big data, particularly Internet search data, for forecasting influenza and hand, foot, and mouth disease (HFMD).

The first application is to forecast new cases of influenza-like illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the sensitivity of online Google search data to self-excitement and other artifacts, we fuse multiple offline and online data sources. Four individual models, namely GLM, least absolute shrinkage and selection operator (LASSO), ARIMA, and deep learning (DL) with feedforward neural networks (FNN), are employed to forecast ILI-GOPC one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI data. The results suggest that DL outperforms the other models in terms of forecasting accuracy. Furthermore, to harness the power of the individual forecasting models, we use Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. The results demonstrate the superiority of BMA in forecasting ILI curves.
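
As a rough illustration of how BMA can combine individual forecasts, the sketch below weights each model by its posterior probability given validation errors. It is a simplified variant under stated assumptions (equal prior model probabilities, Gaussian forecast errors, variance set to its maximum-likelihood estimate), not the procedure used in the thesis; the function names bma_weights and bma_forecast are hypothetical.

```python
import math


def bma_weights(forecasts, actuals):
    """Posterior model weights under equal priors and Gaussian errors.

    forecasts: dict mapping model name -> list of validation forecasts.
    actuals:   list of observed values for the same validation period.
    """
    n = len(actuals)
    loglik = {}
    for name, preds in forecasts.items():
        mse = sum((p - y) ** 2 for p, y in zip(preds, actuals)) / n
        mse = max(mse, 1e-12)  # guard against log(0) for a perfect fit
        # Gaussian log-likelihood at the maximum-likelihood variance
        loglik[name] = -0.5 * n * (math.log(2 * math.pi * mse) + 1)
    top = max(loglik.values())
    # Subtract the maximum before exponentiating for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in loglik.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


def bma_forecast(weights, new_forecasts):
    """Weighted average of the individual models' new forecasts."""
    return sum(weights[k] * new_forecasts[k] for k in weights)
```

A model with smaller validation error receives a larger weight, so the averaged forecast leans toward the better-performing models without discarding the others.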

The other application uses the Baidu index to nowcast hand, foot, and mouth disease (HFMD) in the provinces of Guangxi, Zhejiang, and Henan and in China as a whole. Four statistical models, namely ARIMA, principal component analysis (PCA), ridge regression (RR), and LASSO, are used to nowcast the monthly incidence of HFMD. The performance of the four individual models is inconsistent across cases, and no single best model can be identified. Therefore, we develop a self-learning meta-learning method that automatically selects the best model based on statistical and time series meta-features. The results show that the meta-learning approach adapts to a wide variety of data, selects the best model, and provides excellent nowcasting accuracy.
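
The model-selection idea can be illustrated with a minimal sketch. The abstract does not specify which meta-features or which meta-learner the thesis uses, so everything below is an assumption: three simple meta-features (coefficient of variation, lag-1 autocorrelation, and mean-normalized trend slope) and a 1-nearest-neighbour rule over previously labelled series. The names meta_features and select_model are hypothetical.

```python
import math
from statistics import mean, pstdev


def meta_features(series):
    """Return (coefficient of variation, lag-1 autocorrelation, normalized slope)."""
    n = len(series)
    mu, sd = mean(series), pstdev(series)
    dev = [v - mu for v in series]
    var_sum = sum(d * d for d in dev)
    acf1 = (sum(dev[i] * dev[i + 1] for i in range(n - 1)) / var_sum
            if var_sum else 0.0)
    # Least-squares slope of the series against the time index 0..n-1
    t_mu = (n - 1) / 2
    slope = (sum((t - t_mu) * d for t, d in enumerate(dev))
             / sum((t - t_mu) ** 2 for t in range(n)))
    return (sd / mu if mu else 0.0, acf1, slope / mu if mu else 0.0)


def select_model(train, series):
    """Pick the model that was best for the most similar training series.

    train: list of (series, best_model_name) pairs, where the label records
           which model performed best on that series in past experiments.
    """
    target = meta_features(series)
    nearest = min(train,
                  key=lambda item: math.dist(meta_features(item[0]), target))
    return nearest[1]
```

Because the meta-features are scale-free, a new series is matched to past series with a similar shape rather than a similar magnitude, which is one plausible way to let a selector generalize across regions with different incidence levels.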

    Research areas

  • Forecasting, Emergency department, Infectious diseases