Data Analytics-based Patient Demand Forecasting for Public Healthcare System


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 11 Oct 2017


In most regions, the public healthcare system plays an important role in providing public health and basic medical services to meet the health needs of target populations. Decision makers and practitioners in these systems continually confront challenges ranging from limited healthcare resources to growing patient demand. According to World Health Statistics 2015, overcrowding and shortages of medical resources are significant challenges in healthcare systems around the world, especially in developing countries. In mainland China, healthcare expenditure reached about 5.4% of national GDP in 2012. The situation is similar in Hong Kong SAR, where healthcare expenditure was 5.2% of GDP in 2012 and is projected to increase to 9.2% by 2033. Compared with the United Kingdom (9.3%), France (11.6%), Canada (10.9%), and the United States (16.9%), China's total healthcare expenditure as a share of GDP continues to lag behind that of the world's leading economies. Hence, there is an urgent need to optimize resource allocation and enhance healthcare services under current resource constraints in such areas.

Over the past few years, big data analytics has been highly successful in applications such as object recognition and detection, localization, scene classification, risk analysis, and demand forecasting. With a number of breakthroughs in deep learning and artificial intelligence research, available data can be better utilized to improve the quality of daily life and to learn about, reason about, and understand the real world. Motivated by the question “How can we utilize data analytics-based approaches to guide the decision-making process in healthcare systems?”, this research aims to model the uncertainty of patient demand in three typical departments of the public healthcare system using modern machine learning techniques, in order to relieve overcrowding, resource shortages, financial burden, and other challenges. The expected results serve as key references to guide relevant decisions, such as staff scheduling and bed assignment. In this way, patient care is improved by converting complex medical data (e.g., patient admission data, data from hospital information systems, and physicians' reviews) into actionable knowledge.

Based on the different objectives and relevant sub-tasks, the research issues in this thesis fall into three parts. First, an integrated methodology is developed to model daily and hourly patient flow at different severity levels using a deep learning framework. A genetic algorithm (GA)-based feature selection algorithm is implemented for this demand forecasting problem to explore the key factors affecting patient flow. A deep neural network (DNN) is applied as the forecast model for its universal adaptability and high flexibility. In the model-training process, the learning algorithm is carefully configured and two regularization strategies are introduced to avoid overfitting. In the case study, the methodology is validated with actual patient admission data collected from an accident and emergency department (A&ED) in Hong Kong. The experimental results demonstrate that the improved GA-based feature selection process has fewer hyper-parameters and higher efficiency than the traditional one, and that feature combination information is maintained by a fitness-based crossover operator. The universal property of the DNN is further enhanced by merging different regularization strategies. In practice, the features selected by our improved GA can be used to uncover underlying relationships between patient flow and possible contributing variables.
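To make the idea of GA-based feature selection concrete, the following is a minimal, generic sketch and not the improved GA developed in the thesis. Each chromosome is a 0/1 mask over candidate features. Several parts are assumptions introduced purely for illustration: the fitness function (total absolute Pearson correlation with the target minus a size penalty, standing in for a forecast-model score), the uniform crossover (the thesis uses a fitness-based crossover operator), the toy data, and all function and parameter names.

```python
import random
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5

def fitness(mask, columns, target, penalty=0.3):
    """Assumed stand-in fitness: summed |correlation| of selected columns minus a size penalty."""
    score = sum(abs(pearson(col, target)) for bit, col in zip(mask, columns) if bit)
    return score - penalty * sum(mask)

def ga_select(columns, target, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Generic GA over 0/1 feature masks with tournament selection and elitism."""
    rng = random.Random(seed)
    n = len(columns)
    score = lambda m: fitness(m, columns, target)
    # Start from the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask so far always survives
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best

# Toy data (hypothetical): two informative columns, one weak, one constant.
target = [1, 2, 3, 4, 5, 6, 7, 8]
columns = [
    [2, 4, 6, 8, 10, 12, 14, 16],   # perfectly correlated with target
    [1, -1, 1, -1, 1, -1, 1, -1],   # weakly correlated
    [5, 5, 5, 5, 5, 5, 5, 5],       # constant, carries no information
    [8, 7, 6, 5, 4, 3, 2, 1],       # perfectly anti-correlated
]
best_mask = ga_select(columns, target)
print(best_mask)
```

Because the best mask is carried over unchanged in every generation, the returned mask never scores worse than the all-features baseline it started from. In a real setting the fitness would more plausibly be a validation error of the forecast model trained on the selected features.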

Second, we develop a novel hybrid methodology to forecast patients’ demand for different bottleneck resources in an outpatient department (OPD) by combining a new feature selection method with a deep learning approach. A modified genetic algorithm (MGA) is proposed for feature selection. The key operators of the standard genetic algorithm are redesigned to exploit the information provided by filter-based feature selection and by feature combinations. A feedforward DNN is used as the forecast model, and its initial parameter set is generated by a stacked autoencoder-based pre-training process to overcome the optimization challenges of constructing deep architectures. To evaluate its performance, the methodology is applied to an OPD located in Northeast China. The results are compared with those obtained from combinations of other feature selection methods and demand forecasting models. The combination of MGA and the pre-trained DNN shows the strongest predictive power among all combinations considered. Furthermore, the elite features obtained by MGA provide practical insights into potential associations between various feature combinations and demand variance.

Third, we develop a robust and accurate risk prediction framework for unplanned hospital readmission over different time windows by combining feature selection algorithms and machine learning models. For feature selection, an enhanced multi-objective bare-bones particle swarm optimization algorithm (EMOBPSO) is developed as the principal search strategy, and a new mutual information (MI)-based criterion is proposed to estimate feature relevancy and feature redundancy efficiently. A greedy local search (GLS) strategy is developed and merged into EMOBPSO to control the size of the final feature subset as desired. For modeling, several machine learning models, including support vector machine (SVM), random forest (RF), and DNN, are trained on the preprocessed datasets with the corresponding feature subsets, and their performance is compared to find the best combination of feature selection algorithm and machine learning model. In the case study, the proposed methodology is applied to an actual hospital located in Northeast China, with various levels of data collected from the hospital information system. The results of the comparative experiments demonstrate that EMOBPSO retains most of the information carried by the original feature sets at much lower computational cost than GA and SA. The combination of EMOBPSO (EMOBPSO-GLS) and DNN performs robustly under different conditions and can be deployed as a risk analysis tool for hospital readmission prevention in healthcare systems.
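The trade-off between relevancy and redundancy can be illustrated with a small sketch. It is not the MI-based criterion or the EMOBPSO-GLS search proposed in the thesis. It substitutes a well-known mRMR-style greedy rule (relevance to the label minus mean redundancy with the already selected features) on discrete toy data, and it caps the subset size at k. The function names, the scoring rule, and the data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def greedy_relevance_redundancy(features, labels, k):
    """Pick up to k features, each maximizing relevance minus mean redundancy."""
    selected = []
    remaining = list(features)

    def score(name):
        relevance = mutual_information(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): the label is determined jointly by "a" and "b";
# "a_dup" duplicates "a" and therefore adds no new information.
features = {
    "a":     [0, 0, 1, 1, 0, 0, 1, 1],
    "a_dup": [0, 0, 1, 1, 0, 0, 1, 1],
    "b":     [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [2 * a + b for a, b in zip(features["a"], features["b"])]
chosen = greedy_relevance_redundancy(features, labels, k=2)
print(chosen)
```

In this example all three features are equally relevant on their own, but once "a" is selected the duplicate is fully redundant, so the second pick is "b". A swarm-based search such as the one described above explores subsets globally instead of adding one feature at a time.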

The major contributions of this thesis lie in two areas. Theoretically, instead of using traditional statistical models, this research integrates artificial intelligence and deep learning techniques to model the uncertainties in healthcare systems, which marks a substantial advance over existing approaches to healthcare forecasting. Some key operators of artificial intelligence-based algorithms are redesigned to suit the feature selection tasks involved. When building models with deep architectures, it is challenging to configure their structure to fit the application at hand, because DNN-based models are highly flexible and therefore prone to overfitting. We have designed several configuration schemes that combine training algorithms and regularization strategies in order to strengthen the advantages of DNNs. Practically, the predicted values can serve as important references for a range of strategic decisions in public healthcare systems, such as nurse rostering, resource planning, bed assignment, and targeting the delivery of early resource-intensive interventions.