Smooth Bayesian network model for the prediction of future high-cost patients with COPD
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › Not applicable › peer-review
| Journal / Publication | International Journal of Medical Informatics |
| Early online date | 4 Apr 2019 |
| Publication status | Published - Jun 2019 |
| Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85064082254&origin=recordpage |
Objectives: We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating it should provide more accurate predictions than other machine-learning methods.
Methods: We used 2011–2013 medical insurance data for patients with COPD in a large city. The data set included demographic information and admission records. Building on developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for predicting high-cost individuals from medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). To address overfitting, the case-mix effect, and data sparsity, we employed a smoothing kernel in the SBN model based on the weighted nearest neighborhood method (i.e., using data about “similar patients”).
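The idea of borrowing information from "similar patients" can be illustrated with a minimal sketch. This is not the authors' SBN estimator: the Hamming distance, the exponential kernel, the `bandwidth` parameter, and the function name `smoothed_conditional_prob` are all assumptions chosen for illustration. The sketch estimates the probability of a high-cost outcome for one configuration of a node's parent variables by weighting every patient according to how closely their parent values match that configuration.

```python
import numpy as np

def smoothed_conditional_prob(X, y, query, bandwidth=1.0):
    """Kernel-weighted estimate of P(y = 1 | parents = query).

    X: (n, d) integer array of discrete parent values, one row per patient.
    y: (n,) array of 0/1 outcomes (1 = high-cost). Hypothetical encoding.
    query: (d,) parent configuration to estimate the probability for.
    bandwidth: assumed smoothing parameter; smaller means less borrowing.
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    # Hamming distance: number of parent variables that differ from the query.
    dist = (X != np.asarray(query)).sum(axis=1)
    # Exponential kernel: exact matches get weight 1, others decay with distance.
    weights = np.exp(-dist / bandwidth)
    return float((weights * y).sum() / weights.sum())

# Toy data: two patients with parents (0, 0), one with parents (1, 1).
X = [[0, 0], [0, 0], [1, 1]]
y = [1, 0, 1]
p_seen = smoothed_conditional_prob(X, y, query=[0, 0])
# A configuration never observed in the data still gets an estimate.
p_unseen = smoothed_conditional_prob(X, y, query=[0, 1])
```

With a very small bandwidth the estimate for an observed configuration approaches the raw frequency in its own cell, while a larger bandwidth pulls in more information from neighboring cells, which is one way to trade variance for bias when cells are sparse.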
Results: The proposed SBN achieved an area under the curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods. Besides confirming the known factors from the literature, we found “region” (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals.
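For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen high-cost patient receives a higher predicted score than a randomly chosen non-high-cost patient. The sketch below computes it directly from that definition; the function name `pairwise_auc` and the toy scores are illustrative assumptions, not data from the study.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted risk scores, higher means more likely high-cost.
    labels: 0/1 outcomes (1 = high-cost). Ties in score count as half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / diff.size)

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.80 means that about four in five such pairs are ordered correctly.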
Conclusion: The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to a historical trajectory and a future outcome; this framework can also be applied to other temporal data to model different trajectory information and predict other outcomes.
- Health informatics, COPD, Bayesian network, Machine learning, Graphical representation, Cost prediction, Temporal data, Data sparsity, Complex causal relationships, Smoothing
Smooth Bayesian network model for the prediction of future high-cost patients with COPD. / Lin, Shaochong; Zhang, Qingpeng; Chen, Frank; Luo, Li; Chen, Lei; Zhang, Wei. In: International Journal of Medical Informatics, Vol. 126, 06.2019, p. 147-155.