Machine Learning for Individualized Cardiovascular Diseases Risk Stratification


Student thesis: Doctoral Thesis

Award date: 6 Sep 2021


Over the past decades, efficient risk stratification of cardiovascular diseases has received considerable attention from both practitioners and researchers. With advances in machine learning techniques and the availability of massive electronic records and sensor data in hospitals, novel data-driven risk stratification models have been constructed, but they have not yet been fully translated into accurate and easy-to-use diagnostic tools in the healthcare industry.

This thesis consists of four studies on developing statistical methods and machine learning techniques for cardiovascular data analytics. In the first three studies, we developed statistical and machine learning models for computational cardiovascular risk stratification with territory-wide population cohorts. In the last study, we investigated a fuzzy and factorization-based machine learning model that improves patient readmission prediction by combining hospitalization data (hard information) with domain knowledge (soft side information).

First, we present three investigations that develop efficient statistical learning methods for cardiovascular risk stratification in Brugada syndrome (BrS), arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C), spontaneous ventricular tachycardia/ventricular fibrillation (VT/VF), and acquired long QT syndrome (aLQTS). Multiple electrocardiographic (ECG) indices of depolarization and repolarization were extracted from 12-lead signals. Nonlinear variable interaction patterns, which are ignored in traditional linear models, were captured with a generalized additive model with pair-wise interactions (GA2M). The GA2M retains the additive structure of linear models for intuitive explanation, and it identified a set of ECG risk factors and pair-wise interactions that significantly improved predictive performance. In a territory-wide retrospective cohort study of patients with ARVD/C, with incident VT/VF as the primary outcome and new-onset heart failure with reduced ejection fraction (HFrEF) and all-cause mortality as secondary outcomes, risk score systems were constructed for practical diagnostic use. We then turned to the controversy surrounding existing, widely used BrS score systems and the management of intermediate-risk BrS, and evaluated the predictive performance of different risk scores for the overall and intermediate-risk groups in a large Asian BrS cohort using the area under the receiver operating characteristic (ROC) curve (AUC).
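To make the AUC comparison of risk scores concrete, the following is a minimal sketch rather than the evaluation code used in the thesis. It computes AUC as the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it, counting ties as one half. The function name `auc`, the two score lists, and the outcome labels are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: risk score per patient (higher means higher predicted risk).
    labels: 1 if the patient had the event, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event case ranked above non-event case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: six patients, two candidate risk scores.
labels = [1, 1, 0, 0, 1, 0]
score_a = [5, 3, 2, 1, 4, 3]
score_b = [2, 3, 3, 1, 1, 2]

for name, score in [("score A", score_a), ("score B", score_b)]:
    print(name, round(auc(score, labels), 3))
```

The pairwise form is quadratic in cohort size, which is acceptable for a sketch; a rank-based formulation would be preferable for large cohorts.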

We then combined clinical and ECG risk factors for BrS risk stratification, with spontaneous VT/VF as the primary outcome. External validation was performed with patients from three countries, and significant risk predictors of spontaneous VT/VF were identified. Non-negative matrix factorization (NMF) improved the prediction of arrhythmic outcomes by extracting latent features shared across variables. For risk stratification of aLQTS, we built a random survival forest (RSF) model that incorporates NMF components as latent variables. Significant clinical characteristics and ECG parameters were identified and selected as inputs to the RSF model. Latent variables extracted with NMF were then entered into the RSF model as additional predictors, and a variable importance ranking was generated. The resulting RSF-NMF model significantly improved risk stratification performance compared with the baselines.
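The idea of entering NMF latent variables as additional predictors can be sketched as follows. This is an illustrative assumption, not the thesis implementation: it uses the classical multiplicative update rules for the Frobenius objective, a random non-negative matrix in place of real clinical and ECG data, and two latent components. The function name `nmf` and all parameter values are hypothetical, and the downstream survival model is omitted.

```python
import numpy as np


def nmf(X, n_components, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (patients x variables) as W @ H.

    Uses multiplicative updates for the squared Frobenius error.
    W holds one latent feature vector per patient.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical stand-in for non-negative predictors (8 patients, 5 variables).
X = np.random.default_rng(42).random((8, 5))

W, H = nmf(X, n_components=2)

# Latent features are appended to the original predictors; the augmented
# matrix would then be passed to the survival model (not shown here).
X_augmented = np.hstack([X, W])
print(X_augmented.shape)
```

NMF requires non-negative inputs, so variables that can take negative values would need to be shifted or rescaled before factorization.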

Finally, we address the mitigation of preventable readmissions, which is key to enhancing the quality and efficiency of healthcare services in Hong Kong. We propose a locally weighted factorization machine with fuzzy partition (wFMFP) model, which (I) splits the data into several training subsets using a fuzzy partition algorithm, (II) fits a wFM within each training subset, and (III) combines the fitted wFMs through the Takagi-Sugeno-Kang fuzzy weighting mechanism. The wFMFP model was evaluated on a large-scale territory-wide cohort and outperformed state-of-the-art baseline models. The results indicate the effectiveness of local weighting and the potential of a readmission risk scoring system for clinical use.
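Step (III) can be illustrated with a minimal sketch, under several assumptions that are not taken from the thesis: each local model is a standard second-order factorization machine (not the weighted variant), each fuzzy subset is summarized by a hypothetical center and width, rule firing strengths are Gaussian, and all parameters are given rather than fitted. The names `fm_predict`, `rule_strength`, and `tsk_combine` are illustrative only.

```python
import math


def fm_predict(x, w0, w, V):
    """Standard second-order factorization machine score for one sample.

    x: feature values, w0: bias, w: linear weights,
    V: one latent vector (length k) per feature.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        # equals the sum over i < j of V[i][f] * V[j][f] * x[i] * x[j]
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def rule_strength(x, center, width):
    """Firing strength of one fuzzy rule (Gaussian form is an assumption)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def tsk_combine(x, rules):
    """Takagi-Sugeno-Kang style output: normalized-strength weighted average.

    rules: list of (center, width, (w0, w, V)) tuples, one per fuzzy subset.
    Returns the combined score and the normalized weights.
    """
    strengths = [rule_strength(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        # all strengths underflowed; fall back to equal weights
        weights = [1.0 / len(rules)] * len(rules)
    else:
        weights = [s / total for s in strengths]
    outputs = [fm_predict(x, *params) for _, _, params in rules]
    return sum(wt * out for wt, out in zip(weights, outputs)), weights


# Hypothetical example: two fuzzy subsets over two features.
rules = [
    ([0.0, 0.0], 1.0, (0.1, [0.2, 0.3], [[0.5], [0.1]])),
    ([2.0, 2.0], 1.0, (0.4, [0.1, 0.1], [[0.2], [0.3]])),
]
score, weights = tsk_combine([0.5, 1.0], rules)
print(score, weights)
```

The combined value is a raw score; mapping it to a readmission probability (for example through a logistic link) is left out of this sketch.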

Through these investigations of statistical and machine learning methods for risk stratification in healthcare, the thesis aims to bridge existing gaps in translating insights from machine learning into practical applications in computational cardiovascular science. The studies provide insights into implementing effective and efficient machine learning models as practical tools in healthcare.

    Research areas

  • Machine learning, computational cardiovascular science, Brugada syndrome, factorization