Identification of Determinants for Stroke Prevalence and Occurrence


Student thesis: Doctoral Thesis





Award date: 2 Sep 2022


Stroke is a severe, acute neurological condition that threatens life, causes disability worldwide and contributes to the disease burden on society. Strokes fall mainly into two categories: ischaemic stroke and haemorrhagic stroke. Ischaemic stroke, the most common type, results from the blockage or stenosis of an artery supplying the brain. Haemorrhagic stroke, the less common type, is caused by bleeding in the brain and accounts for approximately 20% of all strokes.[1], [2] It occurs when an artery suddenly starts bleeding into the brain, causing dysfunction in the parts of the body controlled by the damaged area of the brain.

Different subtypes of stroke share similar signs and symptoms, such as sudden numbness or weakness of the face or limbs, severe headache with no known cause, loss of speech or vision, and difficulty walking or understanding. Identifying the risk factors of stroke occurrence and characterising the determinants of recurrence and mortality among stroke survivors are the cornerstones of stroke prevention.[3] Understanding the profiles of individuals before and after stroke, including the modifiable and non-modifiable factors, can guide strategies addressing the occurrence, prognosis and recurrence of stroke. Stroke often results from the combined effects of genes and their complex interactions with environmental determinants.[4] These relationships might be non-linear and time-varying. Epidemiologic and genetic studies, coupled with machine learning, have become widely used to explore the determinants of stroke occurrence and prevalence, providing insights into novel determinants and their interactions. The work in this thesis focuses on applying epidemiologic, genetic and machine learning methods to identify the principal risk factors and influential factors of stroke comprehensively and systematically. The contents of each chapter are summarised below.

Chapter 1: An introduction to stroke risk factors and causal inference. We first reviewed the well-established risk factors of ischaemic and haemorrhagic strokes. We then introduced the causal inference used to determine the risk factors. We presented the hypotheses, challenges, and alternative methods for causal inference, and summarised the instrumental variable approaches and machine learning algorithms.

Chapter 2: In this chapter, we performed Mendelian randomisation, which combines epidemiologic and genetic approaches, to investigate the causal effect of mean corpuscular volume (MCV) and red cell distribution width (RCDW) on haemorrhagic strokes. We further performed colocalisation analysis to locate the pathway of single nucleotide polymorphisms between MCV/RCDW and the outcomes. Finally, we carried out mediation analyses to account for the causal effects mediated between the exposures and outcomes.

Chapter 3: In this chapter, we applied machine learning methods to explore a wide range of stroke-related features among stroke survivors, including dietary nutrients, blood biomarkers and clinical information. We computed feature importance, developed a nomogram and validated the nomogram on different datasets.

Chapter 4: In this chapter, we summarised the thesis and outlined future work.
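
As an illustration of the Mendelian randomisation step summarised for Chapter 2, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimator, one common way of combining per-variant Wald ratios. The abstract does not state which estimator the thesis used, so the choice of IVW, the function name `ivw_estimate` and the input values are assumptions made purely for illustration; the numbers are not data from the thesis.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant-exposure associations (e.g. SNP effect on MCV)
    beta_outcome:  variant-outcome associations (e.g. SNP effect on stroke)
    se_outcome:    standard errors of the variant-outcome associations
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if len(beta_exposure) == 0:
        raise ValueError("at least one genetic variant is required")

    # Each variant's Wald ratio (by / bx) is weighted by bx^2 / se_y^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("variant-exposure associations must not all be zero")

    # Weighted mean of Wald ratios simplifies to sum(bx*by/se^2) / sum(bx^2/se^2).
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical summary statistics for three variants (illustrative only).
example_bx = [0.10, 0.20, 0.15]
example_by = [0.05, 0.09, 0.08]
example_se = [0.01, 0.02, 0.015]
effect, se = ivw_estimate(example_bx, example_by, example_se)
print(f"IVW estimate: {effect:.3f} (SE {se:.3f})")
```

In practice such an estimate would be accompanied by sensitivity analyses, because IVW assumes that every variant is a valid instrument.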