Abstract
Complex human diseases are caused by both genetic and environmental factors. Although complex diseases are diverse, they have several traits in common, including genetic heterogeneity, phenotypic complexity, high population prevalence, and a significant impact on human physical and mental health. Studying the pathogenesis and aetiology of complex human diseases has important implications for disease prediction, prevention, and treatment. Stroke and non-small cell lung cancer are the two complex human diseases that I focused on in this thesis.
Stroke is also referred to as a cerebrovascular event, cerebrovascular accident, cerebrovascular incident, or encephalopathy. It is a disorder of cerebral blood circulation in which occlusion or rupture of cerebral blood vessels damages the function or structure of brain tissue. Stroke is therefore categorised as ischemic or haemorrhagic, generally referring to ischemia or haemorrhage of the cerebral arterial system. Every year, approximately 15 million people worldwide suffer a stroke, of whom about 5 million die and another 5 million are permanently disabled. Because stroke has high mortality, a high rate of disability, and acute onset, identifying fatal risk factors and predicting short-term mortality in patients is important: it provides information that supports clinicians in making optimal decisions at the right time in hospital, helps save the lives of stroke patients, and helps surviving patients achieve a better prognosis. It can also help medical scientists and doctors understand the causal and risk factors of stroke and its underlying mechanisms, which can lead to better prevention and treatment.
Cancer is a leading cause of death globally. Lung cancer, the second most common cancer in the world, accounts for approximately 2.2 million new cases and approximately 1.8 million deaths every year, the highest number of deaths of any cancer type. Non-small cell lung cancer is the most common type, accounting for almost 85% of lung cancer cases. The expression patterns of protein-coding genes and microRNAs have been shown to play an important role in the diagnosis, staging, progression, and prognosis of non-small cell lung cancer. Some genetic factors may help explain the carcinogenic mechanism or guide treatment decisions and the development of targeted therapies.
Research on the prediction, prevention, and treatment of complex human diseases is highly interdisciplinary, drawing on biotechnology, multi-omics science, computer science, applied mathematics, statistics, big data analytics, and other disciplines. With the rapid development of computer hardware, computing performance, and algorithms, the vast and complex human genetic data and patient information available from all over the world can be analysed with various mathematical models and artificial intelligence networks. The disciplines of computational biology and bioinformatics complement each other, providing more advanced algorithms to analyse the wide variety and large volume of data. This will enable us to reveal physiological mechanisms from different perspectives and explore the nature of biology, thereby promoting the development of human biology and medicine.
The research described in this thesis comprises five parts: 1) The background information for the studies in this thesis; 2) The prediction of death outcomes and high-risk fatal factors of stroke patients based on machine learning and deep learning. In this part, machine learning and deep learning models integrating structured data and natural language processing were established. The area under the receiver operating characteristic curve for predicted 6-month mortality reached 0.89 and 0.88 for haemorrhagic and ischemic stroke patients, respectively, and 26 and 34 high-risk factors associated with patient mortality were identified for the two subtypes, respectively; 3) Mendelian randomisation-based analysis of non-small cell lung cancer-associated microRNAs, in which five microRNAs were found to be associated with lung adenocarcinoma, a subtype of non-small cell lung cancer, although direct causal relationships with the five microRNAs could not be identified in this study; 4) Protein-coding gene prediction and drug repositioning for lung adenocarcinoma based on a graph neural network. A graph attention network model was established to predict protein-coding genes related to lung adenocarcinoma and achieved an area under the receiver operating characteristic curve of 0.90. One pre-clinical compound and two drugs were found to have therapeutic potential for treating lung adenocarcinoma; 5) Conclusions and future work. The models developed in these studies will be further optimised and applied in hospital systems or extended to more diseases and species in the future.
| Date of Award | 26 Apr 2023 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Kei Hang Katie CHAN (Supervisor) |