Analyze, Visualize, and Predict Lung Cancer Drug Resistance based on Molecular Dynamics Simulation and Machine Learning
基於分子動力學模擬和機器學習的肺癌抗藥性進行分析,可視化及其預測
Student thesis: Doctoral Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 1 Mar 2021 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(a7113b2b-123e-4758-981e-41a24afe014d).html |
---|---|
Abstract
Lung cancer is a leading cause of cancer deaths worldwide, resulting in the loss of millions of lives each year. Mutations in the epidermal growth factor receptor (EGFR) are a pathogenic factor in lung cancer development. EGFR tyrosine kinase inhibitors (TKIs), such as gefitinib and erlotinib, have been developed to treat lung cancer patients. These inhibitors have produced encouraging early outcomes, but their long-term efficacy appears limited because drug resistance emerges following one or more secondary mutations, as in the L858R-T790M double mutant.
This thesis used molecular dynamics (MD) simulations and machine learning-based methods to study drug resistance in EGFR mutants. Our first investigation examined the multi-domain EGFR using parametric models, such as power spectral density, dynamic time warping, and the Pearson correlation coefficient. We assessed the stability of these structures through the correlation between EGFR domains, and found that the domains of the mutant structures are less stable than those of the wild-type protein.
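As a rough illustration of two of the measures named above, the following minimal sketch computes the Pearson correlation coefficient and a classic dynamic time warping (DTW) distance between two time series. The input values are hypothetical per-frame RMSD traces for two domains, not data from the thesis, and the function names `pearson` and `dtw_distance` are chosen here for illustration only. Power spectral density is omitted for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW with an absolute-difference local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # d[i][j] holds the cheapest cumulative cost of aligning x[:i] with y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Hypothetical per-frame RMSD values (angstroms) for two domains; illustrative only.
domain_a = [1.0, 1.2, 1.5, 1.4, 1.8, 2.0]
domain_b = [0.9, 1.0, 1.3, 1.6, 1.5, 1.9]

print(f"Pearson r    = {pearson(domain_a, domain_b):.3f}")
print(f"DTW distance = {dtw_distance(domain_a, domain_b):.3f}")
```

A Pearson coefficient near 1 indicates that the two traces rise and fall together frame by frame, while a small DTW distance indicates similar shapes even when the traces are shifted in time.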
In our second investigation, we developed a novel pipeline to extract the correlated motions from the domains of EGFR using normal mode analysis. Correlated motions are essential for protein function, and the motion patterns can be used for structure prediction. To visualize these correlated motions, we used dynamic cross-correlation maps. Community network analysis was used to cluster residues in each domain based on pairwise correlation. Hydrogen bond analysis was also performed on the extracellular and kinase domains. This work provides valuable insights into the dynamics and structural changes observed in the EGFR and its mutants.
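A dynamic cross-correlation map of the kind mentioned above can be sketched as follows. Note the assumption: the thesis derives correlated motions from normal mode analysis, whereas this sketch estimates the same normalized correlation, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²><Δr_j²>), directly from a set of coordinate frames. The function name `dccm` and the synthetic three-atom trajectory are hypothetical and serve only to show the calculation.

```python
import numpy as np

def dccm(coords):
    """Dynamic cross-correlation matrix from coordinate frames.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be already
    aligned to a common reference so overall rotation/translation is removed.
    Returns an (n_atoms, n_atoms) matrix with entries in [-1, 1].
    """
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)  # displacement from the mean position
    # Frame-averaged dot product of displacements for every atom pair (i, j).
    cov = np.einsum("fic,fjc->ij", disp, disp) / coords.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

# Synthetic trajectory (not thesis data): atom 1 moves in step with atom 0,
# atom 2 moves in exactly the opposite direction.
rng = np.random.default_rng(0)
motion = rng.normal(size=(200, 3))
traj = np.stack([motion, 2.0 * motion, -motion], axis=1)  # (200, 3, 3)

c = dccm(traj)
print(np.round(c, 3))
```

Entries near +1 mark pairs that move together, entries near -1 mark anti-correlated pairs, and the resulting matrix is what is typically rendered as a heat map.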
In our next contribution, we focused on protein-drug interactions and visualizations. We predicted the drug-mutant dimer using computational methods and performed MD simulations for 10 ns. Based on the MD trajectory, we proposed a novel framework to visualize and analyze protein-drug interactions systematically, which is complementary to the traditional biochemistry drug discovery pipeline. Although the method is computational and uses only a small amount of experimental data, our results show excellent agreement with clinical findings. The method can be tested on other cancers or diseases, and we believe that the proposed framework has the potential to improve the drug discovery pipeline.
Finally, we worked on drug resistance prediction using machine learning. Given a patient's clinical information, we extracted geometrical and energy features and trained a classifier to predict the four-class drug-response level. The proposed model achieved 97.5% accuracy, 100% recall, 94% F1-measure, and 96.3% precision with the random forest classifier. Personalized (precision) medicine is a growing field in healthcare. Diseases are heterogeneous, and personalized medicine proposes healthcare solutions tailored to individual patients. Our method uses genotypic information and can be regarded as a highly accurate personalized drug-response prediction model for lung cancer patients. This model encourages the development of personalized therapy for lung cancer patients.
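To make the four reported metrics concrete, the sketch below shows one common way to compute accuracy, precision, recall, and F1 for a four-class problem. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the labels and predictions are hypothetical toy values rather than thesis results.

```python
def macro_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1 (multi-class)."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)  # false positives for class c
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives for class c
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }

# Hypothetical response levels 0..3 (toy data, not from the thesis).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 3, 2]

m = macro_metrics(y_true, y_pred, labels=[0, 1, 2, 3])
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With macro averaging, each class contributes equally regardless of how many patients fall into it, so the averaged F1 need not equal the harmonic mean of the averaged precision and recall.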
This thesis has developed computational methods to analyze, visualize, and predict drug resistance and drug response in lung cancer patients. These studies contribute to a better understanding of the molecular mechanisms underlying drug resistance. The proposed methods can also be applied to other types of cancer or related diseases and can improve the drug discovery pipeline.
- Cancer, Drug resistance, Protein-drug interactions, Protein-drug visualizations, Molecular dynamics simulation, Personalized and precision medicine, Data science in medicine, Machine learning