Abstract
Drug development is a lengthy, costly process, often taking up to 12 years and costing over $2.8 billion. It begins with the identification of potential drug candidates and progresses through rigorous validation in preclinical and clinical trials. Each stage is characterized by significant uncertainty, underscoring the challenges inherent in bringing new drugs to market. Fortunately, deep learning methods have paved the way for high-throughput drug discovery across the stages of drug development, including de novo drug design, molecular property prediction, side effect prediction, target identification, and drug repositioning. This thesis introduces novel machine learning methodologies designed to tackle the computational bottlenecks at the frontier of high-throughput, multi-modality drug discovery. Specifically, the algorithms proposed in this thesis demonstrate the accuracy of machine learning methods in inferring potential drug-drug interactions and drug-target interactions from multiple modalities. Furthermore, we design generative models that generate drug molecule geometries under various conditions.
Drug-drug interactions (DDIs) are important annotations in precision medicine, given the overwhelming volume of knowledge and information available. Interactions between drugs may induce adverse reactions or increase treatment efficacy, for example the complete resolution of pain in trigeminal neuralgia. Previous work incorporated chemical and biological properties, including molecular structure, pathways, side effects, and targets, to predict DDIs. However, the rapidly growing literature has accumulated diverse and comprehensive records of drug interactions. Therefore, this thesis first proposes EGFI, which leverages a large language model (LLM) to fuse multiple sources of sentence and entity information in order to identify the DDIs described in sentences and to generate potentially meaningful DDIs. The experimental results indicate that EGFI outperforms state-of-the-art methods on the DDIs 2013 dataset and the DTIs dataset for biomedical relation extraction.
Drug-target interaction (DTI) annotation is another essential task in drug development. Unveiling novel protein-ligand binding sites aids in identifying potential side effects and toxicities and in drug repositioning. Although conventional manual DTI annotation approaches remain reliable, they are notoriously laborious and time-consuming, because each drug-target pair must be tested and therapeutic effects are preferably validated through manual laboratory experiments and clinical trials. To address this, the thesis then proposes CoaDTI, an end-to-end deep learning framework that significantly improves the efficiency and interpretability of drug-target annotation. The experimental results demonstrate that CoaDTI achieves competitive performance on three public datasets compared with state-of-the-art models. An extended study reveals that CoaDTI can identify novel DTIs, such as interactions between candidate drugs and SARS-CoV-2-associated proteins. Visualizing the co-attention scores illustrates the interpretability of the model and offers mechanistic insights.
In drug discovery, the initial phase of de novo molecule generation is especially important, as it involves the automated creation of chemically valid structures with desirable properties. Well-built molecule libraries can greatly help medicinal chemists search for ideal chemicals against chosen targets. However, given the huge diversity of atom types and chemical bonds, manually proposing valid, unique, and property-restricted molecules is a daunting and extraordinarily costly task. Therefore, we use geometric learning to develop MDM, a diffusion-based generative method that generates molecules in 3D space from scratch. Building on the accuracy and efficiency of MDM, we extend the framework to the structure-based drug design scenario as PMDM. Comprehensive experiments indicate that PMDM outperforms baseline models across multiple evaluation metrics. To evaluate PMDM under real drug design scenarios, we conduct lead compound optimization for the SARS-CoV-2 main protease and cyclin-dependent kinase 2 (CDK2), respectively. The selected optimized molecules are synthesized and evaluated for their in vitro activity against CDK2, and they display improved CDK2 activity.
However, modulating a single target can rarely cure or alleviate complex disorders, in which the breakdown of robust physiological systems results from a combination of multiple genetic and/or environmental factors. In addition, co-crystal structures of protein–ligand pairs are hard for researchers to verify, and high-dimensional structure matrices are sparse because most residues or atoms do not interact, which leads to tedious and redundant calculations. Therefore, this thesis proposes DiffDTM, a novel unified, conditional, structure-free deep generative model that generates molecular graphs given dual targets. To the best of our knowledge, this is the first deep generative model designed for molecule generation given arbitrary dual targets. Using an extensive collection integrated from multiple data sources and a real-world use case, we demonstrate that DiffDTM outperforms state-of-the-art models in terms of binding affinity scores and achieves comparable performance on multiple other metrics.
| Date of Award | 23 Aug 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Ka Chun WONG (Supervisor) |