Artificial Intelligence Methods for Pharmacogenomics Studies

Student thesis: Doctoral Thesis

Abstract

Pharmacogenomics studies the relationships between the genomic variation of individuals and their therapeutic responses. Connecting individual genomic factors to therapeutic effects will contribute to precision medicine and drug discovery in the near future. The continuous growth and ongoing deployment of high-throughput facilities and large-scale pharmacogenomic datasets have opened new avenues for the discovery of novel pharmacological pathways. However, high-throughput and large-scale pharmacogenomic studies place extra demands on computational power, which makes it challenging to leverage the growing data for novel findings in pharmacogenomics. Fortunately, recent progress in artificial intelligence can improve the efficiency and accuracy of pharmacogenomic studies. In this thesis, we first review recent research on the topics within the artificial intelligence field that are essential to improving efficiency and accuracy. We then present two proposed methods: one that enhances the efficiency of high-throughput screening studies through active learning, and one that precisely mines novel cancer targets through network integration. With active learning, we can accelerate high-throughput screening for pharmacological and genetic tests. With network integration, we can accurately infer cancer drug targets from pharmacogenomic datasets.

The first topic applies active learning to enhance the efficiency of high-throughput screening in pharmacogenomic tests, where drug-perturbed conditions are tested across genomically different samples. High-throughput drug screening typically involves the exhaustive enumeration of all possible combinations, a strategy that is unlikely to be optimal or cost-effective. By incorporating artificial intelligence, we design an open-source model based on categorical matrix completion and active machine learning to guide high-throughput screening experiments. Specifically, we narrow our scope to high-throughput screening for the effects of chemical compounds on diverse protein subcellular locations. In the proposed model, we assume that exploration is more important than exploitation in the long run of high-throughput screening experiments. We therefore design several innovations to circumvent existing limitations. In particular, categorical matrix completion is designed to accurately impute the missing experiments, while margin sampling is implemented for uncertainty estimation. The model is systematically tested on both simulated and real data. The simulation results indicate that our model is robust to diverse scenarios, while the real-data results demonstrate the wet-lab applicability of our model to high-throughput screening experiments. Lastly, we attribute the success of the model to its exploration ability, as revealed by comparisons of the related matrix ranks and of distinct experiment coverage.
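
As a rough illustration of the margin sampling step mentioned above, the following sketch ranks unobserved experiments by the gap between their two most probable predicted categories and selects those with the smallest gap. It is not the thesis implementation: the function name `margin_sampling`, the probability array `probs`, and the `observed` mask are hypothetical, and the probabilities are assumed to come from some upstream imputation model such as categorical matrix completion.

```python
import numpy as np

def margin_sampling(probs, observed, batch_size):
    """Pick the unobserved experiments with the smallest prediction margin.

    probs      : (n_experiments, n_classes) predicted class probabilities
                 (assumed to come from an upstream imputation model)
    observed   : boolean mask, True where the experiment was already run
    batch_size : number of experiments to request in this round
    """
    # Sort each row ascending so the last two columns hold the top-2 classes.
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    # Experiments that were already run must never be selected again.
    margins = np.where(observed, np.inf, margins)
    order = np.argsort(margins, kind="stable")
    n_candidates = int((~observed).sum())
    return order[:min(batch_size, n_candidates)]

# Toy example: four candidate experiments, three phenotype categories.
probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction, large margin
    [0.40, 0.35, 0.25],   # uncertain
    [0.50, 0.49, 0.01],   # most uncertain between the top two classes
    [0.34, 0.33, 0.33],   # already observed, so excluded
])
observed = np.array([False, False, False, True])
picked = margin_sampling(probs, observed, batch_size=2)
print(picked)
```

A small margin means the model cannot separate its two leading hypotheses, so running that experiment is expected to be more informative than confirming a confident prediction.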

The second topic proposes a network fusion method to integrate multi-modal pharmacogenomic data for drug target identification. Mining drug targets and mechanisms of action (MoA) for novel anticancer drugs from pharmacogenomic data is one path to enhancing the efficiency of drug discovery. Recent approaches have successfully attempted to discover targets and MoA by characterizing drug similarities and communities using integrative methods on multi-modal or multi-omics drug information. However, these approaches have yet to consider the sparsity of the drug network and its imbalanced community sizes. Consequently, we developed a novel network integration approach that accounts for network structure through Reciprocal nearest Neighbor and Contextual information Encoding (RNCE). In addition, we proposed a tailor-made clustering algorithm to perform drug community detection on drug networks. RNCE and spectral clustering are shown to outperform state-of-the-art approaches in a series of tests, including network similarity tests and community detection tests compared against two drug databases. The observed improvement of RNCE can contribute to the field of drug discovery and to related multi-modal and multi-omics integrative studies.
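
The reciprocal nearest neighbor idea named in RNCE can be sketched in a few lines. The example below is only a generic illustration of reciprocal k-nearest-neighbor filtering on a similarity matrix, not the RNCE algorithm itself, whose contextual encoding step is not described in this abstract. The function name `reciprocal_knn`, the toy matrix `S`, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np

def reciprocal_knn(similarity, k):
    """Return a boolean adjacency matrix that keeps an edge i-j only when
    j is among the k nearest neighbours of i AND i is among those of j."""
    sim = np.array(similarity, dtype=float)   # work on a copy
    np.fill_diagonal(sim, -np.inf)            # a node is not its own neighbour
    n = sim.shape[0]
    # Column indices of the k most similar nodes for every row.
    top_k = np.argsort(-sim, axis=1, kind="stable")[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), top_k.ravel()] = True
    # Reciprocity: the neighbour relation must hold in both directions.
    return knn & knn.T

# Toy drug-drug similarity matrix (hypothetical values).
S = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
mutual = reciprocal_knn(S, k=1)
print(mutual.astype(int))
```

Requiring reciprocity discards one-sided links, which is one common way to keep small communities from being absorbed by large, densely connected ones in a sparse network.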

To conclude, two artificial intelligence methods for pharmacogenomics have been developed. We hope that the proposed methods can serve as useful tools for downstream studies in the era of personalized medicine.

Date of Award: 14 Aug 2020
Original language: English
Awarding Institution: City University of Hong Kong
Supervisor: Ka Chun WONG
