Multi-omics Data Integration to Dissect the Tumor Heterogeneity and Elucidate Cancer Subtype-specific Regulatory Mechanisms


Student thesis: Doctoral Thesis


Supervisors
  • Kui Ming CHAN (Supervisor)
  • Xin WANG (Co-supervisor)
  • Charles DANKO (External person) (External Co-Supervisor)
Award date: 26 Oct 2021


Cancers are now widely accepted to be highly heterogeneous, with multiple risk factors and complex etiology. This heterogeneity exists not only between tumors of the same organ in different patients (inter-tumor heterogeneity) but also between cell populations within the same tumor (intra-tumor heterogeneity), leading to large variation in clinical manifestations and therapeutic responses. Understanding the molecular and biological mechanisms that cause tumor heterogeneity is key to personalized therapy. Traditional categorization of tumors based on histopathological features, such as tumor size and the number of metastatic lymph nodes, has limited power in clinical decision-making. In the last decade, rapid progress in high-throughput next-generation sequencing technologies has made it possible to profile tumors from different angles, ushering in the omics era, which spans (epi-)genomics, transcriptomics, proteomics, pathomics, and more. Compared with earlier transcriptome-based cancer molecular subtyping, integrating multi-omics data can capture tumor heterogeneity more comprehensively.

Different cancer subtypes can have distinct molecular characteristics, leading to different clinical outcomes. Hence, understanding subtype-specific regulatory mechanisms is essential for designing targeted agents. Long non-coding RNAs (lncRNAs) are emerging as important regulators of tumorigenesis and tumor progression, acting through a variety of regulatory mechanisms. However, little is known about how lncRNAs function in specific cancer subtypes.

The work in this thesis focuses on integrating multi-omics data to better dissect tumor heterogeneity and to elucidate cancer subtype-specific regulatory mechanisms through systems biology analysis. The contents of each chapter are summarized below:

Chapter 1: An introduction to tumor heterogeneity and to dissecting it from the single-omics level to the multi-omics level. We first reviewed previous transcriptome-based cancer molecular subtyping, taking pancreatic ductal adenocarcinoma (PDAC) as an example. We then summarized the advantages, commonly used data repositories, integration methods, and applications of multi-omics data integration in cancer studies.

Chapter 2: In this chapter, we performed an integrative analysis of multi-omics profiles to dissect the heterogeneity of PDAC and newly identified four clinically relevant molecular subtypes (MPDACS). We further developed a deep-learning-based tissue classification framework (PDAC-SPA) for the automatic delineation of PDAC pathological images. Spatial characteristics derived from histology images identified novel subgroups within the MPDACS1 subtype with improved prognostic power and distinct underlying regulatory mechanisms, which traditional bulk-tumor omics data cannot detect.

Chapter 3: In this chapter, we applied an integrative network biology analysis based on multi-omics data to identify cancer subtype-specific master regulatory lncRNAs. We conducted two case studies to illustrate cancer subtype-specific regulatory mechanisms. In the first case study, we inferred a lncRNA regulatory network and prioritized six key master regulators with significant clinical associations in the squamous subtype of PDAC. In the second case study, we identified MIR200CHG as the master regulatory lncRNA underlying the mesenchymal subtype of gastric cancer (GC). We further confirmed the subtype-specific expression pattern of MIR200CHG and investigated its role in the epithelial-mesenchymal transition (EMT) process using in vitro assays.

Chapter 4: In this chapter, to flexibly bridge various types of high-throughput data and downstream functional enrichment analysis, we developed an R package, HTSanalyzeR2. It provides comparative functional enrichment analysis and enriched subnetwork analysis, allowing mutual comparisons between time points or cells in series data such as time-course and single-cell RNA-seq data. To make the package accessible to non-R users, we also developed an interactive, web-based Shiny application as an online platform.
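
As a rough illustration of what comparing functional enrichment across time points can involve, the sketch below runs a hypergeometric over-representation test for one gene set against the hit list of each time point. It is written in Python for brevity and is not the HTSanalyzeR2 interface: the function names, gene identifiers, and time-point labels are all hypothetical, and the thesis may rely on different statistics (for example, rank-based gene set enrichment).

```python
from math import comb


def enrichment_p_value(universe_size, set_size, n_hits, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of gene-set members among n_hits genes drawn without
    replacement from a universe in which set_size genes belong to the set.
    """
    total = comb(universe_size, n_hits)
    max_overlap = min(set_size, n_hits)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_hits - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def compare_enrichment(universe, gene_set, hits_by_condition):
    """Return one p-value per condition (e.g. per time point) for one gene set."""
    gene_set = gene_set & universe  # ignore set members outside the universe
    results = {}
    for condition, hits in hits_by_condition.items():
        hits = hits & universe
        overlap = len(hits & gene_set)
        results[condition] = enrichment_p_value(
            len(universe), len(gene_set), len(hits), overlap
        )
    return results


# Toy data (hypothetical): 20 genes, one 5-gene set, two time points.
universe = {f"g{i}" for i in range(20)}
gene_set = {"g0", "g1", "g2", "g3", "g4"}
hits_by_timepoint = {
    "t1": {"g0", "g1", "g2", "g10"},    # 3 of 4 hits fall in the gene set
    "t2": {"g0", "g11", "g12", "g13"},  # 1 of 4 hits falls in the gene set
}

p_values = compare_enrichment(universe, gene_set, hits_by_timepoint)
for timepoint, p in sorted(p_values.items()):
    print(f"{timepoint}: p = {p:.4f}")
```

In this toy case the gene set is more strongly over-represented at t1 than at t2. A real analysis would test many gene sets and adjust the resulting p-values for multiple testing.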

Chapter 5: In this chapter, we provide a summary of the thesis as well as brief future perspectives.