Decoding Gastroesophageal Cancer Genomes


Student thesis: Doctoral Thesis

Award date: 16 Jun 2022


Gastroesophageal cancer is a common disease globally. Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal cancer and is especially prevalent in the Chinese population. Gastroesophageal cancer usually shows marked genomic instability, with thousands of somatic mutations and hundreds of structural variations (SVs). Besides simple SVs such as deletions and tandem duplications (TDs), nearly 50% of SVs arise from complex rearrangements that generate novel genomic sequences. Complex rearrangements are a primary source of high-level oncogene amplification, and many of the amplified genes, such as CDK4/6, EGFR, and ERBB2, are well-established therapeutic targets. Based on the type and distribution of SVs, several complex patterns have been described, such as chromothripsis and chromoplexy. In this study, we aimed first to reconstruct the derived genome sequence that results from rearrangements and then to investigate their patterns and clinical significance. This approach can enumerate potential rearrangement patterns within a genome and aid the discovery of novel ones. To this end, I developed a tool (FindRear) for finding complex rearrangements and assessed their biological significance in gastroesophageal cancers.

FindRear is built on a graph-based method and finds rearrangements with four or more breakpoints. It identifies several types of complex rearrangements, including rare ones such as fold-back inversions mediated by templated sequence insertions (TSIs). It also helps identify chromothripsis and ecDNA (extrachromosomal DNA). Applying it to ESCC genomes from 528 patients, we found rearrangements forming long genomic chains that resemble chromothripsis but differ in size and distribution, which likely represent a novel pattern in ESCC genomes.
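The abstract does not describe how FindRear builds its graph, so the following is only a simplified, hypothetical illustration of the general idea behind graph-based grouping of SVs. Each SV is treated as a node, two SVs are linked when any of their breakpoints lie on the same chromosome within a proximity window, and connected components containing four or more breakpoints are reported as candidate complex rearrangements. The names `SV` and `cluster_svs`, the 10 kb `window`, and the example coordinates are all assumptions; proximity clustering is a stand-in, not FindRear's algorithm, and a full method would also walk breakpoint junctions to reconstruct the derived sequence.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SV:
    """Hypothetical SV record: a junction joining two breakpoints (chrom, pos)."""
    sv_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

    def breakpoints(self):
        return [(self.chrom1, self.pos1), (self.chrom2, self.pos2)]


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def cluster_svs(svs, window=10_000, min_breakpoints=4):
    """Group SVs whose breakpoints lie within `window` bp of each other.

    Returns the sorted SV ids of each connected component that has at
    least `min_breakpoints` breakpoints (each SV contributes two).
    `window` and `min_breakpoints` are illustrative assumptions.
    """
    parent = {sv.sv_id: sv.sv_id for sv in svs}
    for a, b in combinations(svs, 2):
        # Link two SVs if any pair of their breakpoints is close on one chromosome.
        if any(ca == cb and abs(pa - pb) <= window
               for ca, pa in a.breakpoints()
               for cb, pb in b.breakpoints()):
            union(parent, a.sv_id, b.sv_id)
    components = {}
    for sv in svs:
        components.setdefault(find(parent, sv.sv_id), []).append(sv.sv_id)
    return [sorted(ids) for ids in components.values()
            if 2 * len(ids) >= min_breakpoints]


# Hypothetical example: three chained SVs plus one isolated tandem duplication.
svs = [
    SV("sv1", "chr11", 69_000_000, "chr11", 69_500_000),
    SV("sv2", "chr11", 69_504_000, "chr8", 128_000_000),
    SV("sv3", "chr8", 128_003_000, "chr8", 127_700_000),
    SV("sv4", "chr3", 10_000_000, "chr3", 10_100_000),
]
print(cluster_svs(svs))  # [['sv1', 'sv2', 'sv3']]
```

Here sv1, sv2, and sv3 form one six-breakpoint chain, while sv4 stays a simple two-breakpoint event and is filtered out. Union-find keeps the grouping step simple, but the pairwise comparison is quadratic; sorting breakpoints per chromosome would scale better on genomes with many SVs.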

We also compared genomic metrics between complex rearrangements and simple SVs within gastroesophageal cancers and assessed their clinical significance. We identified distinct SV signatures among simple SVs, among which we highlight a class of tandem duplications (TD-c2) in ESCCs. TD-c2 events are typically ~100 kb in size and are an important source of driver gene amplification. Compared with complex rearrangements, TD-c2 events are significantly enriched in early-replicating and accessible chromatin regions. Of interest, we noticed that fold-back inversions are more likely to occur near centromeres. We also uncovered a novel TD-c2-driven hotspot involving the super-enhancer of the PTHLH gene. Functional studies further confirmed the role of PTHLH in carcinogenesis.

In addition, we explored the genetic evolution of gastroesophageal cancer. We sequenced and studied 120 patients (approximately 900 samples) using multi-regional omics sequencing, including whole-exome sequencing, methylation profiling, RNA-seq, and TCR sequencing. These data reveal considerable intra-patient heterogeneity, evident between morphologically normal and tumor tissue, between dysplasia and matched tumors, between primary tumors and metastases, and among distinct regions of the same tumor. We confirmed the presence of a BRCAness signature in 10% of ESCCs, and BRCAness ESCCs show better outcomes. ESCC patients with highly immune-infiltrated tumors showed high genetic heterogeneity, immune evasion, and an immunosuppressive phenotype. Immunoediting activity depends on the immune contexture and can predict patient prognosis. Notably, I proposed an asymmetric evolutionary model for ESCC, in which subclonal expansion is unequal across locations, especially in the upper part of the primary tumor.

In conclusion, first, I developed FindRear, a tool for identifying complex rearrangements; second, I found a novel type of fold-back inversion; third, I nominated a TD hotspot; and finally, I proposed an asymmetric evolutionary model for ESCC.