Method and Analysis based on the Haplotypes of Immune-related Genes


Student thesis: Doctoral Thesis

Award date: 7 Jul 2023


Immunotherapy is a widely used cancer treatment, but our understanding of the complexity and specificity of the immune system remains limited. The human leukocyte antigen (HLA) genes encode molecules that are essential to the immune system. T-cell receptors (TCRs) on the surface of T cells recognize foreign peptides bound to HLA molecules, whereas B-cell receptors (BCRs) on B cells bind antigens directly. However, antitumor immune responses are difficult to understand because of the diversity of TCRs/BCRs and the polymorphism of HLA alleles. During my Ph.D. studies, I focused on haplotyping immune-related genes, including the HLA family, the killer cell immunoglobulin-like receptor (KIR) family, TRAV, TRBV, IGHV, IGHJ, and others, to decipher the roles of immune-related genes in antitumor immune responses.

Reconstructing diploid haplotypes of entire HLA genes from sequencing data is challenging. The high polymorphism among the alleles of each HLA gene and the high sequence similarity between different HLA genes make it difficult to determine which locus a sequencing read originates from. Most existing HLA typing software adopts a database-matching strategy, identifying the best-matching known alleles in an HLA allele database rather than reconstructing haplotype sequences directly from next-generation sequencing (NGS) data. HLA loss of heterozygosity (LOH) occurs frequently in cancer patients and may enable immune evasion during cancer evolution. Because tumor samples are a mixture of normal and tumor cells, HLA LOH appears in the sequencing reads as allelic imbalance rather than as the complete absence of one allele. Reconstructing HLA haplotypes and detecting HLA LOH events is therefore vital for understanding antitumor immune responses. We developed SpecHLA, a tool that provides accurate typing, haplotyping, and HLA LOH detection for eight common HLA genes (HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, and HLA-DRB1). SpecHLA supports various data types, including WGS, WES, RNA-seq, 10x, Hi-C, PacBio, and ONT. Furthermore, we are developing SpecComplex, an extension of SpecHLA that can accurately type complex genes, including 27 HLA, 16 KIR, and 24 CYP genes. SpecComplex aims to cover more complex genes and to provide a comprehensive analysis of the immune-related genes that shape antitumor immune responses.
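The allelic-imbalance argument can be made concrete with a small Python sketch. It computes the read fraction expected for the lost allele and a z-score for departure from a balanced 50:50 ratio between two heterozygous HLA alleles. It assumes a single clonal LOH event, diploid normal cells, uniform coverage, and a known tumor purity. The function names and the z-score threshold are hypothetical illustrations, not SpecHLA's algorithm or interface.

```python
from math import sqrt


def expected_lost_allele_fraction(purity: float, copy_neutral: bool = False) -> float:
    """Expected read fraction of the lost HLA allele in a tumor sample.

    Simplified model (an assumption): one clonal LOH event, diploid normal
    cells, uniform coverage, and a known tumor purity in [0, 1].
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    normal = 1.0 - purity
    # Only normal cells still carry the lost allele (one copy per cell).
    lost_copies = normal
    if copy_neutral:
        # Copy-neutral LOH: tumor cells carry two copies of the retained allele.
        total_copies = 2.0 * normal + 2.0 * purity
    else:
        # Deletion LOH: tumor cells keep a single copy of the retained allele.
        total_copies = 2.0 * normal + purity
    return lost_copies / total_copies


def allelic_imbalance_z(count_a: int, count_b: int) -> float:
    """Normal-approximation z-score of allele A's read count against a 50:50 expectation."""
    n = count_a + count_b
    if n == 0:
        raise ValueError("at least one read is required")
    return (count_a - 0.5 * n) / sqrt(0.25 * n)


def candidate_lost_allele(count_a: int, count_b: int, z_threshold: float = 3.0):
    """Return 'A' or 'B' (the under-represented allele) if the imbalance exceeds
    the hypothetical threshold, otherwise None."""
    z = allelic_imbalance_z(count_a, count_b)
    if abs(z) < z_threshold:
        return None
    return "A" if z < 0 else "B"


if __name__ == "__main__":
    print(expected_lost_allele_fraction(0.6))                     # deletion LOH
    print(expected_lost_allele_fraction(0.6, copy_neutral=True))  # copy-neutral LOH
    print(candidate_lost_allele(30, 70))
```

Under this model, at 60% purity the lost allele is expected in about 29% of reads after a deletion, i.e. (1 - p)/(2 - p), and in 20% after copy-neutral LOH, i.e. (1 - p)/2. Real data also require handling mapping bias between similar alleles, which this sketch ignores.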

The immune system requires a vast repertoire of TCRs/BCRs to recognize a wide variety of antigens, and the complementarity-determining region 3 (CDR3) of a TCR/BCR is crucial for antigen-specific recognition. CDR3 diversity arises from the recombination of V(D)J gene segments during T/B-cell development. Investigating the V/J allele recombination bias of the immune repertoire and its relationship with HLA helps clarify how HLA and antigens shape the immune repertoire through selection. We combined HLA typing and TCR data from a total of 1,926 samples, of which 1,348 have HLA typing information, 1,173 have TCR-beta sequence data, and 476 have TCR-alpha sequence data. These data provide evidence for relationships between HLA and TCR CDR3 length, TCR diversity, and V/J gene usage. Our analysis of the TCR repertoires of cancer patients showed that certain HLA types and variable (V) genes exhibit biased usage patterns. Additionally, smoking and tissue type were found to influence V/J recombination bias scores in the TCR repertoire. These results suggest that such factors can affect the diversity and specificity of T-cell responses, which are associated with disease susceptibility and vaccine efficacy.
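The repertoire statistics mentioned above (CDR3 length, TCR diversity, V/J gene usage) can be illustrated with a minimal Python sketch. Clonality based on Shannon entropy is one common diversity measure, not necessarily the one used in this thesis. `vj_pairing_ratio` is a hypothetical observed-over-expected score standing in for the V/J recombination bias score, whose exact definition is not given here. CDR3 length conventions differ between tools (for example, whether the conserved cysteine and phenylalanine are counted), so the sketch simply uses the length of the string supplied. The gene names and CDR3 sequences in the example are made up.

```python
from collections import Counter
from math import log

# Each clonotype is a (V gene, J gene, CDR3 amino-acid sequence) tuple.
Clonotype = tuple[str, str, str]


def shannon_entropy(clone_counts: list[int]) -> float:
    """Shannon entropy (natural log) of clone frequencies."""
    total = sum(clone_counts)
    entropy = 0.0
    for count in clone_counts:
        if count > 0:
            p = count / total
            entropy -= p * log(p)
    return entropy


def clonality(clone_counts: list[int]) -> float:
    """1 - normalized Shannon entropy: 0 for a perfectly even repertoire, 1 for a single clone."""
    n_clones = sum(1 for count in clone_counts if count > 0)
    if n_clones <= 1:
        return 1.0
    return 1.0 - shannon_entropy(clone_counts) / log(n_clones)


def mean_cdr3_length(clonotypes: list[Clonotype]) -> float:
    """Mean CDR3 length in amino acids, counting each string exactly as given."""
    return sum(len(cdr3) for _, _, cdr3 in clonotypes) / len(clonotypes)


def vj_pair_usage(clonotypes: list[Clonotype]) -> dict[tuple[str, str], float]:
    """Fraction of clonotypes that use each (V, J) pair."""
    pairs = Counter((v, j) for v, j, _ in clonotypes)
    return {pair: count / len(clonotypes) for pair, count in pairs.items()}


def vj_pairing_ratio(clonotypes: list[Clonotype]) -> dict[tuple[str, str], float]:
    """Hypothetical bias score: observed V-J pair frequency divided by the frequency
    expected if V and J were used independently (> 1 means the pair is over-used)."""
    n = len(clonotypes)
    v_freq = Counter(v for v, _, _ in clonotypes)
    j_freq = Counter(j for _, j, _ in clonotypes)
    observed = vj_pair_usage(clonotypes)
    return {
        (v, j): freq / ((v_freq[v] / n) * (j_freq[j] / n))
        for (v, j), freq in observed.items()
    }


repertoire = [
    ("TRBV20-1", "TRBJ2-7", "CSARDRGYEQYF"),
    ("TRBV20-1", "TRBJ2-7", "CSAGQGSYEQYF"),
    ("TRBV5-1", "TRBJ1-1", "CASSLEGTEAFF"),
    ("TRBV20-1", "TRBJ1-1", "CSARGNTEAFF"),
]
print(mean_cdr3_length(repertoire))
print(vj_pairing_ratio(repertoire))
print(clonality([50, 30, 15, 5]))
```

In this toy repertoire, the TRBV20-1/TRBJ2-7 and TRBV5-1/TRBJ1-1 pairings score above 1, meaning they occur more often than independent V and J choice would predict.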

To gain new insights into cancer genetics and improve antitumor immunity, we studied somatically mutated SNPs (smSNPs), which are found in cancer cells but not in matched normal cells. We examined their occurrence and distribution in The Cancer Genome Atlas (TCGA) dataset and found that smSNPs are dominated by C>T mutations at CpG sites and that mutational signatures SBS1 and DBS7 are more prevalent in smSNPs than in other somatic mutations. SBS1 is initiated by the spontaneous or enzymatic deamination of 5-methylcytosine to thymine, and DBS7 is associated with defective DNA mismatch repair. These findings suggest that these mechanisms may affect smSNPs more than other somatic mutations. Understanding the signatures specific to smSNPs may provide insights into the mechanisms underlying these mutations and their potential clinical implications.
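The CpG C>T classification behind this result can be sketched in Python. By the COSMIC convention, single-base substitutions are reported relative to the pyrimidine (C or T) of the mutated base pair, so a G>A change whose G is preceded by C on the reference strand becomes a C>T change at a CpG site on the opposite strand. The sketch assumes each mutation is given as its reference trinucleotide context plus the alternate base; the function names are hypothetical. Assigning mutations to signatures such as SBS1 or DBS7 would additionally require fitting against the COSMIC reference signatures, which this code does not attempt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def classify_sbs(context: str, alt: str) -> tuple[str, str, bool]:
    """Classify one single-base substitution.

    context: reference trinucleotide centred on the mutated base, e.g. "ACG".
    alt: alternate base on the same strand as the context.
    Returns (substitution such as "C>T", 96-channel label such as "A[C>T]G", is_cpg).
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT") or context[1] == alt:
        raise ValueError(f"invalid substitution: {context} -> {alt}")
    if context[1] in "AG":
        # Switch to the opposite strand so the reference base is a pyrimidine.
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    ref = context[1]
    substitution = f"{ref}>{alt}"
    is_cpg = ref == "C" and context[2] == "G"  # C followed by G on this strand
    return substitution, f"{context[0]}[{substitution}]{context[2]}", is_cpg


def cpg_c_to_t_fraction(mutations: list[tuple[str, str]]) -> float:
    """Fraction of substitutions that are C>T at CpG sites."""
    if not mutations:
        return 0.0
    hits = 0
    for context, alt in mutations:
        substitution, _, is_cpg = classify_sbs(context, alt)
        if substitution == "C>T" and is_cpg:
            hits += 1
    return hits / len(mutations)


example = [("ACG", "T"), ("CGA", "A"), ("TCA", "G"), ("ATT", "C")]
for context, alt in example:
    print(context, alt, classify_sbs(context, alt))
print(cpg_c_to_t_fraction(example))
```

Here both the C>T in "ACG" and the G>A in "CGA" count as CpG C>T events, because the second is the same change viewed from the opposite strand.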