Abstract
Immune checkpoint inhibitor (ICI) immunotherapy has shown great potential as a cancer treatment, leading to significant clinical improvements in numerous cases. However, it benefits only a minority of patients, underscoring the importance of discovering reliable biomarkers that can be used to screen for potential beneficiaries and ultimately reduce the risk of overtreatment. We present a review that evaluates the latest advances in predictive biomarkers for ICI therapy from multiple perspectives, including tumor cells, the tumor immune microenvironment (TIME), body fluids, gut microbes, and metabolites. The biomarkers fall into five groups: tumor cell-derived biomarkers, TIME-derived biomarkers, liquid biopsy biomarkers, gut microbiome biomarkers, and metabolomics biomarkers. Tumor cell-derived biomarkers include tumor mutational burden (TMB), tumor neoantigen burden (TNB), microsatellite instability (MSI), PD-L1 expression, mutated genes in pathways, and epigenetic biomarkers. TIME-derived biomarkers include the immune landscape of the TIME, inhibitory checkpoints, and the immune repertoire. We also discuss methods for detecting and analyzing biomarker-related data. Moreover, we summarize detailed information on these biomarkers, outlining the datasets they use, their advantages and disadvantages, and their evaluation methods or metrics. We also explore the limitations and challenges of biomarker research for ICI therapy, such as the impact of tumor heterogeneity, the lack of standardization in biomarker detection and analysis, and the difficulty of translating research findings into clinical practice. Furthermore, we present a comprehensive review of computer models for predicting the response to ICI therapy.
Computer models for predicting the response to ICI therapy include knowledge-based mechanistic models and data-based machine learning (ML) models. The knowledge-based mechanistic models comprise pharmacokinetic/pharmacodynamic (PK/PD) models, partial differential equation (PDE) models, signaling network-based models, quantitative systems pharmacology (QSP) models, and agent-based models (ABMs). ML models include linear regression, logistic regression, support vector machine (SVM), random forest, extra trees, and k-nearest neighbors (KNN) models, as well as artificial neural network (ANN) and deep learning models. Additionally, there are hybrid models that combine systems biology and ML. We summarize the details of these models, outlining the datasets they use, their evaluation methods and metrics, and their respective strengths and limitations. By summarizing the major advances in research on predictive biomarkers and computer models for the therapeutic effect and clinical utility of ICI therapy, we aim to assist researchers in choosing appropriate biomarkers or computer models for their studies and to help clinicians practice precision medicine by selecting the most suitable biomarkers.
Precision medicine in oncology is highly contingent upon the accurate identification and characterization of tumor-specific antigens (TSAs), which arise from distinct genetic aberrations and offer clear targets for immune system-mediated elimination of cancer cells. With advancements in bioinformatics, tools like NetMHCpan and MHCflurry 2.0 use deep learning to predict peptide-major histocompatibility complex (MHC) interactions, yet TSA identification remains challenging because TSAs have diverse origins. Proteogenomics indicates that many TSAs originate from non-coding genomic regions, underscoring the value of RNA sequencing (RNA-seq) and mass spectrometry in their discovery. Herein, we detail the development of PreTSA, a novel toolkit that analyzes RNA-seq data and peptide liquid chromatography with tandem mass spectrometry (LC-MS/MS) data to isolate and examine TSAs in cancer cell lines and patient tumor samples. PreTSA integrates a comprehensive suite of bioinformatic tools and a database of publicly available omics data to navigate the multifaceted mutational landscape. It identifies mutant TSAs (mTSAs) derived from various sources, such as single nucleotide variants (SNVs), insertions and deletions (INDELs), gene fusions, alternative splicing, and RNA editing. In addition, it identifies aberrantly expressed TSAs (aeTSAs). Finally, PreTSA assesses the immunogenicity of these TSAs through T-cell receptor (TCR) engagement.
Data downloaded from the National Center for Biotechnology Information and other databases were converted to FASTQ format and processed through a multi-step analysis pipeline. The pipeline first uses different tools to preprocess and map the RNA-seq data and to detect mutations in it. Reads are mapped to the human reference genome with tools such as STAR, and transcripts are quantified with Kallisto. Tools such as freeBayes are then used to detect and classify SNV and INDEL mutations. When capturing possible peptides, special attention is given to single-base mutations and RNA editing mutations, and gene fusion and alternative splicing mutations are also detected. A sample-specific peptide database is generated by translating the transcripts that carry the aforementioned mutation types, including only transcripts with transcripts per million (TPM) values greater than 0. LC-MS/MS data are then searched against the sample-specific databases for the different mutation types to identify MHC-associated peptides (MAPs). Final MAPs are determined through specific screening criteria, including peptide length and MHC allele affinity levels. A peptide is accepted as a TSA candidate only if it is undetectable in normal tissues or if its RNA coding sequence is expressed at least 10 times higher than in medullary thymic epithelial cells (mTECs). Using the BLAT tool, the genomic locations of potential TSAs can be identified and further classified and evaluated. Finally, we tested the immunogenicity of the predicted neoantigens in eliciting T cell responses. We calculated the expression of all TSA sequences in mTECs and dendritic cells (DCs) using BamQuery; TSAs with fewer than 8.55 reads per hundred million (rphm) were considered immunogenic. We finally retained 46 TSAs with strong immunogenicity, including 6 mTSAs and 40 aeTSAs. The results indicate that the neoantigens predicted by our pipeline possess strong immunogenicity.
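The expression-based filters described above can be illustrated with a minimal Python sketch. It is not the PreTSA implementation: the `Peptide` record, its field names, and the helper functions are hypothetical, and only the three numeric rules (TPM greater than 0, at least 10 times higher expression than in mTECs, and rphm below 8.55) come from the text. In particular, requiring both the mTEC and DC values to fall below the rphm cutoff is an assumption made for illustration.

```python
from dataclasses import dataclass

MTEC_FOLD_CHANGE = 10.0  # from the text: at least 10x higher than in mTECs
RPHM_CUTOFF = 8.55       # from the text: rphm below this counts as immunogenic


@dataclass
class Peptide:
    """Hypothetical record for one MHC-associated peptide."""
    sequence: str
    tumor_rphm: float         # expression of the coding sequence in the tumor
    mtec_rphm: float          # expression in mTECs
    dc_rphm: float            # expression in DCs
    detected_in_normal: bool  # peptide observed in normal tissues


def keep_expressed(tpm_by_transcript):
    """Keep only transcripts with TPM > 0 for the peptide database."""
    return {name: tpm for name, tpm in tpm_by_transcript.items() if tpm > 0}


def is_tsa_candidate(p):
    """Undetectable in normal tissue, or >= 10x overexpressed versus mTECs."""
    if not p.detected_in_normal:
        return True
    return p.tumor_rphm > 0 and p.tumor_rphm >= MTEC_FOLD_CHANGE * p.mtec_rphm


def is_immunogenic(p):
    """Assumption: both mTEC and DC expression must be below the cutoff."""
    return max(p.mtec_rphm, p.dc_rphm) < RPHM_CUTOFF


def select_tsas(peptides):
    """Return the peptides that pass both the candidate and rphm filters."""
    return [p for p in peptides if is_tsa_candidate(p) and is_immunogenic(p)]
```

Keeping each rule in its own function makes it easy to change a threshold or to swap the combined mTEC/DC criterion for a different one without touching the rest of the selection logic.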
In summary, this study developed the PreTSA toolkit to predict and validate TSAs by combining RNA-seq and LC-MS/MS data. The toolkit not only identifies various TSAs arising from different mutation types but also validates them using deep learning software. Moreover, this work demonstrates the application of computational science in biomedicine, showing how advanced algorithms and data integration can enhance the effectiveness of personalized cancer immunotherapy. It provides new insights into tumor immune responses and is expected to advance personalized cancer immunotherapy.
In addition to research related to human tumor immunotherapy, our focus extends to the intricate subject of mating preferences within different ethnic populations, encompassing the major histocompatibility complex (MHC) and non-MHC regions.
The MHC region, located within the human genome, encodes a crucial set of immune system proteins. These proteins play a pivotal role in immune responses, specifically in antigen presentation and immune cell recognition. Meanwhile, the non-MHC regions also carry genetic information that may affect mate selection. Numerous past studies have suggested a human inclination towards MHC-disassortative mating, in which individuals preferentially choose mates with dissimilar MHC genes, possibly as a mechanism to ensure offspring with a broader diversity of immune responses. However, this area of research has been fraught with controversies and conflicting findings. In this research, we acquired high-quality data on 111,048,944 single-nucleotide variants (SNVs) and 14,435,076 insertions and deletions (INDELs) from 3,202 samples spanning 26 ethnic groups in the 1000 Genomes Project (1kGP). For the entire population and for each subgroup, we divided each chromosome into multiple segments (bins), each containing a sequence of 1000 loci. A final segment with fewer than 1000 loci was still treated as an independent bin. We then calculated the disassortative mating coefficient (Pd) for every heterozygous SNV/INDEL site and the assortative mating coefficient (Pa) for each homozygous locus within both the MHC and non-MHC regions. Using our algorithm, we computed the average Pd and Pa for each bin. We then identified regions composed of consecutive bins with p-values less than 0.05. Next, we built the distribution of region lengths, measured in number of bins, and fit it with a normal distribution. Finally, we applied a strict right-tailed p-value cutoff of 1e-10 to select the length thresholds for regions of strong disassortative or assortative mating, thus obtaining robust results indicating non-random mating patterns across both MHC and non-MHC regions.
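The binning and region-length steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm used in the thesis: the function names are hypothetical, the per-site coefficients and per-bin p-values are taken as given inputs because the abstract does not say how they are computed, and only the bin size (1000 loci), the per-bin significance level (0.05), and the right-tailed cutoff (1e-10) come from the text.

```python
from statistics import NormalDist, mean


def bin_means(values, bin_size=1000):
    """Average per-site coefficients (e.g. Pd or Pa) over consecutive bins.

    The last bin may hold fewer than bin_size loci and is kept as its own bin.
    """
    return [mean(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


def significant_runs(p_values, alpha=0.05):
    """Return (start_bin, length) for each maximal run of bins with p < alpha."""
    runs, start = [], None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i  # a new run begins at this bin
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # a run that reaches the last bin
        runs.append((start, len(p_values) - start))
    return runs


def length_threshold(lengths, tail_p=1e-10):
    """Fit a normal distribution to run lengths and return the length whose
    right-tail probability equals tail_p (needs at least two distinct lengths)."""
    fitted = NormalDist.from_samples(lengths)
    return fitted.inv_cdf(1.0 - tail_p)
```

Under this sketch, a region would be reported as strongly non-random when its length in bins exceeds the value returned by `length_threshold` for the corresponding distribution of run lengths.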
This study underscores the critical role of computer science in genomics, demonstrating how our advanced computational techniques not only facilitated the efficient analysis of extensive genomic data but also significantly improved the accuracy of our genetic interpretations.
Our findings revealed that each of the 26 distinct populations demonstrated a preference for disassortative mating within the MHC regions, without any evidence of assortative mating. Assortative mating preferences were found in the non-MHC regions. Conversely, when considering the population as a whole, there was evidence of both assortative and disassortative mating patterns within the MHC regions. Notably, this dual pattern of mating preference was confined to a single MHC region, where assortative and disassortative mating overlapped. In non-MHC regions, 16 populations exhibited both assortative and disassortative mating, with no overlap between the corresponding regions. Additionally, the Pd values for disassortative mating in MHC regions were higher than those in non-MHC regions for 19 populations, suggesting a greater influence of MHC disassortative mating regions on mate selection. A minority of populations showed disassortative mate selection only in MHC regions. In non-MHC regions with both homozygous and heterozygous mating, genes linked to olfactory receptors were identified, implying that human mate selection may be associated with olfactory recognition. Our findings offer novel perspectives and comprehensive insights into the biological mechanisms underlying human mate choice. They not only add complexity to the understanding of mating preferences but also offer potential applications in areas such as reproductive medicine and population genetics. By shedding light on both MHC and non-MHC regions, we have contributed to a richer, more nuanced understanding of human mating preferences.
Although our research focused on the analysis of genetic factors, it lays the foundation for future studies to integrate these genetic elements with evolutionary, physiological, and even psychological aspects. It signifies a step forward in the multifaceted study of human attraction and selection, with potential implications for future research in anthropology, biology, and medicine.
| Date of Award | 3 May 2024 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Shuaicheng LI (Supervisor) |