TransHLA: A Hybrid Transformer Model for HLA-Presented Epitope Detection
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Related Research Unit(s)
Detail(s)
Original language | English |
---|---|
Journal / Publication | GigaScience |
Publication status | Accepted/In press/Filed - 21 Jan 2025 |
Link(s)
DOI | DOI |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(11ff1888-067c-4d82-a8c5-5ddc92ba60bd).html |
Abstract
Background: Precise prediction of epitope presentation on human leukocyte antigen (HLA) molecules is crucial for advancing vaccine development and immunotherapy. Conventional HLA-peptide binding affinity prediction tools often focus on specific alleles and lack a universal approach for comprehensive HLA site analysis. This limitation hinders efficient filtering of invalid peptide segments.
Results: We introduce TransHLA, a pioneering tool designed for epitope prediction across all HLA alleles, integrating Transformer and Residue CNN architectures. TransHLA utilizes the ESM2 large language model for sequence and structure embeddings, achieving high predictive accuracy. For HLA class I, it reaches an accuracy of 84.72% and an AUC of 91.95% on IEDB test data. For HLA class II, it achieves 79.94% accuracy and an AUC of 88.14%. Our case studies using datasets like CEDAR and VDJdb demonstrate that TransHLA surpasses existing models in specificity and sensitivity for identifying immunogenic epitopes and neoepitopes.
Conclusions: TransHLA significantly enhances vaccine design and immunotherapy by efficiently identifying broadly reactive peptides. Our resources, including data and code, are publicly accessible at https://github.com/SkywalkerLuke/TransHLA
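The two metrics quoted in the Results, accuracy and AUC, can be illustrated with a minimal sketch. This is not code from the TransHLA repository: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration, and the numbers they produce are unrelated to the figures reported in the abstract.

```python
# Illustrative only: toy data and hypothetical helper names, not the
# authors' evaluation code or the IEDB test set.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of peptides whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half). Quadratic time, fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 1 = epitope, 0 = non-epitope (assumed toy labels and predicted scores).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why an abstract may report both.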
Research Area(s)
- Epitope Presentation, Pre-trained language model, Deep Learning
Citation Format(s)
TransHLA: A Hybrid Transformer Model for HLA-Presented Epitope Detection. / LU, Tianchi; Wang, Xueying; Nie, Wan et al.
In: GigaScience, 21.01.2025.