A comprehensive benchmarking for evaluating TCR embeddings in modeling TCR-epitope interactions

Xikang Feng* (Co-first Author), Miaozhe Huo (Co-first Author), He Li, Yongze Yang, Yuepeng Jiang, Liang He, Shuai Cheng Li*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

The complexity of T cell receptor (TCR) sequences, particularly within the complementarity-determining region 3 (CDR3), requires efficient embedding methods for applying machine learning to immunology. While various TCR CDR3 embedding strategies have been proposed, the lack of systematic evaluation has caused confusion in the community. Here, we extracted CDR3 embedding models from 19 existing methods and benchmarked these models on four curated datasets by assessing their impact on the performance of TCR downstream tasks, including TCR-epitope binding affinity prediction, epitope-specific TCR identification, TCR clustering, and visualization analysis. We assessed these models using eight downstream classifiers and five downstream clustering methods, with performance measured by a diverse range of metrics for precision, robustness, and usability. Overall, handcrafted embeddings outperformed data-driven ones in modeling TCR-epitope interactions. To further refine our comparative findings, we developed an all-in-one TCR CDR3 embedding package comprising all evaluated embedding models. This package will assist users in easily selecting suitable embedding models for their data. © The Author(s) 2025. Published by Oxford University Press.
Original language: English
Article number: bbaf030
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 1
Online published: 30 Jan 2025
Publication status: Published - Jan 2025

Funding

This study was funded by the RGC Healthy Longevity Catalyst Awards (CityU9080002, HLCA/E-107/23), the National Natural Science Foundation of China (32300527), and the Natural Science Basic Research Program of Shaanxi Province (2022JQ-644). The publication fee of this article is covered by the National Natural Science Foundation of China (32300527).

Research Keywords

  • benchmarking TCR CDR3 encoding
  • biological relevance of embeddings
  • data-driven and handcrafted embeddings
  • TCR-epitope interaction

Publisher's Copyright Statement

  • This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
