
PhosF3C: a feature fusion architecture with fine-tuned protein language model and conformer for prediction of general phosphorylation site

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review


Abstract

Protein phosphorylation, a key post-translational modification, provides essential insight into protein properties, making its prediction highly significant. Using the emerging capabilities of large language models (LLMs), we apply Low-Rank Adaptation (LoRA) fine-tuning to ESM2, a powerful protein large language model, to efficiently extract features with minimal computational resources, optimizing task-specific text alignment. Additionally, we integrate the conformer architecture with the feature coupling unit to enhance local and global feature exchange, further improving prediction accuracy. Our model achieves state-of-the-art performance, obtaining area under the curve (AUC) scores of 79.5%, 76.3%, and 71.4% at the S, T, and Y sites of the general data sets, respectively. Based on the powerful feature extraction capabilities of LLMs, we conduct a series of analyses on protein representations, including studies on their structure, sequence, and various chemical properties [such as hydrophobicity (GRAVY), surface charge, and isoelectric point]. We propose a test method called linear regression tomography, a top-down method that uses representations to explore the model’s feature extraction capabilities. Our resources, including data and code, are publicly accessible at https://github.com/SkywalkerLuke/PhosF3C
© The Author(s) 2025.
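The abstract does not describe how linear regression tomography is carried out, so the following is only a rough sketch of the general idea of probing a representation with a linear model, not the authors' method. It computes GRAVY (the mean Kyte-Doolittle hydropathy of a sequence) as the target property and fits an ordinary least-squares probe from a feature vector to that target. The function `toy_embedding` is a hypothetical stand-in for a real protein language model embedding such as one taken from ESM2, and the example sequences are made up for illustration.

```python
# Kyte-Doolittle hydropathy scale; GRAVY is the mean value over a sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy for a sequence of standard residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def toy_embedding(sequence):
    """Hypothetical stand-in for a language model embedding (assumption):
    fractions of hydrophobic, charged, and polar residues."""
    groups = ("AILMFVW", "DEKR", "STNQ")
    return [sum(sequence.count(aa) for aa in g) / len(sequence) for g in groups]


def solve(matrix, vector):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - rest) / aug[r][r]
    return solution


def fit_linear_probe(features, targets):
    """Least-squares fit with an intercept via the normal equations.
    Returns (weights with the bias last, R^2 on the fitted data)."""
    rows = [list(f) + [1.0] for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    weights = solve(xtx, xty)
    preds = [sum(w * x for w, x in zip(weights, r)) for r in rows]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return weights, 1.0 - ss_res / ss_tot


# Made-up sequences, used only to exercise the probe.
sequences = [
    "MKTAYIAKQR", "GGSGGSPGGS", "LLVVIIAAFF", "DEDEKRKRDE",
    "STNQSTNQGP", "AVLKDESTGP", "WFMHCYPGAL", "KKAALLSSGG",
]
probe_features = [toy_embedding(s) for s in sequences]
probe_targets = [gravy(s) for s in sequences]
probe_weights, probe_r2 = fit_linear_probe(probe_features, probe_targets)
print("probe weights (bias last):", probe_weights)
print("R^2 on the fitted sequences:", probe_r2)
```

A high R^2 from such a probe would suggest that the property is linearly recoverable from the representation. With real embeddings the fit should be scored on held-out sequences rather than on the data used for fitting, as is done here for brevity.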
Original language: English
Article number: bbaf242
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 3
Online published: 27 May 2025
Publication status: Published - May 2025

Funding

None declared.

Research Keywords

  • protein phosphorylation
  • large language model
  • LoRA
  • Conformer

Publisher's Copyright Statement

  • This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
