Abstract
Protein phosphorylation, a key post-translational modification, provides essential insight into protein properties, making its prediction highly significant. Using the emerging capabilities of large language models (LLMs), we apply Low-Rank Adaptation (LoRA) fine-tuning to ESM2, a powerful protein large language model, to efficiently extract features with minimal computational resources, optimizing task-specific text alignment. Additionally, we integrate the conformer architecture with the feature coupling unit to enhance local and global feature exchange, further improving prediction accuracy. Our model achieves state-of-the-art performance, obtaining area under the curve (AUC) scores of 79.5%, 76.3%, and 71.4% at the S, T, and Y sites of the general data sets, respectively. Building on the powerful feature extraction capabilities of LLMs, we conduct a series of analyses on protein representations, including studies of their structure, sequence, and various chemical properties, such as hydrophobicity (GRAVY), surface charge, and isoelectric point. We also propose a test method called linear regression tomography, a top-down method that uses representations to explore the model's feature extraction capabilities. Our resources, including data and code, are publicly accessible at https://github.com/SkywalkerLuke/PhosF3C.
© The Author(s) 2025.
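For readers unfamiliar with LoRA, the following is a minimal sketch of the general low-rank update it relies on, not the authors' implementation. It assumes the standard formulation in which a frozen weight matrix W0 is adapted as W0 + (alpha / r) * B A, with B initialized to zero. The matrix sizes, `alpha`, `r`, and all function names are illustrative assumptions and are not taken from the paper or from ESM2.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha, r):
    """Return the effective weight W0 + (alpha / r) * (B @ A)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


random.seed(0)
d_out, d_in, r, alpha = 8, 8, 2, 4  # hypothetical sizes, chosen only for illustration

# Frozen pretrained weight (random stand-in here).
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A gets small random values, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen one.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]

w_eff = lora_weight(w0, b, a, alpha, r)
full_params = d_out * d_in
lora_params = lora_param_count(d_out, d_in, r)
```

In this toy setting only the two small factors would be trained, which is why the method needs fewer resources than updating the full matrix; the saving grows as the layer dimensions become large relative to `r`.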
| Original language | English |
|---|---|
| Article number | bbaf242 |
| Journal | Briefings in Bioinformatics |
| Volume | 26 |
| Issue number | 3 |
| Online published | 27 May 2025 |
| DOIs | |
| Publication status | Published - May 2025 |
Funding
None declared.
Research Keywords
- protein phosphorylation
- large language model
- LoRA
- Conformer
Publisher's Copyright Statement
- This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
Fingerprint
Dive into the research topics of 'PhosF3C: a feature fusion architecture with fine-tuned protein language model and conformer for prediction of general phosphorylation site'. Together they form a unique fingerprint.