PhosF3C: A Feature Fusion Architecture with Fine-Tuned Protein Language Model and Conformer for prediction of general phosphorylation site

Yuhuan Liu, Haitian Zhong, Jixiu Zhai, Xueying Wang*, Tianchi LU*

*Corresponding author for this work

Research output: Working Papers › Preprint

Abstract

Protein phosphorylation, a key post-translational modification (PTM), provides essential insight into protein properties, which makes its prediction highly significant. Building on the emerging capabilities of large language models (LLMs), we apply LoRA fine-tuning to ESM2, a powerful protein language model, to extract task-specific features efficiently with minimal computational resources. Additionally, we integrate the Conformer architecture with the Feature Coupling Unit (FCU) to enhance the exchange of local and global features, further improving prediction accuracy. Our model achieves state-of-the-art (SOTA) performance, obtaining AUC scores of 79.5%, 76.3%, and 71.4% for S, T, and Y sites, respectively, on the general data sets. Drawing on the feature extraction capabilities of LLMs, we conduct a series of analyses of the protein representations, covering structure, sequence, and several chemical properties (such as hydrophobicity (GRAVY), surface charge, and isoelectric point). We also propose Linear Regression Tomography (LRT), a top-down test method that uses the learned representations to probe the model’s feature extraction capabilities, offering a pathway to improved interpretability.
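
As a rough illustration of the LoRA idea mentioned in the abstract, the NumPy sketch below adds a trainable low-rank update to a frozen linear layer. It is not the authors' implementation and involves no ESM2 weights; the function name lora_forward, the layer sizes, the rank r, and the scaling factor alpha are all assumptions chosen for illustration.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen linear layer plus a low-rank LoRA update (illustrative only).

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in) and B: (d_out, r) are the small trainable factors
    The effective weight is W + (alpha / r) * B @ A.
    """
    r = A.shape[0]
    scale = alpha / r
    # Compute the low-rank path without materialising the full update matrix.
    return x @ W.T + scale * (x @ A.T) @ B.T

# Hypothetical sizes; real protein language model layers are far larger.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 2, 4.0

W = rng.normal(size=(d_out, d_in))          # frozen weight
A = rng.normal(scale=0.01, size=(r, d_in))  # small random init
B = np.zeros((d_out, r))                    # zero init: no change at the start
x = rng.normal(size=(4, d_in))

y_start = lora_forward(x, W, A, B, alpha)

# Parameters that would be trained with and without the low-rank factorisation.
full_params = d_in * d_out
lora_params = r * (d_in + d_out)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the two small factors would be updated during fine-tuning, which is where the savings in compute and memory come from.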

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Original language: English
Publisher: bioRxiv
Publication status: Published - 25 Dec 2024

Research Keywords

  • deep learning
  • phosphorylation site
  • fine tune
  • LoRA
