PhaGenus: genus-level classification of bacteriophages using a Transformer model

Jiaojiao Guan, Cheng Peng, Jiayu Shang, Xubo Tang, Yanni Sun*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

9 Citations (Scopus)

Abstract

Motivation: Bacteriophages (phages for short), which prey on and replicate within bacterial cells, play a significant role in modulating microbial communities and have potential applications in combating antibiotic resistance. Advances in high-throughput sequencing technology have contributed greatly to the discovery of phages. However, the taxonomic classification of assembled phage contigs still faces several challenges, including high genetic diversity, the lack of a stable taxonomy system, and limited knowledge of phage annotations. Despite extensive efforts, existing tools have not yet achieved an optimal balance between prediction rate and accuracy.

Results: In this work, we develop a learning-based model named PhaGenus, which conducts genus-level taxonomic classification for phage contigs. PhaGenus utilizes a Transformer model to learn the associations between protein clusters and supports the classification of up to 508 genera. We tested PhaGenus on four datasets in different scenarios. The experimental results show that PhaGenus outperforms state-of-the-art methods on low-similarity datasets, achieving an improvement of at least 13.7%. Additionally, PhaGenus is highly effective at identifying previously uncharacterized genera that are not represented in reference databases, with an improvement of 8.52%. The analysis of the infants’ gut and GOV2.0 datasets demonstrates that PhaGenus can classify more contigs with higher accuracy.

© The Author(s) 2023. Published by Oxford University Press.

Original language: English
Article number: bbad408
Journal: Briefings in Bioinformatics
Volume: 24
Issue number: 6
Online published: 15 Nov 2023
Publication status: Published - Nov 2023

Research Keywords

  • phage classification
  • genus level
  • transformer
  • protein cluster-based tokens
