PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Detail(s)

Original language: English
Title of host publication: ISMB/ECCB 2023 Proceedings
Publisher: Oxford University Press
Pages: i30-i39
Publication status: Published - Jun 2023

Publication series

Name: Bioinformatics
Number: Supplement_1
Volume: 39
ISSN (Print): 1367-4803
ISSN (electronic): 1367-4811

Conference

Title: 31st International Conference on Intelligent Systems for Molecular Biology and 22nd European Conference on Computational Biology (ISMB/ECCB 2023)
Location: Centre de Congrès de Lyon (in-person & virtual)
Place: France
City: Lyon
Period: 23 - 27 July 2023

Abstract

Motivation: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing makes it possible to obtain phages from different microbiomes at low cost. However, phage protein classification has not kept pace with the rapid accumulation of newly identified phages. In particular, a fundamental need is to annotate virion proteins, that is, structural proteins such as the major tail and the baseplate. Although experimental methods for virion protein identification exist, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is great demand for a computational method for fast and accurate phage virion protein (PVP) classification.

Results: In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins.
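To make the encoding step concrete, the following is a minimal sketch of how a chaos game representation can turn a protein sequence into a small image-like grid. It is an illustration under stated assumptions, not PhaVIP's actual encoding: the placement of the 20 amino acids on a regular polygon, the midpoint rule, the 32 x 32 resolution, and the name cgr_image are all hypothetical choices made for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: one vertex per amino acid, evenly spaced on the unit circle.
# PhaVIP may lay out residues differently.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_image(sequence, resolution=32):
    """Return a resolution x resolution grid of visit counts (hypothetical helper)."""
    grid = [[0] * resolution for _ in range(resolution)]
    x, y = 0.0, 0.0  # start at the centre of the polygon
    for residue in sequence.upper():
        if residue not in VERTICES:
            continue  # skip ambiguous residues such as X
        vx, vy = VERTICES[residue]
        # Chaos game step: move halfway towards the residue's vertex.
        x, y = (x + vx) / 2, (y + vy) / 2
        # Map coordinates from [-1, 1] to pixel indices and count the visit.
        col = min(int((x + 1) / 2 * resolution), resolution - 1)
        row = min(int((y + 1) / 2 * resolution), resolution - 1)
        grid[row][col] += 1
    return grid


image = cgr_image("MKTAYIAKQR")
```

Each residue contributes one count to the grid, so the resulting "image" depends on residue order as well as composition. A grid of this kind could then be split into patches and passed to an image classifier such as a Vision Transformer.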

Availability and implementation: The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP. 

© The Author(s) 2023. Published by Oxford University Press.

Research Area(s)

  • Bacteriophages, Virion, Amino Acid Sequence, Benchmarking, Microbiota

Citation Format(s)

PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer. / Shang, Jiayu; Peng, Cheng; Tang, Xubo et al.
ISMB/ECCB 2023 Proceedings. Oxford University Press, 2023. p. i30-i39 (Bioinformatics; Vol. 39, No. Supplement_1).
