Utilizing profile hidden Markov model databases for discovering viruses from metagenomic data: a comprehensive review

Runzhou Yu, Ziyi Huang, Theo Y.C. Lam, Yanni Sun*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Profile hidden Markov models (pHMMs) achieve high sensitivity in remote homology search, making them popular choices for detecting novel or highly diverged viruses in metagenomic data. However, existing pHMM databases differ in their design focus, making it difficult for users to decide which one to use. In this review, we provide a thorough evaluation and comparison of multiple commonly used pHMM databases for viral sequence discovery in metagenomic data. We characterize the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. We then assess their performance in virus identification across multiple application scenarios, using both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of the different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provide practical suggestions for optimizing the use of pHMM databases, thus enhancing the quality and reliability of findings in viral metagenomics.

© The Author(s) 2024.
Original language: English
Article number: bbae292
Journal: Briefings in Bioinformatics
Volume: 25
Issue number: 4
Online published: 20 Jun 2024
Publication status: Published - Jul 2024

Funding

This work was supported by the Hong Kong Research Grants Council (RGC) General Research Fund (GRF) [11206819, 11217521] and the Hong Kong Innovation and Technology Fund (ITF) [MRP/071/20X].

Research Keywords

  • profile hidden Markov models
  • virus detection
  • metagenomic data

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
