Abstract
With advances in library construction protocols and next-generation sequencing technologies, viral metagenomic sequencing has become the major source of novel virus discovery. Taxonomic classification of metagenomic data is an important means of characterizing the viral composition of the underlying samples. However, RNA viruses are abundant and highly diverse, which limits the sensitivity of comparison-based classification methods. To improve the sensitivity of read-level taxonomic classification, we developed RdRpBin, a read classification tool based on the RNA-dependent RNA polymerase (RdRp) gene. It combines an alignment-based strategy with machine learning models to fully exploit the sequence properties of RdRp. We tested our method on simulated and real sequencing data and compared its performance with that of state-of-the-art tools; RdRpBin compares favorably with all of them. In particular, when the query RNA viruses share low sequence similarity with known viruses (∼0.4), our tool still maintains a higher F-score than the state-of-the-art tools. The experimental results on real data also show that RdRpBin can classify more RNA viral reads with a relatively low false-positive rate. Thus, RdRpBin can be used to classify novel and divergent RNA viruses.
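The abstract reports performance as an F-score. As a hedged illustration only (not the authors' evaluation code), the sketch below computes precision, recall, and the balanced F1 score for a binary read-level task, assuming each read is labeled either as an RNA viral read or not. The label strings `"RNA_virus"` and `"other"` and the function name `f_score` are hypothetical, and the paper may aggregate scores differently (for example, per taxon).

```python
def f_score(true_labels, predicted_labels, positive="RNA_virus"):
    """Precision, recall and F1 for one positive class.

    Labels are hypothetical strings; "RNA_virus" marks a read assumed to
    come from an RNA virus, anything else counts as a negative.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: viral reads correctly classified as viral.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-viral reads wrongly classified as viral.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: viral reads the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, made-up labels for five reads (not data from the paper).
truth = ["RNA_virus", "RNA_virus", "RNA_virus", "other", "other"]
pred = ["RNA_virus", "RNA_virus", "other", "RNA_virus", "other"]
print(f_score(truth, pred))
```

In this toy case two of three viral reads are recovered and one non-viral read is misclassified, so precision, recall, and F1 are all 2/3. A higher F-score therefore reflects a better balance between classifying more viral reads (recall) and avoiding false positives (precision).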
| Original language | English |
|---|---|
| Article number | bbac011 |
| Journal | Briefings in Bioinformatics |
| Volume | 23 |
| Issue number | 2 |
| Online published | 7 Feb 2022 |
| DOIs | |
| Publication status | Published - Mar 2022 |
Funding
This work was supported by Hong Kong Research Grants Council (RGC) General Research Fund (GRF) 11206819, Hong Kong Innovation and Technology Fund (ITF) MRP/071/20X, and City University of Hong Kong (9678241).
Research Keywords
- RNA virus
- RNA-dependent RNA polymerase
- Probabilistic Relational Neighbor Classifier
- Graph Neural Network
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
RGC Funding Information
- RGC-funded
Fingerprint
Dive into the research topics of 'RdRp-based sensitive taxonomic classification of RNA viruses for metagenomic data'. Together they form a unique fingerprint.
Projects
2 finished projects:
- ITF: Viral Metagenomic Sequencing As A Broad-spectrum Pathogen Detection Technology For Viral Diseases
  SUN, Y. (Principal Investigator / Project Coordinator), Shi, M. (Co-Investigator) & Wang, S. (Co-Investigator)
  1/04/21 → 31/03/25
  Project: Research
- GRF: Characterizing Quasispecies of Known and Novel Viruses from Metagenomic Data
  SUN, Y. (Principal Investigator / Project Coordinator)
  1/01/20 → 24/06/24
  Project: Research