A binning tool to reconstruct viral haplotypes from assembled contigs

Jiao Chen, Jiayu Shang, Jianrong Wang, Yanni Sun*

*Corresponding author for this work

Research output: Journal Publications and Reviews · RGC 21 - Publication in refereed journal · peer-review


Abstract

Background: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. Because different strains may have different biological properties, characterizing this genetic diversity is important for vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes remains a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. Because a mutation in one gene can mask the phenotypic effects of a mutation at another locus, these contigs still need to be clustered into genome-scale haplotypes.

Results: We developed a contig binning tool, VirBin, which clusters contigs into groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes, because the haplotypes share high sequence similarity and RNA viruses have heterogeneous sequencing coverage. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and on mock quasispecies sequencing data. Benchmark results against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
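The Results paragraph names prototype-based clustering as the core step, and K-means appears among the keywords. As a rough illustration only, and not VirBin's actual implementation, the sketch below runs plain K-means on a single scalar feature per contig region. It assumes that feature is an estimated haplotype abundance. The function name `kmeans_1d`, the deterministic initialisation, and the example abundance values are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Hypothetical sketch of prototype-based (K-means) clustering on scalar features.

    Each cluster is represented by a prototype (its mean). Points are assigned
    to the nearest prototype, and prototypes are recomputed until the
    assignments stop changing or `iters` rounds have run.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Deterministic initialisation (an assumption): spread the k prototypes
    # evenly across the sorted input values.
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [-1] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest prototype.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each prototype to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old prototype if a cluster becomes empty
                centers[c] = sum(members) / len(members)
    return centers, labels


# Hypothetical per-region abundance estimates drawn from three haplotypes
# present at roughly 10%, 30% and 60% of the population.
abundances = [0.09, 0.11, 0.10, 0.31, 0.29, 0.58, 0.62, 0.60]
centers, labels = kmeans_1d(abundances, k=3)
print(labels)
```

A one-dimensional abundance feature separates haplotypes only when their abundances differ clearly. The abstract notes that heterogeneous coverage undermines exactly this kind of signal, so consult the VirBin repository for how the tool actually chooses regions and features.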

Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.
Original language: English
Article number: 544
Journal: BMC Bioinformatics
Volume: 20
Online published: 4 Nov 2019
Publication status: Published - 2019

Research Keywords

  • RNA viral haplotype
  • K-means clustering
  • Contig binning

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
