A binning tool to reconstruct viral haplotypes from assembled contigs

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) / 21_Publication in refereed journal

Author(s)

Related Research Unit(s)

Detail(s)

Original language: English
Article number: 544
Journal / Publication: BMC Bioinformatics
Volume: 20
Online published: 4 Nov 2019
Publication status: Published - 2019

Abstract

Background: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.

Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applies prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmark comparisons with other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
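
The prototype-based clustering mentioned in the Results can be illustrated with a minimal k-means sketch (K-means clustering is listed among the research areas). This is not VirBin's implementation. It assumes, purely for illustration, that each contig region has already been reduced to a single estimated abundance value. The function name `kmeans_1d` and the input values are hypothetical.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on scalar values (hypothetical helper, not VirBin's code)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic initialization: spread the starting prototypes across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest prototype.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each prototype moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        # Stop once neither the assignments nor the prototypes change.
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return centers, labels


# Made-up abundance values for nine contig regions drawn from three haplotypes.
abundances = [52.0, 48.0, 50.0, 21.0, 19.0, 20.0, 9.0, 11.0, 10.0]
centers, labels = kmeans_1d(abundances, k=3)
print(centers)  # one prototype per assumed haplotype
print(labels)   # cluster index for each region, in input order
```

In this toy input the three abundance levels are well separated, so each cluster corresponds to one assumed haplotype. Real viral data would have overlapping and noisy abundance estimates.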

Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.

Research Area(s)

  • RNA viral haplotype, K-means clustering, Contig binning