Abstract
Background: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing this genetic diversity is important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.
Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and the heterogeneous sequencing coverage for RNA viruses. VirBin applies prototype-based clustering to regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data sets with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. Benchmarks against other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.
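
To illustrate the kind of prototype-based clustering named in the Results and in the keywords (K-means), the following is a minimal sketch and not VirBin's actual implementation. It assumes, purely for illustration, that each candidate region has already been reduced to a single abundance value; the function name `kmeans_1d`, the variable `window_abundance`, and the numbers are all hypothetical.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's algorithm on scalar values; returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")

    ordered = sorted(values)
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        # Deterministic initialisation: spread the starting centers evenly
        # across the sorted values (an assumption made for reproducibility).
        centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center (prototype).
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned values.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = sum(members) / len(members)
    return centers, labels


# Hypothetical per-region abundance values for three haplotypes (made-up data).
window_abundance = [48.0, 52.0, 50.5, 29.0, 31.5, 30.0, 19.0, 21.0, 20.5]
centers, labels = kmeans_1d(window_abundance, k=3)
print(centers, labels)
```

In this toy setting each cluster center plays the role of a prototype for one haplotype, and regions with similar abundance receive the same label. How VirBin selects regions and estimates abundances is not specified in this abstract, so those steps are left out here.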
Original language | English |
---|---|
Article number | 544 |
Journal | BMC Bioinformatics |
Volume | 20 |
Online published | 4 Nov 2019 |
DOIs | |
Publication status | Published - 2019 |
Research Keywords
- RNA viral haplotype
- K-means clustering
- Contig binning
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/