A binning tool to reconstruct viral haplotypes from assembled contigs
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 544 |
Journal / Publication | BMC Bioinformatics |
Volume | 20 |
Online published | 4 Nov 2019 |
Publication status | Published - 2019 |
Link(s)
DOI | DOI |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85074546599&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(dcd006a4-55b6-4ed3-9a54-12babea5931d).html |
Abstract
Background: Infections by RNA viruses such as Influenza, HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is thus important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.
Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applied prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. The benchmark results with other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source codes are available at: https://github.com/chjiao/VirBin.
Research Area(s)
- RNA viral haplotype, K-means clustering, Contig binning
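Illustrative note: the abstract and keywords name prototype-based (K-means) clustering as the core technique. The following is a generic toy sketch of K-means on scalar values, not VirBin's implementation. It assumes, purely for illustration, that each contig region is summarized by one hypothetical relative-abundance number; the function name `kmeans_1d` and the sample values are invented for this example. For the actual method, see the source code linked in the abstract.

```python
def kmeans_1d(values, k, iters=100):
    """Generic Lloyd's K-means on scalars; returns (centers, labels).

    Toy illustration only: not the clustering procedure used by VirBin.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic start: spread the initial prototypes over the sorted values.
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest prototype.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centers[j])) for v in values
        ]
        # Update step: each prototype moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_labels == labels and new_centers == centers:
            break  # converged: nothing changed in this pass
        labels, centers = new_labels, new_centers
    return centers, labels


# Hypothetical per-region abundance values (made up for this sketch).
abundances = [0.52, 0.48, 0.31, 0.29, 0.21, 0.19, 0.50, 0.30, 0.20]
centers, labels = kmeans_1d(abundances, k=3)
for value, label in zip(abundances, labels):
    print(f"region abundance {value:.2f} -> cluster {label}")
```

In this toy setting each cluster would stand for one candidate haplotype; the number of clusters `k` is supplied by hand here, which is an assumption of the sketch.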
Citation Format(s)
A binning tool to reconstruct viral haplotypes from assembled contigs. / Chen, Jiao; Shang, Jiayu; Wang, Jianrong et al.
In: BMC Bioinformatics, Vol. 20, 544, 2019.