De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

48 Scopus Citations

Original language: English
Pages (from-to): 2927-2935
Journal / Publication: Bioinformatics
Issue number: 17
Online published: 3 Apr 2018
Publication status: Published - 1 Sep 2018
Externally published: Yes


Motivation: RNA virus populations contain different but genetically related strains, all infecting an individual host. Reconstruction of the viral haplotypes is a fundamental step to characterize the virus population, predict the viral phenotypes and finally provide important information for clinical treatment and prevention. Advances in next-generation sequencing technologies open up new opportunities to assemble full-length haplotypes. However, error-prone short reads, high similarities between related strains, and an unknown number of haplotypes pose computational challenges for reference-free haplotype reconstruction. There is still much room to improve the performance of existing haplotype assembly tools.
Results: In this work, we developed a de novo haplotype reconstruction tool named PEHaplo, which employs paired-end reads to distinguish highly similar strains in viral quasispecies data. It was applied to both simulated and real quasispecies data, and the results were benchmarked against several recently published de novo haplotype reconstruction tools. The comparison shows that PEHaplo outperforms the benchmarked tools across a comprehensive set of metrics.
Availability and implementation: The source code and the documentation of PEHaplo are available at