Better ILP-Based Approaches to Haplotype Assembly

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

4 Scopus Citations
View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Original languageEnglish
Pages (from-to)537-552
Journal / PublicationJournal of Computational Biology
Volume23
Issue number7
Online published27 Jun 2016
Publication statusPublished - Jul 2016

Abstract

Haplotype assembly is to directly construct the haplotypes of an individual from sequence fragments (reads) of the individual. Although a number of programs have been designed for computing optimal or heuristic solutions to the haplotype assembly problem, computing an optimal solution may take days or even months while computing a heuristic solution usually requires a trade-off between speed and accuracy. This article refines a previously known integer linear programming-based (ILP-based) approach to the haplotype assembly problem in twofolds. First, the read-matrices of some datasets (such as NA12878) come with a quality for each base in the reads. We here propose to utilize the qualities in the ILP-based approach. Secondly, we propose to use the ILP-based approach to improve the output of any heuristic program for the problem. Experiments with both real and simulated datasets show that the qualities of read-matrices help us find more accurate solutions without significant loss of speed. Moreover, our experimental results show that the proposed hybrid approach improves the output of ReFHap (the current leading heuristic) significantly (say, by almost 25% of the QAN50 score) without significant loss of speed, and can even find optimal solutions in much shorter time than the original ILP-based approach. Our program is available upon request to the authors.

Research Area(s)

  • diploid, haplotype assembly, integer linear programming

Citation Format(s)

Better ILP-Based Approaches to Haplotype Assembly. / CHEN, Zhi-Zhong; DENG, Fei; SHEN, Chao et al.
In: Journal of Computational Biology, Vol. 23, No. 7, 07.2016, p. 537-552.

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review