SpecHap : a diploid phasing algorithm based on spectral graph theory

Research output: Journal Publications and ReviewsRGC 21 - Publication in refereed journalpeer-review

6 Scopus Citations
View graph of relations

Related Research Unit(s)

Detail(s)

Original languageEnglish
Article numbere114
Journal / PublicationNucleic Acids Research
Volume49
Issue number19
Online published17 Aug 2021
Publication statusPublished - 8 Nov 2021

Link(s)

Abstract

Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few will work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved comparable accuracy with state-of-art methods across all the test instances, which comprises sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long-reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with gigantic diploid genomes, within 6 CPU hours and 945MB peak memory usage, while other tools failed to yield results either due to memory overflow (40GB) or time limit exceeded (5 days). Our results demonstrated that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms.

Research Area(s)

Download Statistics

No data available