Genome Assembly and Haplotyping Based on stLFR Data

Project: Research



Sanger sequencing made complete genome sequencing possible, and the high throughput of next-generation sequencing (NGS) technology has greatly improved cost and time efficiency. However, NGS short reads, typically hundreds of base pairs long, struggle to distinguish repetitive sequences from read overlaps and to resolve complex structural variations (SVs). Recently, a new technology called single-tube long fragment read (stLFR) was published. stLFR adds barcodes to identify sub-fragments that come from the same original long DNA molecule and then sequences them with NGS technology. With billions of unique barcodes, stLFR has been shown to support high-quality downstream analysis, including de novo genome assembly, variant calling and long-range haplotyping. It is also highly cost-effective, since it uses NGS to generate the sequencing reads. In this project, we will take advantage of stLFR information to develop integrated software for human genome assembly and haplotyping. The method will substantially increase assembly quality, with contigs larger than 200 kb, scaffolds larger than 50 Mb and 95% coverage of the reference genome. It will also substantially increase the accuracy of SNV haplotyping, with a maximum error rate of 1 in 10 million. The software will complete a whole-genome assembly in one day at a computing cost of no more than 200 HKD and will provide various downstream summary reports.
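The barcode idea described above can be illustrated with a minimal sketch. It is not the project's software. It assumes reads have already been aligned and reduced to hypothetical `(barcode, chrom, pos)` tuples, and it uses an assumed `max_gap` threshold to decide when two reads sharing a barcode are too far apart to come from the same long DNA molecule.

```python
from collections import defaultdict


def group_reads_into_fragments(reads, max_gap=50_000):
    """Infer long fragments from barcoded short reads.

    reads   : iterable of (barcode, chrom, pos) tuples (hypothetical input format)
    max_gap : assumed maximum distance in bp between neighbouring reads
              of the same fragment; an illustrative value, not a published one
    Returns a list of (barcode, chrom, start, end, read_count) tuples.
    """
    # Collect read positions per (barcode, chromosome) pair.
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    fragments = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current fragment, start a new one.
                fragments.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        fragments.append((barcode, chrom, start, prev, count))
    return fragments


if __name__ == "__main__":
    example = [
        ("bc1", "chr1", 100),
        ("bc1", "chr1", 20_000),
        ("bc1", "chr1", 900_000),
        ("bc2", "chr1", 500),
    ]
    for fragment in group_reads_into_fragments(example):
        print(fragment)
```

Reads that share a barcode and lie close together are treated as one inferred fragment, which is the kind of long-range information that short reads alone do not provide for assembly and haplotyping.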


Project number: 9440236
Grant type: ITF
Effective start/end date: 1/08/20 → …