
mapAlign: An Efficient Approach for Mapping and Aligning Long Reads to Reference Genomes

Wen Yang, Lusheng Wang*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

Long reads play an important role in the identification of structural variants, the sequencing of repetitive regions, the phasing of alleles, etc. In this paper, we propose a new approach for mapping long reads to reference genomes. We also propose a new method to generate accurate alignments between the long reads and the corresponding segments of the reference genome. The new mapping algorithm is based on the longest common subsequence with distance constraints. The new (local) alignment algorithm is based on the idea of recursive alignment of variable-size k-mers. Experiments show that our new method generates better alignments in terms of both identity and alignment score for both Nanopore and SMRT data sets. In particular, our method aligns 91.53% and 85.36% of the letters on reads to identical letters on reference genomes for human individuals in the Nanopore and SMRT data sets, respectively. The state-of-the-art method aligns only 88.44% and 79.08% of the letters on reads for the Nanopore and SMRT data sets, respectively. Our method is also faster than the state-of-the-art method.
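
The identity figures quoted above count how many letters of a read are aligned to an identical letter on the reference. The abstract does not give the exact formula, so the following is only a minimal sketch under assumptions: the alignment is supplied as two gapped strings of equal length, `-` marks a gap, and the denominator is the number of read letters. The function name `read_identity` is hypothetical and not taken from the paper.

```python
def read_identity(aligned_read: str, aligned_ref: str, gap: str = "-") -> float:
    """Fraction of read letters aligned to an identical reference letter.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they have equal length and use `gap` for gap columns.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")

    read_letters = 0  # columns that contain a letter of the read
    matches = 0       # columns where the read letter equals the reference letter

    for read_char, ref_char in zip(aligned_read, aligned_ref):
        if read_char == gap:
            # Gap in the read: no read letter in this column, so it is not counted.
            continue
        read_letters += 1
        if read_char == ref_char:
            matches += 1

    return matches / read_letters if read_letters else 0.0


if __name__ == "__main__":
    # Toy alignment: six read letters, five of them identical to the reference.
    print(f"{read_identity('ACG-TTA', 'ACGATCA'):.4f}")
```

Under this assumed definition, a read letter placed opposite a gap in the reference counts against identity, while a reference letter with no read letter opposite it does not.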
Original language: English
Title of host publication: Bioinformatics Research and Applications - 16th International Symposium, ISBRA 2020, Proceedings
Editors: Zhipeng Cai, Ion Mandoiu, Giri Narasimhan
Publisher: Springer
Pages: 105-118
ISBN (Electronic): 9783030578213
ISBN (Print): 9783030578206
Publication status: Published - Dec 2020
Event: 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020) - Moscow, Russian Federation
Duration: 1 Dec 2020 to 4 Dec 2020

Publication series

Name: Lecture Notes in Computer Science
Volume: 12304
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020)
Place: Russian Federation
City: Moscow
Period: 1/12/20 to 4/12/20

Research Keywords

  • LCS with distance constraints
  • Local alignment of long reads
  • Long read mapping
  • Variable length k-mer alignment
