TY - GEN
T1 - mapAlign
T2 - 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020)
AU - Yang, Wen
AU - Wang, Lusheng
PY - 2020/12
Y1 - 2020/12
N2 - Long reads play an important role in the identification of structural variants, sequencing of repetitive regions, phasing of alleles, etc. In this paper, we propose a new approach for mapping long reads to reference genomes. We also propose a new method to generate accurate alignments of the long reads and the corresponding segments of the reference genome. The new mapping algorithm is based on the longest common subsequence with distance constraints. The new (local) alignment algorithm is based on the idea of recursive alignment of variable-size k-mers. Experiments show that our new method can generate better alignments in terms of both identity and alignment scores for both Nanopore and SMRT data sets. In particular, our method can align 91.53% and 85.36% of the letters on reads to identical letters on reference genomes for human individuals in the Nanopore and SMRT data sets, respectively. The state-of-the-art method can align only 88.44% and 79.08% of the letters on reads for the Nanopore and SMRT data sets, respectively. Our method is also faster than the state-of-the-art method.
AB - Long reads play an important role in the identification of structural variants, sequencing of repetitive regions, phasing of alleles, etc. In this paper, we propose a new approach for mapping long reads to reference genomes. We also propose a new method to generate accurate alignments of the long reads and the corresponding segments of the reference genome. The new mapping algorithm is based on the longest common subsequence with distance constraints. The new (local) alignment algorithm is based on the idea of recursive alignment of variable-size k-mers. Experiments show that our new method can generate better alignments in terms of both identity and alignment scores for both Nanopore and SMRT data sets. In particular, our method can align 91.53% and 85.36% of the letters on reads to identical letters on reference genomes for human individuals in the Nanopore and SMRT data sets, respectively. The state-of-the-art method can align only 88.44% and 79.08% of the letters on reads for the Nanopore and SMRT data sets, respectively. Our method is also faster than the state-of-the-art method.
KW - LCS with distance constraints
KW - Local alignment of long reads
KW - Long read mapping
KW - Variable length k-mer alignment
UR - http://www.scopus.com/inward/record.url?scp=85090095277&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85090095277&origin=recordpage
U2 - 10.1007/978-3-030-57821-3_10
DO - 10.1007/978-3-030-57821-3_10
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9783030578206
T3 - Lecture Notes in Computer Science
SP - 105
EP - 118
BT - Bioinformatics Research and Applications - 16th International Symposium, ISBRA 2020, Proceedings
A2 - Cai, Zhipeng
A2 - Mandoiu, Ion
A2 - Narasimhan, Giri
PB - Springer
Y2 - 1 December 2020 through 4 December 2020
ER -