Designing seeds for similarity search in genomic DNA

Research output: Journal Publications and Reviews (RGC: 21, 22, 62), 21_Publication in refereed journal, peer-review

45 Scopus Citations

Author(s)

Buhler, Jeremy; Keich, Uri; Sun, Yanni

Detail(s)

Original language: English
Pages (from-to): 342-363
Journal / Publication: Journal of Computer and System Sciences
Volume: 70
Issue number: 3
Publication status: Published - May 2005
Externally published: Yes

Abstract

Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods Enzymol. 266 (1996) 460; J. Mol. Biol. 215 (1990) 403; Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern, or "seed", of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize the performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice. © Published by Elsevier Inc.
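The central computation described in the abstract, the probability that a seed hits a random ungapped alignment, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm or Mandala's code. It assumes an ungapped alignment is a string of match ("1") and mismatch ("0") columns drawn independently with match probability `p_match` (the simplest, order-0 case of the Markov models the paper treats), and that a spaced seed is written with "1" for a required match and "0" for a don't-care position. It builds an Aho-Corasick automaton over every length-w alignment word the seed hits, then runs a forward dynamic program over automaton states. The names `seed_words`, `build_automaton`, and `sensitivity` are hypothetical.

```python
from collections import deque
from itertools import product


def seed_words(seed):
    """All binary alignment words of length len(seed) that the spaced seed hits.
    Assumed notation: '1' = match required, '0' = don't-care position."""
    free = [i for i, c in enumerate(seed) if c == "0"]
    words = []
    for bits in product("01", repeat=len(free)):
        w = list(seed)  # required positions already hold '1'
        for i, b in zip(free, bits):
            w[i] = b  # fill each don't-care position with 0 or 1
        words.append("".join(w))
    return words


def build_automaton(words):
    """Aho-Corasick automaton over {0,1}; returns (transition table, accept flags)."""
    goto, accept = [{}], [False]
    for w in words:  # build the trie of all hit words
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                accept.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accept[s] = True
    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in "01":  # root transitions: stay at root on a missing edge
        t = goto[0].get(ch, 0)
        delta[0][ch] = t
        if t:
            queue.append(t)
    while queue:  # BFS so shallower failure targets are complete first
        s = queue.popleft()
        accept[s] = accept[s] or accept[fail[s]]
        for ch in "01":
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, accept


def sensitivity(seed, length, p_match):
    """P(seed hits an i.i.d. alignment of `length` columns), computed exactly."""
    delta, accept = build_automaton(seed_words(seed))
    prob = [0.0] * len(delta)  # prob[s]: not yet hit and automaton in state s
    prob[0] = 1.0
    hit = 0.0
    for _ in range(length):
        nxt = [0.0] * len(delta)
        for s, ps in enumerate(prob):
            if ps == 0.0:
                continue
            for ch, pc in (("1", p_match), ("0", 1.0 - p_match)):
                t = delta[s][ch]
                if accept[t]:
                    hit += ps * pc  # first hit: absorb this probability mass
                else:
                    nxt[t] += ps * pc
        prob = nxt
    return hit
```

For example, `sensitivity("11", 3, 0.5)` returns 0.375, because three of the eight equally likely 3-column alignments (011, 110, 111) contain two consecutive matches. Since `seed_words` enumerates every word the seed hits, this naive construction grows exponentially in the number of don't-care positions, so it is only practical for small illustrative seeds.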

Research Area(s)

  • Biosequence comparison, Genomic DNA, Mandala, Seeded alignment, String matching


Citation Format(s)

Designing seeds for similarity search in genomic DNA. / Buhler, Jeremy; Keich, Uri; Sun, Yanni.

In: Journal of Computer and System Sciences, Vol. 70, No. 3, 05.2005, p. 342-363.
