TY - JOUR
T1 - Near optimal multiple alignment within a band in polynomial time
AU - Ma, Bin
AU - Wang, Lusheng
AU - Li, Ming
PY - 2007/9
Y1 - 2007/9
N2 - Multiple sequence alignment is a fundamental problem in computational biology. Because of its notorious difficulties, aligning sequences within a constant band (c-diagonal) is a popular practice in bioinformatics with good practical results [D. Sankoff, J. Kruskal, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983; W.R. Pearson, D. Lipman, Improved tools for biological sequence comparison, Proc. Natl. Acad. Sci. USA 85 (1988) 2444-2448; W.R. Pearson, Rapid and sensitive sequence comparison with FASTP and FASTA, Methods Enzymol. 183 (1990) 63-98; W.R. Pearson, Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms, Genomics 11 (1991) 635-650; S. Altschul, D. Lipman, Trees, stars, and multiple sequence alignment, SIAM J. Appl. Math. 49 (1989) 197-209; K. Chao, W.R. Pearson, W. Miller, Aligning two sequences within a specified diagonal band, CABIOS 8 (1992) 481-487; J.W. Fickett, Fast optimal alignment, Nucleic Acids Res. 12 (1984) 175-180; E. Ukkonen, Algorithms for approximate string matching, Inform. Control 64 (1985) 100-118; J.L. Spouge, Fast optimal alignment, CABIOS 7 (1991) 1-7]. However, the problem is still NP-hard for multiple sequences. In this paper, we present a theoretical study of this problem. In particular, for arbitrarily small ε > 0, we present polynomial time algorithms that produce a multiple alignment (not necessarily c-diagonal) with cost at most 1 + ε times the cost of the optimal c-diagonal alignment, under standard models of both SP alignment and consensus (star) alignment. Our algorithms for consensus alignment allow very general score schemes. In order to prove our main results, we also present similar results for SP alignment and consensus alignment, allowing only constant number of insertion and deletion gaps (of arbitrary length) per sequence on the average. These results are interesting in their own rights and they improve some results in [M. Li, B. Ma, L. Wang, Finding similar regions in many sequences, in: Proc. 31st ACM Symp. Theory of Computing, Atlanta, 1999, pp. 473-482].
AB - Multiple sequence alignment is a fundamental problem in computational biology. Because of its notorious difficulties, aligning sequences within a constant band (c-diagonal) is a popular practice in bioinformatics with good practical results [D. Sankoff, J. Kruskal, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, 1983; W.R. Pearson, D. Lipman, Improved tools for biological sequence comparison, Proc. Natl. Acad. Sci. USA 85 (1988) 2444-2448; W.R. Pearson, Rapid and sensitive sequence comparison with FASTP and FASTA, Methods Enzymol. 183 (1990) 63-98; W.R. Pearson, Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms, Genomics 11 (1991) 635-650; S. Altschul, D. Lipman, Trees, stars, and multiple sequence alignment, SIAM J. Appl. Math. 49 (1989) 197-209; K. Chao, W.R. Pearson, W. Miller, Aligning two sequences within a specified diagonal band, CABIOS 8 (1992) 481-487; J.W. Fickett, Fast optimal alignment, Nucleic Acids Res. 12 (1984) 175-180; E. Ukkonen, Algorithms for approximate string matching, Inform. Control 64 (1985) 100-118; J.L. Spouge, Fast optimal alignment, CABIOS 7 (1991) 1-7]. However, the problem is still NP-hard for multiple sequences. In this paper, we present a theoretical study of this problem. In particular, for arbitrarily small ε > 0, we present polynomial time algorithms that produce a multiple alignment (not necessarily c-diagonal) with cost at most 1 + ε times the cost of the optimal c-diagonal alignment, under standard models of both SP alignment and consensus (star) alignment. Our algorithms for consensus alignment allow very general score schemes. In order to prove our main results, we also present similar results for SP alignment and consensus alignment, allowing only constant number of insertion and deletion gaps (of arbitrary length) per sequence on the average. These results are interesting in their own rights and they improve some results in [M. Li, B. Ma, L. Wang, Finding similar regions in many sequences, in: Proc. 31st ACM Symp. Theory of Computing, Atlanta, 1999, pp. 473-482].
KW - Multiple sequence alignment within a band
KW - Polynomial time approximation algorithms
UR - http://www.scopus.com/inward/record.url?scp=34250222735&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-34250222735&origin=recordpage
U2 - 10.1016/j.jcss.2007.03.012
DO - 10.1016/j.jcss.2007.03.012
M3 - RGC 21 - Publication in refereed journal
SN - 0022-0000
VL - 73
SP - 997
EP - 1011
JO - Journal of Computer and System Sciences
JF - Journal of Computer and System Sciences
IS - 6
ER -