TY - GEN
T1 - Aligning sequences via an evolutionary tree
T2 - Proceedings of the 26th Annual ACM Symposium on the Theory of Computing
AU - Jiang, Tao
AU - Lawler, Eugene L.
AU - Wang, Lusheng
PY - 1994
Y1 - 1994
N2 - We study the following fundamental problem in computational molecular biology: Given a set of DNA sequences representing some species and a phylogenetic tree depicting the ancestral relationship among these species, compute an optimal alignment of the sequences by the means of constructing a minimum-cost evolutionary tree. The problem is an important variant of multiple sequence alignment, and is widely known as tree alignment. A more generalized version of the problem, called generalized tree alignment in this paper, is that we are given the DNA sequences only and still have to construct a minimum-cost evolutionary tree. The paper presents some hardness results as well as approximation algorithms. It is shown that tree alignment is NP-hard and generalized tree alignment is MAX SNP-hard. On the positive side, we design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. The construction actually works for Steiner trees in any metric space, and thus implies a polynomial-time approximation scheme for planar Steiner trees under a given topology (with any constant degree). To our knowledge, this is the first polynomial-time approximation scheme in the fields of computational biology and Steiner trees. The contrast between the approximability of tree alignment and generalized tree alignment shows that a phylogenetic tree can indeed help in multiple alignment. The approximation algorithms may be useful in evolutionary genetics practice as they can provide a good initial alignment for the iterative method in [24].
AB - We study the following fundamental problem in computational molecular biology: Given a set of DNA sequences representing some species and a phylogenetic tree depicting the ancestral relationship among these species, compute an optimal alignment of the sequences by the means of constructing a minimum-cost evolutionary tree. The problem is an important variant of multiple sequence alignment, and is widely known as tree alignment. A more generalized version of the problem, called generalized tree alignment in this paper, is that we are given the DNA sequences only and still have to construct a minimum-cost evolutionary tree. The paper presents some hardness results as well as approximation algorithms. It is shown that tree alignment is NP-hard and generalized tree alignment is MAX SNP-hard. On the positive side, we design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. The construction actually works for Steiner trees in any metric space, and thus implies a polynomial-time approximation scheme for planar Steiner trees under a given topology (with any constant degree). To our knowledge, this is the first polynomial-time approximation scheme in the fields of computational biology and Steiner trees. The contrast between the approximability of tree alignment and generalized tree alignment shows that a phylogenetic tree can indeed help in multiple alignment. The approximation algorithms may be useful in evolutionary genetics practice as they can provide a good initial alignment for the iterative method in [24].
UR - https://www.scopus.com/pages/publications/0027961628
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-0027961628&origin=recordpage
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 897916638
SP - 760
EP - 769
BT - Conference Proceedings of the Annual ACM Symposium on Theory of Computing
PB - Association for Computing Machinery
Y2 - 23 May 1994 through 25 May 1994
ER -