TY - GEN
T1 - Finding longest common segments in protein structures in nearly linear time
AU - Ng, Yen Kaow
AU - Ono, Hirotaka
AU - Ge, Ling
AU - Li, Shuai Cheng
PY - 2012
Y1 - 2012
N2 - The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for the comparison of protein structures. One of the two components of LGA requires us to compute the longest common contiguous segments between two protein structures. That is, given two structures A = (a 1, ..., a n ) and B = (b 1, ..., b n ) where a k , b k ∈ ℝ 3, we are to find, among all the segments f = (a i ,...,a j ) and g = (b i ,...,b j ) that fulfill a certain criterion regarding their similarity, those of the maximum length. We consider the following criteria: (1) the root mean square deviation (RMSD) between f and g is to be within a given t ∈ ℝ; (2) f and g can be superposed such that for each k, i ≤ k ≤ j, ||a k - b k || ≤ t for a given t ∈ . We give an algorithm of time complexity when the first requirement applies, where is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any ε∈ ℝ, finds a segment of length at least l, but of RMSD up to (1 + ε)t, in O(nlogn + n/ε) time. We propose an FPTAS which for any given ε∈ ℝ, finds all the segments f and g of the maximum length which can be superposed such that for each k, i ≤ k ≤ j, ||a k - b k || ≤ (1 + ε) t, thus fulfilling the second requirement approximately. The algorithm has a time complexity of O(nlog 2 n/ε 5) when consecutive points in A are separated by the same distance (which is the case with protein structures). © 2012 Springer-Verlag.
AB - The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for the comparison of protein structures. One of the two components of LGA requires us to compute the longest common contiguous segments between two protein structures. That is, given two structures A = (a 1, ..., a n ) and B = (b 1, ..., b n ) where a k , b k ∈ ℝ 3, we are to find, among all the segments f = (a i ,...,a j ) and g = (b i ,...,b j ) that fulfill a certain criterion regarding their similarity, those of the maximum length. We consider the following criteria: (1) the root mean square deviation (RMSD) between f and g is to be within a given t ∈ ℝ; (2) f and g can be superposed such that for each k, i ≤ k ≤ j, ||a k - b k || ≤ t for a given t ∈ . We give an algorithm of time complexity when the first requirement applies, where is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any ε∈ ℝ, finds a segment of length at least l, but of RMSD up to (1 + ε)t, in O(nlogn + n/ε) time. We propose an FPTAS which for any given ε∈ ℝ, finds all the segments f and g of the maximum length which can be superposed such that for each k, i ≤ k ≤ j, ||a k - b k || ≤ (1 + ε) t, thus fulfilling the second requirement approximately. The algorithm has a time complexity of O(nlog 2 n/ε 5) when consecutive points in A are separated by the same distance (which is the case with protein structures). © 2012 Springer-Verlag.
UR - http://www.scopus.com/inward/record.url?scp=84863100656&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84863100656&origin=recordpage
U2 - 10.1007/978-3-642-31265-6_27
DO - 10.1007/978-3-642-31265-6_27
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 9783642312649
VL - 7354 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 334
EP - 348
BT - Combinatorial Pattern Matching
PB - Springer Verlag
T2 - 23rd Annual Symposium on Combinatorial Pattern Matching, CPM 2012
Y2 - 3 July 2012 through 5 July 2012
ER -