TY - JOUR
T1 - On the closest string and substring problems
AU - LI, Ming
AU - MA, Bin
AU - WANG, Lusheng
PY - 2002/3
Y1 - 2002/3
N2 - The problem of finding a center string that is "close" to every given string arises in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Given a set of strings S = {s1, s2, . . ., sn}, each of length m, the Closest String problem is to find the smallest d and a string s of length m which is within Hamming distance d to each si ∈ S. This problem comes from coding theory when we are looking for a code not too far away from a given set of codes. Closest Substring problem, with an additional input integer L, asks for the smallest d and a string s, of length L, which is within Hamming distance d away from a substring, of length L, of each si. This problem is much more elusive than the Closest String problem. The Closest Substring problem is formulated from applications in finding conserved regions, identifying genetic drug targets and generating genetic probes in molecular biology. Whether there are efficient approximation algorithms for both problems are major open questions in this area. We present two polynomial time approxmation algorithms with approximation ratio 1 + ε for any small ε to settle both questions. © 2002 ACM.
AB - The problem of finding a center string that is "close" to every given string arises in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Given a set of strings S = {s1, s2, . . ., sn}, each of length m, the Closest String problem is to find the smallest d and a string s of length m which is within Hamming distance d to each si ∈ S. This problem comes from coding theory when we are looking for a code not too far away from a given set of codes. Closest Substring problem, with an additional input integer L, asks for the smallest d and a string s, of length L, which is within Hamming distance d away from a substring, of length L, of each si. This problem is much more elusive than the Closest String problem. The Closest Substring problem is formulated from applications in finding conserved regions, identifying genetic drug targets and generating genetic probes in molecular biology. Whether there are efficient approximation algorithms for both problems are major open questions in this area. We present two polynomial time approxmation algorithms with approximation ratio 1 + ε for any small ε to settle both questions. © 2002 ACM.
KW - Closest string and substring
KW - computer applications
KW - polynomial-time approximation scheme
UR - http://www.scopus.com/inward/record.url?scp=0141653290&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-0141653290&origin=recordpage
U2 - 10.1145/506147.506150
DO - 10.1145/506147.506150
M3 - RGC 21 - Publication in refereed journal
SN - 0004-5411
VL - 49
SP - 157
EP - 171
JO - Journal of the ACM
JF - Journal of the ACM
IS - 2
ER -