3D protein substructure identification and nearest neighbor search
3D 蛋白質子結構的鑒定和最近鄰查找
Student thesis: Master's Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 4 Oct 2010 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(10b311c8-6840-47f4-b34c-6185e0b5d93f).html |
---|---|
Other link(s) | Links |
Abstract
Identifying the location of binding sites on proteins is of fundamental importance
for a wide range of applications including molecular docking, de novo drug design,
structure identification and comparison of functional sites. Structural genomics projects
are beginning to produce protein structures with unknown functions. Therefore, efficient
methods are required if all these structures are to be properly annotated. Identifying the
interface between two interacting proteins provides important clues to the function of a
protein and can reduce the search space required by docking algorithms to predict the
structures of complexes. Here we develop a software package for identifying similar
three-dimensional (3D) protein substructures via direct comparison of 3D structures.
We develop an efficient heuristic algorithm for finding protein 3D substructures in a
3D protein structure that are similar to a given 3D protein substructure. This algorithm
can also be used for searching a database of 3D protein structures. We also design an algorithm
for finding similar 3D substructures from two given 3D protein structures. Our
approach is to directly compare the substructures by computing a rigid transformation
such that the distance between a pair of corresponding points in the matched substructures
is bounded by a given value d. We propose an efficient local search heuristic approach
for finding an approximate rigid transformation. We implement the algorithms
and produce a software package that works well in practice. Experiments show that our
software package can find pairs of similar 3D substructures in reasonable time. We also
provide graphical interface to allow users to view the pairs of 3D protein substructures
from different angles.
Finding the closest objective for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between
two objects might be very time consuming. For example, it takes a long time to
compute the edit distance between two whole chromosomes and the alignment cost of
two 3D protein structures. So we study the nearest neighbor search problem in metric
space, where the pair-wise distance between two objects in the database is known
and we want to minimize the number of distances computed on-line between the query
and objects in the database in order to find the closest object. We have designed two
randomized approaches for indexing metric space databases, where objects are purely
described by their distances with each other. Analysis and experiments show that our
approaches need to compute O(log n) objects in order to find the closest object, where
n is the number of objects in the database.
- Proteins, Data processing, Analysis