3D protein substructure identification and nearest neighbor search

3D 蛋白質子結構的鑒定和最近鄰查找

Student thesis: Master's Thesis

View graph of relations

Author(s)

  • Yong YANG

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date4 Oct 2010

Abstract

Identifying the location of binding sites on proteins is of fundamental importance for a wide range of applications including molecular docking, de novo drug design, structure identification and comparison of functional sites. Structural genomics projects are beginning to produce protein structures with unknown functions. Therefore, efficient methods are required if all these structures are to be properly annotated. Identifying the interface between two interacting proteins provides important clues to the function of a protein and can reduce the search space required by docking algorithms to predict the structures of complexes. Here we develop a software package for identifying similar three-dimensional (3D) protein substructures via direct comparison of 3D structures. We develop an efficient heuristic algorithm for finding protein 3D substructures in a 3D protein structure that are similar to a given 3D protein substructure. This algorithm can also be used for searching a database of 3D protein structures. We also design an algorithm for finding similar 3D substructures from two given 3D protein structures. Our approach is to directly compare the substructures by computing a rigid transformation such that the distance between a pair of corresponding points in the matched substructures is bounded by a given value d. We propose an efficient local search heuristic approach for finding an approximate rigid transformation. We implement the algorithms and produce a software package that works well in practice. Experiments show that our software package can find pairs of similar 3D substructures in reasonable time. We also provide graphical interface to allow users to view the pairs of 3D protein substructures from different angles. Finding the closest objective for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between two objects might be very time consuming. For example, it takes a long time to compute the edit distance between two whole chromosomes and the alignment cost of two 3D protein structures. So we study the nearest neighbor search problem in metric space, where the pair-wise distance between two objects in the database is known and we want to minimize the number of distances computed on-line between the query and objects in the database in order to find the closest object. We have designed two randomized approaches for indexing metric space databases, where objects are purely described by their distances with each other. Analysis and experiments show that our approaches need to compute O(log n) objects in order to find the closest object, where n is the number of objects in the database.

    Research areas

  • Proteins, Data processing, Analysis