Succinct Modeling of Contacts in Protein Domains for Structural Alignment, Classification and Prediction
Project: Research
Researcher(s)
- Shuaicheng LI (Principal Investigator / Project Coordinator), Department of Computer Science
- Ming Li (Co-Investigator)
Description
The functions and evolution of proteins are often studied through their molecular structures. Advances in these studies depend heavily on the accuracy, efficiency, and suitability of methods for structural comparison and classification. In these methods, a protein molecule is usually modeled as a collection of 3D coordinates or a distance matrix. Current approaches to comparing such models suffer from high time complexity and lack performance guarantees.

Recently, feature extraction methods have allowed more succinct representations of protein structures (through, for instance, a library of structural fragments), enabling more efficient algorithms for comparing proteins. This efficiency, however, comes at a price: these representations have limited power to express non-local constraints between residues. In this project, we propose and study an efficient representation designed to encode the order-independent structural relationships that exist across multiple fragments. The model performed very well in our initial study, in which we applied it to several biological enquiries.

We first use it to devise efficient algorithms for structural alignment. A preliminary study shows that, when our model is used to fingerprint protein structures, more accurate alignments can be obtained without introducing any runtime overhead. Furthermore, the model naturally gives rise to a variant of the distance-matrix-based structure alignment problem, a problem commonly believed to be hard due to its similarity to the subgraph isomorphism problem. A preliminary analysis of the new problem indicates the existence of a polynomial-time solution. Through insights obtained from the solution, better results may be possible for the structural alignment problem under other distance-matrix-based formulations, such as the one used by DALI.

The second problem we study concerns the use of the structural units encoded in our proposed representation as basic evolutionary or functional protein units, or domains. Protein domains have been used for many purposes, e.g. to investigate structural evolution or to design move sets for protein molecular dynamics. Currently, the SCOP and CATH databases maintain collections of these domains, as classified by human experts. As a trial, we constructed a hierarchy from the structural units captured through our representation. Even with a simple distance measure, very promising results were obtained. This lends evidence to the adequacy of the structural space encoded through our representation. We plan to pursue this hypothesis further with more diverse experiments and with more refined distance measures based on natural structural operations. Insights drawn from our experience in these applications of our encoded structural library will, in turn, help us improve it.

The third problem we consider is the design of a better move set for protein structure prediction. Existing ab initio methods are highly inefficient, mostly because they explore a large search space of improbable candidates. By studying the relations between the elements in the structural space generated by our representation, we plan to design a better, non-redundant move set for molecular dynamics. This also opens the possibility of coding energy terms implicitly into the library to reduce the dependence on energy function computation. While difficulties remain, we believe a successful implementation of these strategies will result in dramatic improvements in the runtime of ab initio protein structure prediction.

The expected output of this project includes high-quality research publications; algorithms, models, and software for structural alignment, structure prediction, and structure evolution; and the training of PhD students.

Detail(s)
Project number | 9041901 |
---|---|
Grant type | GRF |
Status | Finished |
Effective start/end date | 1/11/13 → 27/10/17 |