Mutation region detection, protein complex prediction and protein docking

基因突變區域檢測和蛋白質複合物預測與蛋白質對接

Student thesis: Doctoral Thesis

View graph of relations

Author(s)

  • Wenji MA

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date2 Oct 2013

Abstract

In this thesis, we investigate three major problems arising in computational biology and bioinformatics. These problems are mutation region detection, protein complex prediction and protein docking. Linkage analysis serves as a way of finding locations of genes that cause genetic diseases. Linkage studies have facilitated the identification of several hundreds of human genes that can harbor mutations which by themselves lead to a disease phenotype. The fundamental problem in linkage analysis is to identify regions whose allele is shared by all or almost all affected members but by none or few unaffected members. Almost all the existing methods for linkage analysis are for families with clearly given pedigrees. Little work has been done for the case where the sampled individuals are closely related, but their pedigree is not known. This situation occurs very often when the individuals share a common ancestor at least six generations ago. Solving this case will tremendously extend the use of linkage analysis for finding genes that cause genetic diseases. In Chapter 2, we propose a mathematical model (the shared center problem) for inferring the allele-sharing status of a given set of individuals using a database of confirmed haplotypes as reference. We show the NP-completeness of the shared center problem and present a ratio-2 polynomial-time approximation algorithm for its minimization version (called the closest shared center problem). We then convert the approximation algorithm into a heuristic algorithm for the shared center problem. Based on this heuristic, we finally design a heuristic algorithm for mutation region detection. We further implement the algorithms to obtain a software package. Our experimental data shows that the software works very well. We have another strike on the SC problem in Chapter 3. We consider the parameterized complexity of the SC problem. First, we show that the SC problem is W[1]-hard with parameters d and n, where d and n are the radius and the number of (diseased or normal) individuals in the input, respectively. Then, we present two asymptotically optimal parameterized algorithms for the problem and apply them to linkage analysis. With the development of high-throughput methods for identifying protein-protein interactions, large scale interaction networks are available. Computational methods to analyze the networks to detect functional modules as protein complexes are becoming more important. However, most of the existing methods only make use of the proteinprotein interaction networks without considering the structural limitations of proteins to bind together. In Chapter 4, we design a new protein complex prediction method by extending the idea of using domain-domain interaction information. Here we formulate the problem into a maximum matching problem (which can be solved in polynomial time) instead of the binary integer linear programming approach (which can be NPhard in the worst case). We also add a step to predict domain-domain interactions which first searches the database Pfam using the hidden Markov model and then predicts the domain-domain interactions based on the database DOMINE and InterDom which contain confirmed DDIs. By adding the domain-domain interaction prediction step, we have more edges in the DDI graph and the recall value is increased significantly (at least doubled) comparing with the method of Ozawa et al.(2010) [1] while the average precision value is slightly better. We also combine our method with three other existing methods, such as COACH, MCL and MCODE. Experiments show that the precision of the combined method is improved. Conformational changes frequently occur when proteins interact with other proteins and finally form a protein complex. How to detect such changes in silico is a major problem. Existing methods for docking with conformational changes remain time-consuming, and they solve the problem only for a small portion of protein-protein complexes accurately. In Chapter 5,we present a more accurate method (FlexDoBi) for docking with conformational changes. FlexDoBi generates the possible conformational changes of the interface residues that transform the proteins from their unbound states to bound states. Based on the generated conformational changes, multi-dimensional scaling is performed to construct candidates for the bound structure. We develop the new energy items for determining the orientation of proteins and selecting of plausible conformational changes. Experimental results illustrate that FlexDoBi achieves better results than other methods for the same purpose. On 20 complexes, we obtained an average iRMSD of 1.55Å , which compares favorably with the average iRMSD of 1.94Å in the predictions from FiberDock. Compared with ZDOCK, our results are of 0.35Å less in average iRMSD on the medium difficulty group, and 0.81Å less on the difficulty group.

    Research areas

  • Mathematical models, Mutation (Biology), Protein-protein interactions