Mutation region detection, protein complex prediction and protein docking
基因突變區域檢測和蛋白質複合物預測與蛋白質對接
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 2 Oct 2013 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(0349632f-6a21-4e0a-860b-74f296a376f4).html |
---|---|
Other link(s) | Links |
Abstract
In this thesis, we investigate three major problems arising in computational biology
and bioinformatics. These problems are mutation region detection, protein complex
prediction and protein docking.
Linkage analysis serves as a way of finding locations of genes that cause genetic diseases.
Linkage studies have facilitated the identification of several hundreds of human
genes that can harbor mutations which by themselves lead to a disease phenotype. The
fundamental problem in linkage analysis is to identify regions whose allele is shared by
all or almost all affected members but by none or few unaffected members. Almost all
the existing methods for linkage analysis are for families with clearly given pedigrees.
Little work has been done for the case where the sampled individuals are closely related,
but their pedigree is not known. This situation occurs very often when the individuals
share a common ancestor at least six generations ago. Solving this case will tremendously
extend the use of linkage analysis for finding genes that cause genetic diseases.
In Chapter 2, we propose a mathematical model (the shared center problem) for inferring
the allele-sharing status of a given set of individuals using a database of confirmed
haplotypes as reference. We show the NP-completeness of the shared center problem
and present a ratio-2 polynomial-time approximation algorithm for its minimization
version (called the closest shared center problem). We then convert the approximation
algorithm into a heuristic algorithm for the shared center problem. Based on this
heuristic, we finally design a heuristic algorithm for mutation region detection. We
further implement the algorithms to obtain a software package. Our experimental data
shows that the software works very well.
We have another strike on the SC problem in Chapter 3. We consider the parameterized complexity of the SC problem. First, we show that the SC problem is W[1]-hard
with parameters d and n, where d and n are the radius and the number of (diseased
or normal) individuals in the input, respectively. Then, we present two asymptotically
optimal parameterized algorithms for the problem and apply them to linkage analysis.
With the development of high-throughput methods for identifying protein-protein
interactions, large scale interaction networks are available. Computational methods to
analyze the networks to detect functional modules as protein complexes are becoming
more important. However, most of the existing methods only make use of the proteinprotein
interaction networks without considering the structural limitations of proteins
to bind together. In Chapter 4, we design a new protein complex prediction method by
extending the idea of using domain-domain interaction information. Here we formulate
the problem into a maximum matching problem (which can be solved in polynomial
time) instead of the binary integer linear programming approach (which can be NPhard
in the worst case). We also add a step to predict domain-domain interactions
which first searches the database Pfam using the hidden Markov model and then predicts
the domain-domain interactions based on the database DOMINE and InterDom which
contain confirmed DDIs. By adding the domain-domain interaction prediction step, we
have more edges in the DDI graph and the recall value is increased significantly (at
least doubled) comparing with the method of Ozawa et al.(2010) [1] while the average
precision value is slightly better. We also combine our method with three other existing
methods, such as COACH, MCL and MCODE. Experiments show that the precision of
the combined method is improved.
Conformational changes frequently occur when proteins interact with other proteins
and finally form a protein complex. How to detect such changes in silico is a
major problem. Existing methods for docking with conformational changes remain
time-consuming, and they solve the problem only for a small portion of protein-protein
complexes accurately. In Chapter 5,we present a more accurate method (FlexDoBi) for
docking with conformational changes. FlexDoBi generates the possible conformational
changes of the interface residues that transform the proteins from their unbound states
to bound states. Based on the generated conformational changes, multi-dimensional scaling is performed to construct candidates for the bound structure. We develop the
new energy items for determining the orientation of proteins and selecting of plausible
conformational changes. Experimental results illustrate that FlexDoBi achieves better
results than other methods for the same purpose. On 20 complexes, we obtained
an average iRMSD of 1.55Å , which compares favorably with the average iRMSD of
1.94Å in the predictions from FiberDock. Compared with ZDOCK, our results are of
0.35Å less in average iRMSD on the medium difficulty group, and 0.81Å less on the
difficulty group.
- Mathematical models, Mutation (Biology), Protein-protein interactions