Algorithms for Pedigree Comparison and Haplotype Assembly
譜系比較和單體型裝配算法
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 12 Aug 2016 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(04393ebe-d78e-45de-bacf-bfcd5c47d81d).html |
---|---|
Other link(s) | Links |
Abstract
In this thesis, we mainly study the algorithms for pedigree comparison and haplotype assembly. Besides, we study interface prediction of protein-protein interaction.
Reconstruction of ancestral relationships among genera, species and populations is a core task in evolutionary biology. Reconstruction of pedigree is required in practice due to legal or medical reasons. Evaluating reconstruction methods require comparing the inferred pedigree and the known pedigrees. We propose three models for comparison of pedigrees, the maximum pedigree isomorphism problem, the maximum paternal-path-preserved mapping problem and the minimum edge-cutting mapping problem. For the maximum pedigree isomorphism problem, we prove that the problem is NP-hard and give a fixed-parameter algorithm for the problem. For the maximum paternal-path-preserved mapping problem, we give a dynamic-programming algorithm to find the mapping that preserves the maximum number of paternal paths between the two input pedigrees. For the minimum edge-cutting mapping problem, we prove that the problem is NP-hard and give a fixed-parameter algorithm. This algorithm is useful in practice when comparing two similar pedigrees.
Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Knowledge of complete haplotype structure in individuals is essential for advancing personalized medicine. Haplotype assembly is to directly construct the haplotypes of an individual from sequence fragments of the individual. We propose better ILP-based approach to the haplotype assembly problem.
Another important topic in bioinformatics is protein-protein interaction. The protein-protein interaction plays a key role in the control of many biological functions, such as drug design and functional analysis. How to build more effective models based on sequence information, structure information and physicochemical characteristics, is the key technology in protein-protein interface prediction. In this thesis, we study the protein-protein interface prediction problem. We propose a novel method for identifying residues on interfaces from an input protein with both sequence and 3D structure information, based on hexagon structure similarity.
Gaining insights of various binding abilities will deepen our understanding on interaction. Determination of binding sites of protein-protein interaction is widely applied in molecular biology research. Therefore, many efficient methods have been developed for identifying binding sites. In this thesis, we calculate structural neighboring property through Voronoi diagram. Using 6,438 complexes, we study local biases of structural neighboring property on interface.
Reconstruction of ancestral relationships among genera, species and populations is a core task in evolutionary biology. Reconstruction of pedigree is required in practice due to legal or medical reasons. Evaluating reconstruction methods require comparing the inferred pedigree and the known pedigrees. We propose three models for comparison of pedigrees, the maximum pedigree isomorphism problem, the maximum paternal-path-preserved mapping problem and the minimum edge-cutting mapping problem. For the maximum pedigree isomorphism problem, we prove that the problem is NP-hard and give a fixed-parameter algorithm for the problem. For the maximum paternal-path-preserved mapping problem, we give a dynamic-programming algorithm to find the mapping that preserves the maximum number of paternal paths between the two input pedigrees. For the minimum edge-cutting mapping problem, we prove that the problem is NP-hard and give a fixed-parameter algorithm. This algorithm is useful in practice when comparing two similar pedigrees.
Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Knowledge of complete haplotype structure in individuals is essential for advancing personalized medicine. Haplotype assembly is to directly construct the haplotypes of an individual from sequence fragments of the individual. We propose better ILP-based approach to the haplotype assembly problem.
Another important topic in bioinformatics is protein-protein interaction. The protein-protein interaction plays a key role in the control of many biological functions, such as drug design and functional analysis. How to build more effective models based on sequence information, structure information and physicochemical characteristics, is the key technology in protein-protein interface prediction. In this thesis, we study the protein-protein interface prediction problem. We propose a novel method for identifying residues on interfaces from an input protein with both sequence and 3D structure information, based on hexagon structure similarity.
Gaining insights of various binding abilities will deepen our understanding on interaction. Determination of binding sites of protein-protein interaction is widely applied in molecular biology research. Therefore, many efficient methods have been developed for identifying binding sites. In this thesis, we calculate structural neighboring property through Voronoi diagram. Using 6,438 complexes, we study local biases of structural neighboring property on interface.