Identification of linked regions and reconstruction of tandem repeats duplication history
基因連鎖區域識別及重建串聯重複序列的複製歷史
Student thesis: Master's Thesis
Detail(s)
Awarding Institution | |
---|---|
Award date | 2 Oct 2008 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(22a05d2d-9d12-47ed-898f-ea480faf32e4).html |
---|---|
Abstract
In this thesis, we study two important problems in computational biology and
bioinformatics: identification of linked regions and reconstruction of tandem
repeats duplication history.
With a large number of SNPs now known in the human genome and the rapid
development of high-throughput genotyping technologies, identifying linked
regions in linkage analysis by determining allele sharing status will play an
ever more important role, while consideration of recombination fractions
becomes unnecessary. In Chapter 2, we develop a rule-based program that
identifies regions linked to the underlying diseases using allele sharing
information among family members. Our program uses high-density SNP genotype
data and tolerates genotyping errors. It works on nuclear family structures
with two or more siblings. The program graphically displays the allele sharing
status of all members in a pedigree and identifies regions that are potentially
linked to the underlying diseases according to a user-specified inheritance
mode and penetrance. Extensive simulations based on the chi-square model for
recombination show that our program identifies linked regions with high
sensitivity and accuracy. The graphical display of allele sharing status helps
to detect misspecification of the inheritance mode and penetrance, as well as
mislabeling or misdiagnosis. Allele sharing determination may represent the
future direction of linkage analysis because it is better suited to
high-density SNP genotyping data.
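To make the notion of allele sharing status concrete, the following is a minimal sketch that counts alleles shared identical-by-state (IBS) between two siblings at each SNP and reports runs of consecutive SNPs that meet a sharing threshold. It is an illustration only, not the rule-based program of Chapter 2: the function names (`ibs_count`, `ibs_profile`, `runs_at_least`) and the genotype encoding are assumptions, and unlike the thesis program it uses no parental information and does not handle genotyping errors.

```python
def ibs_count(g1, g2):
    """Count alleles (0, 1, or 2) shared identical-by-state at one SNP.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    This encoding is an assumption made for this sketch.
    """
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele of g2 is matched at most once
            shared += 1
    return shared


def ibs_profile(sib1, sib2):
    """Per-SNP IBS counts for two siblings genotyped at the same SNPs."""
    return [ibs_count(a, b) for a, b in zip(sib1, sib2)]


def runs_at_least(profile, k):
    """Return (start, end) index pairs, inclusive, of maximal runs with count >= k."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= k and start is None:
            start = i
        elif value < k and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


# Hypothetical genotypes for two siblings at four consecutive SNPs.
sib1 = [("A", "A"), ("A", "G"), ("G", "G"), ("C", "T")]
sib2 = [("A", "A"), ("A", "A"), ("A", "A"), ("T", "C")]

profile = ibs_profile(sib1, sib2)
shared_regions = runs_at_least(profile, 1)
```

A SNP where the siblings share no allele by state breaks a run, so long runs with a count of at least 1 are only candidate shared regions; deciding whether a region is actually inherited from the same parental chromosome requires the family-based rules the thesis describes.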
The genomes of many species are dominated by short segments repeated
consecutively. It is estimated that over 10% of the human genome consists of
repeated segments. About 10-25% of all known proteins have some form of
repeated structures. Computing the duplication history of a tandem repeated
region is an important problem in computational biology [7, 25, 12]. In
Chapter 3, we design a polynomial-time approximation scheme (PTAS) for the case
where the size of the duplication block is 1. Our PTAS is faster than the
existing PTAS [12]. For example, to achieve a ratio of 1.5, our PTAS takes
O(n^5) time while the previous PTAS in [12] takes O(n^11) time. We also design
a ratio-6 polynomial-time approximation algorithm for the case where the size
of each duplication block is at most 2. This is the first polynomial-time
approximation algorithm with a guaranteed ratio for this case.
- Genetic recombination, Computer algorithms