Identification of linked regions and reconstruction of tandem repeats duplication history

基因連鎖區域識別及重建串聯重複序列的複製歷史

Student thesis: Master's Thesis

Author(s)

  • Zhanyong WANG

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date: 2 Oct 2008

Abstract

In this thesis, we study two important problems in computational biology and bioinformatics: identification of linked regions, and reconstruction of the duplication history of tandem repeats.
With a large number of SNPs now known in the human genome and the fast development of high-throughput genotyping technologies, identification of linked regions in linkage analysis through allele sharing status determination will play an ever more important role, while consideration of recombination fractions becomes unnecessary. In Chapter 2, we develop a rule-based program that identifies linked regions for underlying diseases using allele sharing information among family members. Our program uses high-density SNP genotype data and works in the face of genotyping errors. It works on nuclear family structures with two or more siblings. The program graphically displays allele sharing status for all members in a pedigree and identifies regions that are potentially linked to the underlying diseases according to user-specified inheritance mode and penetrance. Extensive simulations based on the chi-square model for recombination show that our program identifies linked regions with high sensitivity and accuracy. Graphical display of allele sharing status helps to detect misspecification of inheritance mode and penetrance, as well as mislabeling or misdiagnosis. Allele sharing determination may represent the future direction of linkage analysis because it is better adapted to high-density SNP genotyping data.
The genomes of many species are dominated by short segments repeated consecutively. It is estimated that over 10% of the human genome consists of repeated segments. About 10-25% of all known proteins have some form of repeated structure. Computing the duplication history of a tandem repeated region is an important problem in computational biology [7, 25, 12].
In Chapter 3, we design a polynomial-time approximation scheme (PTAS) for the case where the size of the duplication block is 1. Our PTAS is faster than the existing PTAS [12]. For example, to achieve a ratio of 1.5, our PTAS takes O(n^5) time while the previous PTAS in [12] takes O(n^11) time. We also design a ratio-6 polynomial-time approximation algorithm for the case where the size of each duplication block is at most 2. This is the first polynomial-time approximation algorithm with a guaranteed ratio for this case.
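To make the notion of a duplication block concrete, the following is a minimal sketch of a toy forward simulation, not the reconstruction algorithm of the thesis. It assumes the usual tandem duplication model, in which each event copies a block of consecutive segments and inserts the copy immediately after the block. The function name simulate_tandem_duplications, its parameters, and the integer segment labels are hypothetical and chosen only for illustration.

```python
import random


def simulate_tandem_duplications(n_events, max_block=1, seed=0):
    """Simulate a toy tandem duplication history (illustrative only).

    Segments are labelled by integers. Each event picks a block of
    1..max_block consecutive segments and inserts a freshly labelled
    copy of that block immediately after it.
    Returns the final left-to-right segment order and the event list.
    """
    rng = random.Random(seed)
    segments = [0]      # a single ancestral segment
    next_label = 1
    events = []
    for _ in range(n_events):
        # Block size is limited by max_block and by the current length.
        size = rng.randint(1, min(max_block, len(segments)))
        start = rng.randrange(len(segments) - size + 1)
        block = segments[start:start + size]
        copy = list(range(next_label, next_label + size))
        next_label += size
        # Insert the copy right after the duplicated block.
        segments[start + size:start + size] = copy
        events.append((tuple(block), tuple(copy)))
    return segments, events


if __name__ == "__main__":
    order, history = simulate_tandem_duplications(5, max_block=1, seed=1)
    print("final order:", order)
    for source, copy in history:
        print("duplicated", source, "->", copy)
```

With max_block=1 every event duplicates a single segment, which corresponds to the setting of the PTAS; with max_block=2 each event duplicates one or two adjacent segments, the setting of the ratio-6 algorithm. Reconstruction is the inverse task: given only the final segments, infer a plausible sequence of such events.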

    Research areas

  • Genetic recombination, Computer algorithms