Repeats in Genome Rearrangements
重複片斷在基因重組中的作用研究
Student thesis: Doctoral Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 29 Aug 2017 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(702ba3e0-71f4-4d23-aeef-fb03549a9250).html |
---|---|
Abstract
Comparative genomics studies show that genome rearrangement events often occur between two genomes, and that these events play an important role in speciation. Sorting genomic permutations by rearrangement operations is a classic problem in the study of genome rearrangements, and many tools and algorithms for calculating rearrangement scenarios have been proposed. The calculated scenario is often not unique for a given pair of permutations, especially when the genomic distance between the two permutations is large. Determining whether the calculated scenarios are reliable and biologically meaningful is therefore an essential task.
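The non-uniqueness of rearrangement scenarios can be seen even on a tiny example. The following is a minimal sketch, not the method used in the thesis: it models a genome as a signed permutation, allows only inversions, and enumerates every shortest sorting scenario by brute force. The function names (`inversions`, `distances_from_identity`, `shortest_scenarios`) are hypothetical, and the exhaustive search is only feasible for very small permutations.

```python
from collections import deque


def inversions(perm):
    """Yield ((i, j), result) for every inversion of perm[i..j] (inclusive).

    An inversion reverses the segment and flips the sign of each gene in it.
    """
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-g for g in reversed(perm[i:j + 1]))
            yield (i, j), perm[:i] + flipped + perm[j + 1:]


def distances_from_identity(n):
    """Breadth-first search over all signed permutations of size n.

    Because an inversion undoes itself, the distance from the identity
    equals the number of inversions needed to sort a permutation.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for _, q in inversions(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def shortest_scenarios(start):
    """Return every shortest list of inversions that sorts start."""
    start = tuple(start)
    dist = distances_from_identity(len(start))

    def extend(p):
        if dist[p] == 0:
            yield []
            return
        for move, q in inversions(p):
            if dist[q] == dist[p] - 1:  # only follow moves that get closer
                for rest in extend(q):
                    yield [move] + rest

    return list(extend(start))


scenarios = shortest_scenarios([-1, -2])
for s in scenarios:
    print(s)
```

For the permutation `(-1, -2)` the search finds two distinct scenarios of the same minimal length, which differ only in the order of the two single-gene inversions. The number of equally short scenarios grows quickly with the distance, which is why additional evidence is needed to choose among them.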
Several mechanisms for genome rearrangements have been studied to date. One important theory is that genome rearrangements, especially inversions, may be mediated by repeats: many inverted regions are found to be flanked by a pair of inverted repeats. Consequently, the presence of repeats at the breakpoints of a calculated rearrangement event can shed light on whether that event is reliable and biologically meaningful.
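To make the idea of an inversion flanked by inverted repeats concrete, here is a small illustrative sketch. It is an assumption-laden toy and not the check implemented in the thesis: it requires exact matches of a fixed length `k`, and the names `flanked_by_inverted_repeats` and `invert_region` as well as the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def flanked_by_inverted_repeats(genome, start, end, k):
    """Check whether genome[start:end] begins and ends with inverted repeats.

    The first k bases must equal the reverse complement of the last k bases.
    """
    left = genome[start:start + k]
    right = genome[end - k:end]
    return len(left) == k and len(right) == k and left == reverse_complement(right)


def invert_region(genome, start, end):
    """Replace genome[start:end] with its reverse complement."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


repeat = "GATTACA"
genome = "CC" + repeat + "AAAGGG" + reverse_complement(repeat) + "TT"
start, end = 2, len(genome) - 2  # region spanning both repeat copies

print(flanked_by_inverted_repeats(genome, start, end, len(repeat)))
inverted = invert_region(genome, start, end)
print(genome)
print(inverted)
```

In this toy genome the inversion changes only the interior of the region: the sequences at both breakpoints read the same before and after the event, because each repeat copy is replaced by the reverse complement of the other.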
To study the role of repeats in genome rearrangements, including transpositions, block interchanges and inversions, we developed a new tool named GRSR (Genome Rearrangement Scenarios and Repeats). GRSR derives genome rearrangement scenarios from multiple unichromosomal genome sequences and checks whether there are repeats at the breakpoints of each calculated rearrangement event. The input of GRSR is a set of unichromosomal genome sequences, and the output is a set of pairwise rearrangement scenarios, each of which is a series of transpositions, block interchanges and inversions, together with the repeats found at the breakpoints of each event. We applied GRSR to compare the complete genomes of 25 Pseudomonas aeruginosa strains, 31 Escherichia coli strains, 28 Mycobacterium tuberculosis strains and 24 Shewanella strains. In the results, we found many examples supporting the theory that repeats at the breakpoints of a rearrangement event can leave the sequences at the breakpoints unchanged before and after the event. We also found several examples that may explain breakpoint reuse.
For the unstable regions where insertion or deletion events happened, we developed a pipeline to search for directed repeat pairs flanking each unstable sequence. We applied the pipeline to 25 Pseudomonas aeruginosa strains and found 27 pairs of directed repeats in the unstable regions, suggesting that insertions and deletions may also be mediated by repeats. We also studied the association of transposase and integrase genes with unstable regions in the 25 Pseudomonas aeruginosa strains and found that, on average, 14% and 12% of the unstable regions in the 25 strains covered transposase genes and integrase genes, respectively.
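The search for repeat pairs around an inserted or deleted region can be sketched as follows. This is only an illustration under stated assumptions, not the thesis pipeline: it looks for exact, same-orientation k-mer matches between a window around the left boundary and a window around the right boundary of the region. The function name `direct_repeat_pairs`, the parameters `k` and `window`, and the example sequence are all hypothetical.

```python
def direct_repeat_pairs(genome, start, end, k=8, window=10):
    """Find exact same-orientation k-mer pairs near both ends of genome[start:end].

    Returns a list of (left_position, right_position, kmer) tuples where the
    left copy lies within `window` bases of `start`, the right copy lies
    within `window` bases of `end`, and the two copies do not overlap.
    """
    left_lo = max(0, start - window)
    left_hi = min(len(genome), start + window)
    right_lo = max(0, end - window)
    right_hi = min(len(genome), end + window)

    # Index every k-mer in the window around the left boundary.
    left_kmers = {}
    for i in range(left_lo, left_hi - k + 1):
        left_kmers.setdefault(genome[i:i + k], []).append(i)

    # Look up every k-mer in the window around the right boundary.
    pairs = []
    for j in range(right_lo, right_hi - k + 1):
        kmer = genome[j:j + k]
        for i in left_kmers.get(kmer, []):
            if i + k <= j:  # keep only non-overlapping pairs
                pairs.append((i, j, kmer))
    return pairs


repeat = "ACGTACGG"
genome = "TTTTT" + repeat + "C" * 30 + repeat + "AAAAA"
# Treat the 30-base filler plus the second repeat copy as the unstable region.
start, end = 13, 51

pairs = direct_repeat_pairs(genome, start, end)
print(pairs)
```

In the toy sequence one repeat copy sits just upstream of the region and the other at its downstream end, so deleting the region would leave a single repeat copy behind. A realistic pipeline would also need to tolerate mismatches and variable repeat lengths, which this exact-match sketch does not attempt.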