Project Details
Description
In this project, we will design algorithms for two important problems in computational biology. The first problem is pan-genome analysis, emphasizing scaffold comparison and rearrangement event identification. The second is the chromosomal haplotype assembly problem.

A pan-genome (or supra-genome) describes the full complement of genes in a clade (typically a species, for bacteria and archaea), which can vary widely in gene content among closely related strains. For pan-genome analysis, the genomes from different strains of the same species are decomposed into core segments (in all the strains), dispensable segments (in two or more strains) and strain-specific segments (in one strain only) by using multiple sequence alignment tools. Various statistical analyses have been done after the decomposition. We are the first to propose scaffold comparison of pan-genomes and the study of various kinds of rearrangement events and the mechanisms behind those events. Since genomes within the same species are much more similar to each other, pan-genome comparison allows us to observe the most recent evolutionary events and the sequence segments near those events. This should help reveal the mechanisms underlying those rearrangement operations. Our preliminary findings show that in bacteria such as E. coli and Pseudomonas aeruginosa, a pair of inverted transposable elements (TEs) is very commonly associated with the two ends of a reversal segment. This mechanism can also explain why breakpoint reuse happens for reversal events.

In this project, we propose to develop tools and algorithms for pan-genome scaffold comparison and for the analysis of rearrangement events (such as reversal, block interchange, insertion and deletion).

Haplotypes play a crucial role in genetic analysis and have many applications, such as genetic disease diagnosis, association studies and ancestry inference. With current sequencing techniques, the reads fall into a set of disjoint blocks, where reads from different blocks do not overlap. Consequently, the assembled haplotype usually contains thousands of small disjoint pieces (blocks). Even with third-generation sequencing techniques such as PacBio, it is estimated that each chromosome may still contain about 100 blocks. Obtaining one piece of haplotype covering the whole chromosome remains a challenging problem and has attracted considerable attention recently. This problem is referred to as the chromosomal haplotype assembly problem.

In this project, we propose to use the sequencing data for a family containing at least three individuals (instead of one individual) to infer the haplotypes of individuals for the whole chromosome (resulting in one block for each chromosome). We will design algorithms and develop software packages to solve the problem.
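As a minimal sketch of the three-way decomposition described above for pan-genome analysis (core, dispensable, strain-specific), the following Python snippet labels segments by how many strains contain them. The function name `classify_segments`, the presence map, and the segment and strain identifiers are hypothetical and used only for illustration. In practice the presence information would come from a multiple sequence alignment tool, which is not shown here.

```python
from typing import Dict, Set


def classify_segments(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each segment by the number of strains that contain it.

    presence maps a (hypothetical) segment id to the set of strain ids
    in which the segment was found; n_strains is the total strain count.
    """
    labels: Dict[str, str] = {}
    for segment, strains in presence.items():
        count = len(strains)
        if count == 0 or count > n_strains:
            raise ValueError(f"segment {segment!r} has an invalid strain count: {count}")
        if count == n_strains:
            labels[segment] = "core"             # present in all strains
        elif count >= 2:
            labels[segment] = "dispensable"      # present in two or more, but not all
        else:
            labels[segment] = "strain-specific"  # present in exactly one strain
    return labels


# Made-up example with three strains (s1, s2, s3).
example_presence = {
    "seg_A": {"s1", "s2", "s3"},
    "seg_B": {"s1", "s3"},
    "seg_C": {"s2"},
}
example_labels = classify_segments(example_presence, n_strains=3)
```

The check for "core" comes before the check for "dispensable" so that a segment found in every strain is not also counted as dispensable.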
| Project number | 9042346 |
|---|---|
| Grant type | GRF |
| Status | Finished |
| Effective start/end date | 1/01/17 → 17/06/21 |
Keywords
- Algorithms, Sorting by Reversal, Gene Rearrangement, Haplotype assembly, Haplotype inference
Research output
RGC 21 - Publication in refereed journal: 10
- Genetic and clinical analysis in Chinese patients with retinitis pigmentosa caused by EYS mutations
  Sun, Y., Li, J.-K., He, W., Wang, Z.-S., Bai, J.-Y., Xu, L., Xing, B., Zhang, J.-G., Wang, L., Li, W. & Chen, F., Mar 2020, In: Molecular genetics & genomic medicine. 8, 3, e1117.
  Peer-reviewed. Open Access. Citations (Scopus): 4. Downloads (CityUHK Scholars): 70.
- Genetic and clinical findings of panel-based targeted exome sequencing in a northeast Chinese cohort with retinitis pigmentosa
  Sun, Y. (Co-first Author), Li, W. (Co-first Author), Li, J.-K., Wang, Z.-S., Bai, J.-Y., Xu, L., Xing, B., Yang, W., Wang, Z.-W., Wang, L.-S., He, W. & Chen, F., Apr 2020, In: Molecular genetics & genomic medicine. 8, 4, e1184.
  Peer-reviewed. Open Access. Citations (Scopus): 15. Downloads (CityUHK Scholars): 44.
- Improved Practical Algorithms for Rooted Subtree Prune and Regraft (rSPR) Distance and Hybridization Number
  Yamada, K., Chen, Z.-Z. & Wang, L., Sept 2020, In: Journal of Computational Biology. 27, 9, p. 1422–1432 (11 p.).
  Peer-reviewed. Citations (Scopus): 4.