Chromosome Structure Inference, Alignment and Application

  • LI, Shuaicheng (Principal Investigator / Project Coordinator)
  • Lin, Yu (Co-Investigator)

Project: Research

Project Details

Description

The genome-deciphering efforts of the last 30 years have greatly advanced our understanding of genome sequences. We are now in a position to reconstruct the 3D structures of entire genomes, an enterprise that will substantially advance our understanding of chromosome interactions, and hence of genome activity. Towards that aim, the community expects to amass a very large collection of chromosome interaction data under many different conditions and resolutions, for a variety of genomes, large and small. Processing and analyzing these data will present many challenges to the field of bioinformatics. This project identifies and addresses a few of these challenges.

The first problem we address is structure inference. High-resolution data give us more information at the expense of a larger problem size, and current methods will flounder at the level of kilobase pairs. Most of them have large memory requirements at high resolutions; for example, a resolution of one kilobase pair for the human genome will consume terabytes of computer memory. Processing time is another issue: the fastest current method has a time complexity of O(n^3), where n is the number of points, which makes it inefficient for large datasets. We propose a new algorithm based on divide-and-conquer and dimension reduction. In initial tests the algorithm showed practical runtimes, solving structures of one million points in about 30 minutes. To enhance accuracy, we will experiment with new representation models that incorporate additional information, such as transcripts, non-coding RNAs and gene expression, into the inference.

The second problem is chromosome structure comparison. Such comparison will allow us to analyze chromosomal variation from cell to cell and to discover chromosomal differences between related species. Existing structural alignment tools are mostly designed for protein structures, which are several orders of magnitude smaller.
At the chromosome scale, even a very rough resolution of ~10 kb already results in ~0.3 million points, whereas a protein structure alignment task has only several hundred points to align, and current methods fare poorly even at a few thousand points. A completely different approach is therefore required for alignment at the chromosome scale. Our experience shows that two types of indexing, local and remote, work well for structural alignment, and we plan to develop a chromosome structural alignment algorithm around these indexing schemes.

We plan to build tools around our algorithms to investigate two important biological problems. First, we plan to study the chromosomal changes brought about by rearrangements in the mammalian X chromosome. It has been suggested that the differences between the human and mouse X chromosomes arose from rearrangements during the course of evolution. As a pilot study, we hope that examining the structural changes in the chromosomes caused by these rearrangements could shed light on their causes or effects. Second, we will study chromosomal changes brought about by the cancer-causing human papillomavirus (HPV). During long-term latency in human tissues, HPV persistently induces organizational changes in the genome of the host cell, or simply integrates into it. HPV integration into the host genome can have multiple effects, including changes in chromosome structure near the integration site. In this project we will examine these causes and effects using chromosome structural data and tools.

Finally, to make our tools accessible to the community, we plan to build a web-based user interface that integrates chromosome data analysis and visualization tools.

The expected outputs of this project include high-quality research publications, algorithms, and software for chromosome structure inference, alignment and their applications. In addition, the project will provide interesting and worthwhile research topics for postgraduate students.
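
The point counts and memory figures quoted in the description can be checked with simple arithmetic. The sketch below assumes a haploid human genome of about 3.1 billion bp (a figure not given in the description), one point per bin, and a dense n x n matrix of 8-byte floating-point values; real tools may store sparse or compressed matrices instead. The function names are illustrative.

```python
# Back-of-envelope sizing of a dense pairwise matrix at a given resolution.
# Assumption: haploid human genome of ~3.1e9 bp (not stated in the description).
HUMAN_GENOME_BP = 3.1e9

def num_bins(genome_bp: float, resolution_bp: float) -> int:
    """Number of points when the genome is cut into bins of resolution_bp."""
    return round(genome_bp / resolution_bp)

def dense_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Memory for a dense n x n matrix of 64-bit floats (no symmetry exploited)."""
    return n * n * bytes_per_entry

for resolution in (10_000, 1_000):
    n = num_bins(HUMAN_GENOME_BP, resolution)
    tb = dense_matrix_bytes(n) / 1e12  # decimal terabytes
    print(f"{resolution:>6} bp -> {n:,} points, dense matrix ~{tb:.1f} TB")
```

Under these assumptions, 10 kb gives 310,000 points, in line with the ~0.3 million quoted above, and a dense matrix at 1 kb needs roughly 77 TB (about half that if only one triangle of the symmetric matrix is stored), consistent with the "terabytes" estimate.
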
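
The description names dimension reduction as one ingredient of the proposed inference algorithm but does not give details. As a minimal sketch of that ingredient only, the code below uses classical multidimensional scaling (MDS), a standard dimension-reduction technique that is not necessarily the one used in this project. It assumes contact counts are turned into distances by the hypothetical rule d = c^(-alpha), double-centres the squared distances into a Gram matrix, and takes the top three eigenvectors by power iteration as 3D coordinates. The divide-and-conquer part, and how sub-structures would be combined, are not shown.

```python
# Sketch only: classical MDS as a stand-in for the project's unspecified
# dimension-reduction step. The contact-to-distance rule is an assumption.
import math
import random

def contacts_to_distances(contacts, alpha=1.0):
    """Hypothetical rule d_ij = c_ij ** (-alpha): more contacts, shorter distance.
    Assumes every off-diagonal count is positive; the diagonal is set to 0."""
    n = len(contacts)
    return [[0.0 if i == j else contacts[i][j] ** (-alpha) for j in range(n)]
            for i in range(n)]

def double_center(dist):
    """Gram matrix B = -1/2 * J D2 J used by classical MDS (D2 = squared distances)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]  # D2 is symmetric: also the column means
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpairs(mat, k, iters=2000, seed=0):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation.
    Assumes the dominant eigenvalues are positive, as they are for Euclidean input."""
    n = len(mat)
    rng = random.Random(seed)
    m = [row[:] for row in mat]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenpair
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def classical_mds(dist, dim=3):
    """Embed n points in `dim` dimensions so Euclidean distances approximate `dist`."""
    pairs = top_eigenpairs(double_center(dist), dim)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(dist))]
```

Usage would chain the two steps, e.g. `classical_mds(contacts_to_distances(counts, alpha=1.0))`. The recovered structure is defined only up to rotation, translation and reflection. This dense version needs O(n^2) memory and O(n^2) work per power-iteration step, and a full dense eigendecomposition costs O(n^3), which illustrates why whole-genome inference at kilobase resolution calls for something like the divide-and-conquer strategy the description proposes.
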
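
The description says that local and remote indexing work well for structural alignment but does not define either. The sketch below shows one plausible, hypothetical reading of local indexing only, in the spirit of geometric hashing: each window of k consecutive points is reduced to a signature made of its internal pairwise distances, the signatures of one structure are stored in a hash table, and windows of the other structure are looked up to produce candidate seed matches. The function names and the parameters k and bin_width are illustrative assumptions. Remote indexing and the step that extends seeds into a full alignment are not shown.

```python
# Hypothetical reading of "local indexing"; not the project's published method.
import math
from collections import defaultdict
from itertools import combinations

def _dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def window_key(points, start, k, bin_width):
    """Signature of the k consecutive points beginning at `start`: every pairwise
    distance in the window, binned to `bin_width`. Distances do not change under
    rotation, translation or reflection, so neither does the key."""
    window = points[start:start + k]
    return tuple(math.floor(_dist(p, q) / bin_width) for p, q in combinations(window, 2))

def build_local_index(points, k=4, bin_width=0.5):
    """Hash table from window signature to every start position where it occurs."""
    index = defaultdict(list)
    for s in range(len(points) - k + 1):
        index[window_key(points, s, k, bin_width)].append(s)
    return index

def find_seeds(index, query, k=4, bin_width=0.5):
    """Candidate matches (i in indexed structure, j in query) with equal signatures."""
    seeds = []
    for j in range(len(query) - k + 1):
        for i in index.get(window_key(query, j, k, bin_width), []):
            seeds.append((i, j))
    return seeds

# Toy usage: B is A rotated 90 degrees about z and shifted, with one extra point in front.
A = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 1, 1), (2, 2, 1), (3, 2, 2)]
B = [(9, 9, 9)] + [(-y + 5, x + 2, z - 1) for x, y, z in A]
seeds = find_seeds(build_local_index(A), B)
```

In the toy example every window of A is found in B shifted by one position, despite the rotation. Because each lookup is an expected constant-time hash operation, seeding grows roughly linearly with the number of windows (plus the number of matches reported) instead of comparing every pair of windows. Fixed binning is sensitive to distances that fall near a bin boundary; a practical version might also query neighbouring bins.
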
Project number: 9042181
Grant type: GRF
Status: Finished
Effective start/end date: 1/11/15 to 29/10/19

Keywords

  • bioinformatics, computational biology, algorithms, alignment
