Identification and Characterization of Coupling DNA Motifs on Chromatin Interaction Regions in Multiple Human Cell Lines

Project: Research



In humans, protein-DNA binding interactions are central to gene regulation. In particular, DNA motifs such as transcription factor binding sites are key factors.

At the time of writing, however, chromatin interactions have yet to be incorporated into genome-wide DNA motif identification studies in a systematic and computationally exhaustive way. To assess feasibility, the PI conducted a preliminary study to reveal the coupling DNA motifs on chromatin interactions in human K562 cells. It was accepted as a discovery note in the journal Bioinformatics (Oxford) in September 2015, with the PI as sole first author and sole corresponding author.

That preliminary study, however, is based on an ad hoc computational pipeline and is limited to a single cell type. This project therefore aims to develop novel computational methods, the first of their kind, for identifying the coupling DNA motifs on long-range chromatin interactions. The approach is novel because past methods did not explicitly take long-range coupling relationships into consideration. Building on previous successes, we plan to develop novel probabilistic graphical models for identifying the coupling DNA motifs. A mathematical example is given in the main text. (Phase A; Objective 1)

Another aim is to generalize the computational methods (either the existing ad hoc pipeline developed by the PI or the novel methods proposed in this proposal) to at least four human cell types (e.g. K562, GM06990, IMR90, and H1-ESC) for novel comparative genomics. After the novel DNA motifs are identified, we propose to develop our own competitive-edge genome informatics techniques (drawing on the PI's past experience from the kmerHMM and SignalSpider studies, published in Nucleic Acids Research 2013 and Bioinformatics 2015 respectively) to characterize the novel functions and properties of the identified DNA motifs. (Phase B; Objective 2)

To achieve long-term impact, the identified DNA motifs will then be applied to high-impact cancer association and gene expression prediction studies (drawing on the PI's past human disease experience from the SNPdryad study, published in Bioinformatics 2014). At the end of the project, the deliverables (the developed methods and the identified DNA motifs) will be released as open-source software and a database respectively. The PI's past PhD supervisor from Toronto, Zhaolei Zhang, has agreed to provide wet-lab cross-validations, according to the attached support letter. (Phase C; Objectives 3 and 4)

The overall research plan is outlined in Figure 1.


Project number: 9048072
Grant type: ECS
Effective start/end date: 1/09/16 to 29/10/19

    Research areas

  • Bioinformatics Tools, Bioinformatics Databases, Cell Lines