Heterodimeric DNA Motif Synthesis and Validations
Project: Research
Researcher(s)
- Ka Chun WONG (Principal Investigator / Project Coordinator), Department of Computer Science
- You Qiang SONG (Co-Investigator)
Description
In humans, DNA motifs are prevalent and important for gene regulation in different tissues at different developmental stages. Although considerable effort has gone into elucidating individual DNA motif patterns, our knowledge of heterodimeric DNA motifs remains limited (e.g. around 25,000 heterodimeric DNA motifs have yet to be found in humans). Therefore, we propose to develop novel computational models for heterodimeric DNA motif synthesis with extensive validation. For illustrative purposes, preliminary results are given for each project phase.
In Phase A for Objective 1, we propose to develop first-of-its-kind prediction models of how two individual DNA motifs are oriented and overlapped with each other to synthesize heterodimeric DNA motifs at the nucleotide level. We have extensively tested various classifiers and regression methods with different time complexities on 618 heterodimeric DNA motifs under 10-fold cross-validation. The preliminary results (e.g. AUROCs > 0.8 and correlation coefficients > 0.75) demonstrate the feasibility of this approach on our Intel Xeon servers.
In Phase B for Objective 2, we propose to develop first-of-its-kind heterodimeric DNA motif pattern models with a focus on probabilistic graphical modeling. Our initial input-output hidden Markov models (IOHMMs) have been validated on experimentally verified datasets across 49 DNA-binding family combinations. The synthesized patterns are statistically similar to the original patterns (p = 0.003). The linear complexity of the underlying motif synthesis processes, which use max-product algorithms, further indicates that our proposed approaches are scalable and promising.
In Phase C for Objective 3, we aim to chain the models from Phases A and B sequentially to infer the unknown heterodimeric DNA motifs, which have been estimated to number around 25,000 in humans.
To assess the potential of this approach, we have conducted preliminary experiments and found that our initial approach can even “rescue” the existing heterodimeric DNA motif patterns previously published in Nature under leave-one-out cross-validation. The Co-Investigator, Dr. You-Qiang Song from HKU, has agreed to provide biochemical validation (e.g. chromatin immunoprecipitation with sequencing (ChIP-seq) and with quantitative PCR (ChIP-qPCR)) of the novel heterodimeric DNA motifs in this proposed study. His support letter is attached as an appendix to this proposal.
In Phase C for Objective 4, we will release the developed algorithms and the related data as open-source software and public data repositories, respectively, for long-term impact and scientific reproducibility.
In summary, the research flowchart with backup paths is shown in Figure 1.
Detail(s)
Project number | 9042627 |
---|---|
Grant type | GRF |
Status | Finished |
Effective start/end date | 1/12/18 → 29/11/22 |