MotifHyades : expectation maximization for de novo DNA motif pair discovery on paired sequences

Research output: Journal Publications and Reviews (RGC: 21, 22, 62)21_Publication in refereed journal

1 Scopus Citations
View graph of relations


Related Research Unit(s)


Original languageEnglish
Pages (from-to)3028-3035
Journal / PublicationBioinformatics
Issue number19
Early online date13 Jun 2017
StatePublished - 1 Oct 2017


Motivation: In higher eukaryotes, protein-DNA binding interactions are the central activities in gene regulation. In particular, DNA motifs such as transcription factor binding sites are the key components in gene transcription. Harnessing the recently available chromatin interaction data, computational methods are desired for identifying the coupling DNA motif pairs enriched on long-range chromatin-interacting sequence pairs (e.g. promoter-enhancer pairs) systematically.
Results: To fill the void, a novel probabilistic model (namely, MotifHyades) is proposed and developed for de novo DNA motif pair discovery on paired sequences. In particular, two expectation maximization algorithms are derived for efficient model training with linear computational complexity. Under diverse scenarios, MotifHyades is demonstrated faster and more accurate than the existing ad hoc computational pipeline. In addition, MotifHyades is applied to discover thousands of DNA motif pairs with higher gold standard motif matching ratio, higher DNase accessibility and higher evolutionary conservation than the previous ones in the human K562 cell line. Lastly, it has been run on five other human cell lines (i.e. GM12878, HeLa-S3, HUVEC, IMR90, and NHEK), revealing another thousands of novel DNA motif pairs which are characterized across a broad spectrum of genomic features on long-range promoter–enhancer pairs.