Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles with Model Regularization

Ka-Chun Wong*, Chengbin Peng, Shankai Yan, Cheng Liang

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Understanding genome-wide protein-DNA interaction signals forms the basis for further focused studies in gene regulation. In particular, chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-Seq) enables us to measure the in vivo genome-wide occupancy of a DNA-binding protein of interest in a single run. Multiple ChIP-Seq runs thus offer the potential to decipher the combinatorial occupancies of multiple DNA-binding proteins. To handle the genome-wide signal profiles from those multiple runs, we propose to integrate regularized regression functions (i.e., LASSO, Elastic Net, and Ridge Regression) into the well-established SignalRanker and FullSignalRanker frameworks, resulting in six additional probabilistic models for inference on multiple normalized genome-wide signal profiles. The corresponding model training algorithms are devised and their computational complexity is analyzed. Comprehensive benchmarking is conducted to demonstrate and compare the performance of nine related probabilistic models on the ENCODE ChIP-Seq datasets. The results indicate that the regularized SignalRanker models, in contrast to the original SignalRanker models, can achieve excellent inference performance comparable to that of the FullSignalRanker models while keeping model complexity and time complexity low. Such a feature is especially valuable given the rapid growth of genome-wide signal profile data in recent years.
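For readers unfamiliar with the three regularizers named in the abstract, the following is a minimal sketch of the textbook penalty terms only. It is not the SignalRanker or FullSignalRanker training procedure, which this record does not describe. The function names and the `alpha` mixing parameter are illustrative assumptions; the elastic net is written here in one common parameterization, in which `alpha = 1` gives the LASSO penalty and `alpha = 0` gives the ridge penalty.

```python
# Illustrative sketch only: generic penalty terms for regularized linear
# regression. All names here are hypothetical, not taken from the paper.

def l1_penalty(weights):
    """LASSO penalty: sum of absolute coefficient values."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Ridge penalty: sum of squared coefficient values."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, alpha):
    """Convex mix of the two penalties (alpha=1 -> LASSO, alpha=0 -> ridge)."""
    return alpha * l1_penalty(weights) + (1.0 - alpha) * l2_penalty(weights)


def penalized_loss(X, y, weights, lam, alpha):
    """Residual sum of squares plus lam times the elastic net penalty."""
    rss = 0.0
    for row, target in zip(X, y):
        prediction = sum(w * x for w, x in zip(weights, row))
        rss += (target - prediction) ** 2
    return rss + lam * elastic_net_penalty(weights, alpha)
```

The L1 term tends to drive some coefficients exactly to zero, which yields sparser models, whereas the L2 term shrinks coefficients without eliminating them. This general property of the penalties is one reason regularization is associated with lower model complexity.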
Original language: English
Article number: 7750605
Pages (from-to): 43-50
Journal: IEEE Transactions on Nanobioscience
Volume: 16
Issue number: 1
Online published: 21 Nov 2016
Publication status: Published - Jan 2017

Research Keywords

  • Bioinformatics
  • ChIP-Seq
  • classification
  • expectation maximization
  • genome informatics
  • high-throughput sequencing
  • ranking
  • transcription factor binding sites

RGC Funding Information

  • RGC-funded
