Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles with Model Regularization
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 7750605 |
Pages (from-to) | 43-50 |
Journal / Publication | IEEE Transactions on Nanobioscience |
Volume | 16 |
Issue number | 1 |
Online published | 21 Nov 2016 |
Publication status | Published - Jan 2017 |
Abstract
Understanding genome-wide protein-DNA interaction signals forms the basis for further focused studies in gene regulation. In particular, chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-Seq) enables us to measure the in vivo genome-wide occupancy of a DNA-binding protein of interest in a single run. Multiple ChIP-Seq runs thus hold the potential to decipher the combinatorial occupancies of multiple DNA-binding proteins. To handle the genome-wide signal profiles from those multiple runs, we propose to integrate regularized regression functions (i.e., LASSO, Elastic Net, and Ridge Regression) into the well-established SignalRanker and FullSignalRanker frameworks, resulting in six additional probabilistic models for inference on multiple normalized genome-wide signal profiles. The corresponding model training algorithms are devised, together with an analysis of their computational complexity. Comprehensive benchmarking is conducted to demonstrate and compare the performance of nine related probabilistic models on the ENCODE ChIP-Seq datasets. The results indicate that the regularized SignalRanker models, in contrast to the original SignalRanker models, can achieve excellent inference performance comparable to that of the FullSignalRanker models, while keeping model complexity and time complexity low. Such a feature is especially valuable given the rapid growth of genome-wide signal profile data in recent years.
Research Area(s)
- Bioinformatics, ChIP-Seq, classification, expectation maximization, genome informatics, high-throughput sequencing, ranking, transcription factor binding sites
Citation Format(s)
Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles with Model Regularization. / Wong, Ka-Chun; Peng, Chengbin; Yan, Shankai et al.
In: IEEE Transactions on Nanobioscience, Vol. 16, No. 1, 7750605, 01.2017, p. 43-50.