Discovering Protein-DNA Binding Cores by Aligned Pattern Clustering

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal

2 Scopus Citations

Author(s)

  • En-Shiun Annie Lee
  • Ho-Yin (Antonio) Sze-To
  • Man-Hon Wong
  • Kwong-Sak Leung
  • Terrence Chi-Kong Lau
  • Andrew K. C. Wong

Detail(s)

Original language: English
Article number: 7229304
Pages (from-to): 254-263
Journal / Publication: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 14
Issue number: 2
Online published: 28 Aug 2015
Publication status: Published - Mar 2017

Abstract

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and gene regulation. Because experiments are expensive, discovering binding cores and their variations directly from sequence data is a promising alternative. Although existing computational methods have produced satisfactory results, they yield one-to-one mappings with no site-specific information on residue/nucleotide variations, even though such variations in binding cores may affect binding specificity. This study presents a new representation for modeling binding cores that incorporates variations, and an algorithm to discover them from sequence data alone. Our algorithm takes protein and DNA sequences from TRANSFAC (a Protein-DNA Binding Database) as input; discovers conserved regions from both sets of sequences as Aligned Pattern Clusters (APCs); associates them as Protein-DNA Co-Occurring APCs; ranks the Protein-DNA Co-Occurring APCs by their co-occurrence; and, among the top-ranked ones, finds 3-dimensional structures to support each binding core candidate. If such structures are found, the candidates are verified as binding cores. Otherwise, homology modeling is applied to their close matches in PDB to attain new chemically feasible binding cores. Our algorithm obtains binding cores with higher precision and a much faster runtime (≥1600×) than its contemporaries, discovering candidates that do not co-occur as one-to-one associated patterns in the raw data. Availability: http://www.pami.uwaterloo.ca/-ealee/files/tcbbPnDna2015/Release.zip
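
The association and ranking step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each pattern cluster is simply a set of short strings matched by exact substring search, and it scores each protein/DNA cluster pair by how many bound sequence pairs contain both. The function names, cluster labels, and toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product


def cluster_occurs(cluster, sequence):
    """Return True if any pattern in the cluster is a substring of the sequence."""
    return any(pattern in sequence for pattern in cluster)


def rank_cooccurring_clusters(pairs, protein_clusters, dna_clusters):
    """Rank (protein cluster, DNA cluster) pairs by co-occurrence count.

    pairs: list of (protein_sequence, dna_sequence) tuples assumed to bind.
    protein_clusters, dna_clusters: dict mapping a label to a set of patterns.
    This simple count is an assumption made for illustration only.
    """
    counts = Counter()
    for protein_seq, dna_seq in pairs:
        hit_p = [name for name, c in protein_clusters.items()
                 if cluster_occurs(c, protein_seq)]
        hit_d = [name for name, c in dna_clusters.items()
                 if cluster_occurs(c, dna_seq)]
        # Every protein cluster found in this pair co-occurs with every DNA cluster found.
        for p, d in product(hit_p, hit_d):
            counts[(p, d)] += 1
    return counts.most_common()


# Hypothetical toy data: cluster P1 groups two variant patterns.
pairs = [
    ("MKRAARNCR", "TTGACGTCA"),
    ("GSRKAARECR", "AATGACGTA"),
    ("MLLPPQ", "CCCGGG"),
]
protein_clusters = {"P1": {"RAARN", "KAARE"}, "P2": {"LLPP"}}
dna_clusters = {"D1": {"TGACGT"}, "D2": {"CCGG"}}

ranking = rank_cooccurring_clusters(pairs, protein_clusters, dna_clusters)
for (p, d), n in ranking:
    print(p, d, n)
```

In this toy data, each individual protein pattern in P1 appears with the D1 pattern only once, yet the clusters P1 and D1 co-occur twice. That is the kind of association a one-to-one pattern mapping would rank lower.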

Research Area(s)

  • Aligned pattern cluster, Association rule mining, Binding cores, Protein-DNA binding

Citation Format(s)

Discovering Protein-DNA Binding Cores by Aligned Pattern Clustering. / Lee, En-Shiun Annie; Sze-To, Ho-Yin (Antonio); Wong, Man-Hon; Leung, Kwong-Sak; Lau, Terrence Chi-Kong; Wong, Andrew K. C.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 14, No. 2, 7229304, 03.2017, p. 254-263.
