Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

5 Scopus Citations
Original language: English
Pages (from-to): 327-333
Journal / Publication: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Issue number: 1
Online published: 21 Nov 2018
Publication status: Published - Jan 2020


Transcription factors (TFs) are major components of human gene regulation. They bind to specific DNA sequences and regulate neighboring genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms in their protein-coding sequences could have undesired consequences in humans. It is therefore necessary to develop methods for predicting which of those non-synonymous single nucleotide polymorphisms are deleterious.

To address this, we have developed and compared different strategies to predict deleterious non-synonymous single nucleotide polymorphisms (also known as missense mutations) on the protein-coding sequences of human TFs. Taking advantage of evolutionary conservation signals, we have developed and compared different classifiers with different feature sets computed from different collections of evolutionarily related sequences. The results indicate that the classic ensemble algorithm AdaBoost with decision stumps, combined with the orthologous sequence collection, performed best; we name the resulting method TFmedic. We have further compared TFmedic with other state-of-the-art methods (i.e., PolyPhen-2 and SIFT) on PolyPhen-2’s own datasets, demonstrating that TFmedic can outperform them. As an application, we have applied TFmedic to all possible missense mutations on all human transcription factors; the proteome-wide results reveal interesting insights that are consistent with existing physicochemical knowledge. A case study with an actual 3D structure is conducted, revealing how TFmedic can contribute to studies of protein-DNA binding complexes.
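
For readers unfamiliar with the classifier family named in the abstract, the following is a minimal sketch of AdaBoost with decision stumps in plain Python. It is not the TFmedic implementation: this record does not describe the actual features, training data, or hyperparameters, so the two feature columns (a conservation score and a second, uninformative score) and all values below are made up for illustration only. Labels are +1 for "deleterious" and -1 for "neutral".

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    """Apply a single-feature threshold rule to one sample."""
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity

def adaboost(X, y, rounds=10):
    """Fit a weighted ensemble of stumps; y must contain only +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n                       # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite for a perfect stump.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: [conservation score, unrelated score] per mutation.
X = [[0.90, 0.2], [0.80, 0.7], [0.85, 0.4],
     [0.20, 0.3], [0.30, 0.8], [0.10, 0.5]]
y = [1, 1, 1, -1, -1, -1]

model = adaboost(X, y, rounds=5)
train_predictions = [predict(model, row) for row in X]
```

In this toy setting a single threshold on the first column already separates the two classes, so every round selects the same stump; on real conservation features, successive rounds would typically pick different features and thresholds as the sample weights shift toward hard cases.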

Research Area(s)

  • Applied Data Mining, Applied Machine Learning, Databases, DNA, Forestry, Information entropy, Predictive models, Proteins, Single Nucleotide Polymorphism, Support vector machines, Transcription Factors
