TY - JOUR
T1 - Mining of protein-protein interfacial residues from massive protein sequential and spatial data
AU - Wang, Debby D.
AU - Zhou, Weiqiang
AU - Yan, Hong
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Processing big data is a major challenge in bioinformatics. In this paper, we addressed the problem of identifying protein-protein interfacial residues from massive protein structural data. A protein set comprising 154 993 residues was analyzed. We applied three-dimensional alpha shape modeling to search for surface and interfacial residues in this set, and adopted spatially neighboring residue profiles to characterize each residue. These residue profiles, which capture the sequential and spatial information of proteins, translated the original data into a large matrix. After vertically and horizontally refining this matrix, we implemented and compared a series of popular learning procedures, including neuro-fuzzy classifiers (NFCs), CART, neighborhood classifiers (NECs), extreme learning machines (ELMs) and naive Bayesian classifiers (NBCs), to predict the interfacial residues, aiming to investigate the sensitivity of these massive structural data to different learning mechanisms. As a result, ELMs, CART and NFCs performed better in terms of computational cost, while NFCs, NBCs and ELMs provided favorable prediction accuracies. Overall, NFCs, NBCs and ELMs are favorable choices for handling this type of data quickly and accurately. More importantly, the marginal differences between the prediction performances of these methods imply that this type of data is insensitive to the choice of learning mechanism.
AB - Processing big data is a major challenge in bioinformatics. In this paper, we addressed the problem of identifying protein-protein interfacial residues from massive protein structural data. A protein set comprising 154 993 residues was analyzed. We applied three-dimensional alpha shape modeling to search for surface and interfacial residues in this set, and adopted spatially neighboring residue profiles to characterize each residue. These residue profiles, which capture the sequential and spatial information of proteins, translated the original data into a large matrix. After vertically and horizontally refining this matrix, we implemented and compared a series of popular learning procedures, including neuro-fuzzy classifiers (NFCs), CART, neighborhood classifiers (NECs), extreme learning machines (ELMs) and naive Bayesian classifiers (NBCs), to predict the interfacial residues, aiming to investigate the sensitivity of these massive structural data to different learning mechanisms. As a result, ELMs, CART and NFCs performed better in terms of computational cost, while NFCs, NBCs and ELMs provided favorable prediction accuracies. Overall, NFCs, NBCs and ELMs are favorable choices for handling this type of data quickly and accurately. More importantly, the marginal differences between the prediction performances of these methods imply that this type of data is insensitive to the choice of learning mechanism.
KW - 3D alpha shape modeling
KW - CART
KW - Extreme learning machines (ELMs)
KW - Joint mutual information (JMI)
KW - Naive Bayesian classifiers (NBCs)
KW - Neighborhood classifiers (NECs)
KW - Neuro-fuzzy classifiers (NFCs)
KW - Protein-protein interface prediction
KW - Residue sequence profile
UR - http://www.scopus.com/inward/record.url?scp=84911424558&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84911424558&origin=recordpage
U2 - 10.1016/j.fss.2014.01.017
DO - 10.1016/j.fss.2014.01.017
M3 - RGC 21 - Publication in refereed journal
SN - 0165-0114
VL - 258
SP - 101
EP - 116
JO - Fuzzy Sets and Systems
JF - Fuzzy Sets and Systems
ER -