TY - JOUR
T1 - Space Structure and Clustering of Categorical Data
AU - Qian, Yuhua
AU - Li, Feijiang
AU - Liang, Jiye
AU - Liu, Bing
AU - Dang, Chuangyin
PY - 2016/10
Y1 - 2016/10
N2 - Learning from categorical data plays a fundamental role in such areas as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature, among which $k$-modes-type algorithms are very representative because of their good performance. Nevertheless, there is still much room for improving their clustering performance in comparison with the clustering algorithms for numeric data. This may arise from the fact that categorical data lack a clear space structure like that of numeric data. To address this issue, we propose, in this paper, a novel data-representation scheme for categorical data, which maps a set of categorical objects into a Euclidean space. Based on the data-representation scheme, a general framework for space-structure-based categorical clustering algorithms (SBC) is designed. This framework, together with the application of two kinds of dissimilarities, leads to two versions of the SBC-type algorithms. To verify the performance of the SBC-type algorithms, we employ four representative $k$-modes-type algorithms as references. Experiments show that the proposed SBC-type algorithms significantly outperform the $k$-modes-type algorithms.
AB - Learning from categorical data plays a fundamental role in such areas as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature, among which $k$-modes-type algorithms are very representative because of their good performance. Nevertheless, there is still much room for improving their clustering performance in comparison with the clustering algorithms for numeric data. This may arise from the fact that categorical data lack a clear space structure like that of numeric data. To address this issue, we propose, in this paper, a novel data-representation scheme for categorical data, which maps a set of categorical objects into a Euclidean space. Based on the data-representation scheme, a general framework for space-structure-based categorical clustering algorithms (SBC) is designed. This framework, together with the application of two kinds of dissimilarities, leads to two versions of the SBC-type algorithms. To verify the performance of the SBC-type algorithms, we employ four representative $k$-modes-type algorithms as references. Experiments show that the proposed SBC-type algorithms significantly outperform the $k$-modes-type algorithms.
KW - Categorical data
KW - clustering
KW - dissimilarity
KW - k-modes-type algorithms
KW - space structure
UR - http://www.scopus.com/inward/record.url?scp=84943192835&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84943192835&origin=recordpage
U2 - 10.1109/TNNLS.2015.2451151
DO - 10.1109/TNNLS.2015.2451151
M3 - RGC 21 - Publication in refereed journal
SN - 2162-237X
VL - 27
SP - 2047
EP - 2059
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
M1 - 7287764
ER -