TY - JOUR
T1 - Agglomerative Info-Clustering
T2 - Maximizing Normalized Total Correlation
AU - Chan, Chung
AU - Al-Bashabsheh, Ali
AU - Zhou, Qiaoqiao
PY - 2021/3
Y1 - 2021/3
N2 - We show that, under the info-clustering framework, correlated random variables can be clustered in an agglomerative manner. While the existing divisive approach successively segregates the random variables into subsets with increasing multivariate mutual information, our agglomerative approach successively merges subsets of random variables sharing a large amount of normalized total correlation. We show that both approaches result in the same hierarchy of clusters, but the agglomerative approach is an order of magnitude faster than the divisive one. The uniqueness of the hierarchy produced by the two approaches is due to a fundamental connection that we uncover between the well-known total correlation and the recently proposed measure of multivariate mutual information. We implement the new algorithm and provide a data structure for efficient storage and retrieval of the hierarchical clustering solution.
AB - We show that, under the info-clustering framework, correlated random variables can be clustered in an agglomerative manner. While the existing divisive approach successively segregates the random variables into subsets with increasing multivariate mutual information, our agglomerative approach successively merges subsets of random variables sharing a large amount of normalized total correlation. We show that both approaches result in the same hierarchy of clusters, but the agglomerative approach is an order of magnitude faster than the divisive one. The uniqueness of the hierarchy produced by the two approaches is due to a fundamental connection that we uncover between the well-known total correlation and the recently proposed measure of multivariate mutual information. We implement the new algorithm and provide a data structure for efficient storage and retrieval of the hierarchical clustering solution.
KW - agglomerative clustering
KW - Clustering algorithms
KW - Correlation
KW - Entropy
KW - Lattices
KW - minimum norm base
KW - multivariate mutual information
KW - Mutual information
KW - principal sequence
KW - principal sequence of partitions
KW - Random variables
KW - Turning
KW - agglomerative clustering
KW - Clustering algorithms
KW - Correlation
KW - Entropy
KW - Lattices
KW - minimum norm base
KW - multivariate mutual information
KW - Mutual information
KW - principal sequence
KW - principal sequence of partitions
KW - Random variables
KW - Turning
KW - agglomerative clustering
KW - Clustering algorithms
KW - Correlation
KW - Entropy
KW - Lattices
KW - minimum norm base
KW - multivariate mutual information
KW - Mutual information
KW - principal sequence
KW - principal sequence of partitions
KW - Random variables
KW - Turning
UR - http://www.scopus.com/inward/record.url?scp=85097182354&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85097182354&origin=recordpage
U2 - 10.1109/TIT.2020.3040492
DO - 10.1109/TIT.2020.3040492
M3 - 21_Publication in refereed journal
VL - 67
SP - 2001
EP - 2011
JO - IRE Transactions on Information Theory
JF - IRE Transactions on Information Theory
SN - 0018-9448
IS - 3
ER -