TY - JOUR
T1 - Gene expression data clustering and visualization based on a binary hierarchical clustering framework
AU - Szeto, Lap Keung
AU - Liew, Alan Wee-Chung
AU - Yan, Hong
AU - Tang, Sy-Sen
PY - 2003/8
Y1 - 2003/8
AB - Gene expression data analysis has recently emerged as an active area of research. An important tool for unsupervised analysis of gene expression data is cluster analysis. Although many clustering algorithms have been proposed for such a task, problems such as estimating the right number of clusters and adapting to different cluster characteristics are still not satisfactorily addressed. In this paper, we propose a binary hierarchical clustering (BHC) algorithm for the clustering of gene expression data. The BHC algorithm involves two major steps: (i) the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to partition the data into two classes, and (ii) Fisher linear discriminant analysis is applied to the two classes to refine the partition and assess whether it is acceptable. The BHC algorithm recursively partitions the subclasses until no cluster can be partitioned any further. It neither requires the number of clusters to be supplied in advance nor makes any assumption about the size of each cluster or the class distribution. The BHC algorithm naturally leads to a tree structure representation in which the clustering results can be easily visualized. © 2003 Elsevier Science Ltd. All rights reserved.
KW - Binary hierarchical clustering framework
KW - Fisher linear discriminant analysis
KW - Fuzzy C-means clustering
KW - Gene expression data analysis
KW - Hierarchical clustering
KW - Tree visualization
UR - http://www.scopus.com/inward/record.url?scp=0042515179&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-0042515179&origin=recordpage
U2 - 10.1016/S1045-926X(03)00033-8
DO - 10.1016/S1045-926X(03)00033-8
M3 - RGC 21 - Publication in refereed journal
SN - 1045-926X
VL - 14
SP - 341
EP - 362
JO - Journal of Visual Languages and Computing
JF - Journal of Visual Languages and Computing
IS - 4
ER -