Gene expression data analysis plays an important role in DNA research, and clustering is one of the most commonly used methods for unsupervised analysis of such data. Although many clustering algorithms have been proposed for this task, problems such as estimating the right number of clusters and adapting to different cluster characteristics have not yet been satisfactorily addressed. In this thesis, I use a binary hierarchical clustering (BHC) algorithm for gene expression data analysis. Each BHC split is carried out in two main steps. First, the average linkage hierarchical clustering algorithm is applied to the data to partition them into two classes. Second, the Fisher linear discriminant is applied to the two classes to refine the classification and to assess whether the partition is acceptable. The BHC algorithm first partitions the whole data set and then recursively partitions the resulting subclasses until no cluster can be split any further. It does not require the number of clusters to be known in advance, nor does it make any assumption about the size of each cluster or the class distribution. The BHC algorithm leads naturally to a tree representation, in which the clustering results can be visualized easily.

Keywords: Gene expression data analysis, fuzzy C-means clustering, hierarchical clustering, Fisher linear discriminant analysis, binary hierarchical clustering framework, tree visualization
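The two-step split and the recursion described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the thesis. The function names, the acceptance rule (a Fisher criterion computed on the projected data and compared with `threshold`), and the values of `threshold`, `min_size` and `ridge` are all assumptions introduced for the example, since the abstract does not specify them.

```python
from itertools import combinations
from math import dist, inf


def average_linkage_split(points):
    """Merge clusters by average linkage until two remain; return two index lists."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 2:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters[0], clusters[1]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fisher_refine(points, left, right, ridge=1e-6):
    """Project onto the Fisher direction, reassign points, return (left, right, J)."""
    groups = [[points[i] for i in left], [points[i] for i in right]]
    means = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    d = len(means[0])
    # Within-class scatter matrix; the small ridge (an assumption) keeps it invertible.
    Sw = [[ridge if r == c else 0.0 for c in range(d)] for r in range(d)]
    for g, m in zip(groups, means):
        for x in g:
            diff = [xi - mi for xi, mi in zip(x, m)]
            for r in range(d):
                for c in range(d):
                    Sw[r][c] += diff[r] * diff[c]
    w = solve(Sw, [a - b for a, b in zip(means[0], means[1])])
    proj = [sum(wi * xi for wi, xi in zip(w, x)) for x in points]

    def stats(idx):
        mu = sum(proj[i] for i in idx) / len(idx)
        return mu, sum((proj[i] - mu) ** 2 for i in idx) / len(idx)

    # Refinement: cut halfway between the projected class means.
    cut = (stats(left)[0] + stats(right)[0]) / 2
    new_left = [i for i in range(len(points)) if proj[i] > cut]
    new_right = [i for i in range(len(points)) if proj[i] <= cut]
    if not new_left or not new_right:
        return new_left, new_right, 0.0
    (mu_l, var_l), (mu_r, var_r) = stats(new_left), stats(new_right)
    denom = var_l + var_r
    score = inf if denom == 0 else (mu_l - mu_r) ** 2 / denom
    return new_left, new_right, score


def bhc(points, indices=None, threshold=10.0, min_size=4):
    """Return a binary tree: a leaf is a sorted index list, a node is a pair of subtrees."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) < min_size:
        return sorted(indices)
    subset = [points[i] for i in indices]
    left, right = average_linkage_split(subset)
    left, right, score = fisher_refine(subset, left, right)
    if score < threshold:  # split rejected: keep this cluster whole
        return sorted(indices)
    return (bhc(points, [indices[i] for i in left], threshold, min_size),
            bhc(points, [indices[i] for i in right], threshold, min_size))
```

For example, `bhc([(0,), (1,), (2,), (3,), (20,), (21,), (22,), (23,)])` returns `([0, 1, 2, 3], [4, 5, 6, 7])`: the first split is accepted, and neither half is split again under the assumed threshold. With few points relative to the number of dimensions, the Fisher criterion can become arbitrarily large, so `min_size` would need to be chosen with the data dimension in mind.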
| Date of Award | 15 Jul 2005 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Hong YAN (Supervisor) |

- DNA microarrays
- Statistical methods
- Gene expression

Clustering analysis of microarray gene expression data
SZETO, L. K. (Author). 15 Jul 2005
Student thesis: Master's Thesis