TY - JOUR
T1 - A computational approach to gene expression data extraction and analysis
AU - Liew, Alan Wee-Chung
AU - Szeto, Lap Keung
AU - Tang, Sy-Sen
AU - Yan, Hong
AU - Yang, Mengsu
PY - 2004/11
Y1 - 2004/11
N2 - The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Due to the enormous amount of gene expression data generated by such technology, computer processing and analysis of such data have become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. A novel, fully automated spot segmentation algorithm for DNA microarray images, which makes use of adaptive thresholding, morphological processing and statistical intensity modeling, is proposed to: (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. Firstly, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Secondly, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and ends when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or the class distribution. The hierarchical framework naturally leads to a tree structure representation for effective visualization of gene expression.
AB - The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Due to the enormous amount of gene expression data generated by such technology, computer processing and analysis of such data have become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. A novel, fully automated spot segmentation algorithm for DNA microarray images, which makes use of adaptive thresholding, morphological processing and statistical intensity modeling, is proposed to: (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. Firstly, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Secondly, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and ends when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or the class distribution. The hierarchical framework naturally leads to a tree structure representation for effective visualization of gene expression.
KW - Binary hierarchical clustering
KW - cDNA microarray
KW - Gene expression clustering
KW - Gene expression data analysis
KW - Microarray image segmentation
KW - Tree visualization
UR - http://www.scopus.com/inward/record.url?scp=4944262651&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-4944262651&origin=recordpage
U2 - 10.1023/B:VLSI.0000042490.35986.84
DO - 10.1023/B:VLSI.0000042490.35986.84
M3 - RGC 22 - Publication in policy or professional journal
SN - 1387-5485
VL - 38
SP - 237
EP - 258
JO - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
JF - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
IS - 3
ER -