Representative distance: A new similarity measure for class discovery from gene expression data

Research output: Journal Publications and Reviews (RGC: 21, 22, 62); 21_Publication in refereed journal; peer-review

14 Scopus Citations




Original language: English
Article number: 6261551
Pages (from-to): 341-351
Journal / Publication: IEEE Transactions on Nanobioscience
Issue number: 4
Publication status: Published - 2012


Similarity measurement is one of the most important stages in the process of cancer discovery from gene expression data. Traditional distance functions, such as the Euclidean distance, the correlation coefficient, and the cosine distance, are typically used to quantify the similarity between two cancer samples. However, these measures take into account neither the properties of cancer samples nor the relationships among the genes in gene expression data. To explore both, we design a new similarity measure called representative distance (RD) to identify cancer samples in gene expression data. Specifically, RD does not compute the distance between two cancer samples using all the genes; it calculates the similarity using only representative genes selected by the affinity propagation algorithm. A similarity matrix is then constructed based on the representative distance. Finally, the spectral clustering algorithm is adopted to partition the similarity matrix and discover the biologically meaningful samples. To our knowledge, this is the first time that the representative distance has been applied to class discovery from gene expression data. Experiments on real cancer datasets indicate that our similarity measure can i) outperform most of the traditional distance measures and ii) identify cancer samples correctly in most of the datasets. © 2011 IEEE.
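
The middle steps of the pipeline described in the abstract (a distance restricted to representative genes, then a similarity matrix) can be illustrated with a minimal sketch. The abstract does not give the exact RD formula, so the code below assumes a plain Euclidean distance over the representative genes and a Gaussian kernel to turn distances into similarities; both are illustrative choices, not the authors' definition. The names `representative_distance`, `similarity_matrix`, `rep_idx`, and `sigma` are hypothetical, and the representative gene indices are assumed to have been selected beforehand (for example by an affinity propagation implementation run over the genes).

```python
import math


def representative_distance(x, y, rep_idx):
    """Euclidean distance between two samples, using only the genes in rep_idx.

    Assumption: the paper's RD may be defined differently; this only shows
    the idea of restricting the comparison to representative genes.
    """
    return math.sqrt(sum((x[g] - y[g]) ** 2 for g in rep_idx))


def similarity_matrix(samples, rep_idx, sigma=1.0):
    """Symmetric similarity matrix from the restricted distance.

    Assumption: a Gaussian kernel with bandwidth sigma converts distances
    to similarities in (0, 1]; the source does not specify the conversion.
    """
    n = len(samples)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = representative_distance(samples[i], samples[j], rep_idx)
            sim[i][j] = sim[j][i] = math.exp(-(d * d) / (2.0 * sigma ** 2))
    return sim


# Toy expression data: 3 samples (rows) x 4 genes (columns), values made up.
samples = [
    [1.0, 1.1, 5.0, 0.2],
    [1.2, 0.9, 5.1, 0.1],
    [4.0, 4.2, 0.5, 3.9],
]

# Hypothetical representative genes, assumed to come from affinity propagation.
rep_idx = [0, 2]

S = similarity_matrix(samples, rep_idx)
```

In this toy example the first two samples end up far more similar to each other than either is to the third. A matrix such as `S` could then be handed to a spectral clustering routine that accepts a precomputed affinity matrix in order to partition the samples.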

Research Area(s)

  • Cancer discovery, distance, microarray, similarity measure