A New Framework for Ensemble-based Class Discovery from Gene Expression Patterns

Project: Research



Discovering and correctly classifying cancer types is essential for successful cancer diagnosis and treatment. Class discovery is one of the most important tasks in cancer classification using gene expression data, and it involves two challenging subtasks: (1) accurately assigning samples to their corresponding classes, and (2) correctly estimating the number of classes in a given set of microarray data whose class structure is unknown. Consensus clustering approaches are effective for task (1) because they are more stable and robust than any single clustering solution. However, most of these approaches are designed for general datasets and do not take into account the special characteristics of gene expression data. In addition, within the ensemble-based clustering context, most existing cluster validity indices designed for task (2) ignore how the relative importance of the different clustering solutions in the ensemble, and the possible dependencies among them, affect the accuracy of the cluster validation process. To address these problems, this project proposes a new framework that integrates two components: a new prior-knowledge-based consensus clustering approach, which assigns different confidence factors to the individual solutions in the cluster ensemble according to how well they conform to the characteristics of gene expression data; and a new cluster validity index, which takes into account both the relative importance of the individual clustering solutions and the degree of dependency between them, allowing a more complete characterization of the tradeoff between data representation accuracy and parsimony.
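The idea of giving individual ensemble members different confidence factors can be illustrated with a generic weighted co-association (evidence-accumulation style) consensus. This is a minimal sketch, not the project's method: the abstract does not say how the confidence factors are derived from gene expression characteristics, so the weights below are hand-picked placeholders, and the 0.5 threshold with connected-component extraction is an assumed, deliberately simple way to turn the co-association matrix into a final partition. The function names `weighted_coassociation` and `consensus_labels` are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_coassociation(partitions: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[List[float]]:
    """Weighted fraction of ensemble members that put samples i and j together.

    `weights` are hypothetical per-partition confidence factors; they are
    normalized so that every matrix entry lies in [0, 1].
    """
    if len(partitions) != len(weights):
        raise ValueError("need one confidence weight per partition")
    n = len(partitions[0])
    total = float(sum(weights))
    m = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += w / total
    return m


def consensus_labels(m: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Link samples whose co-association exceeds `threshold` (an assumed cut-off)
    and return the connected components as consensus cluster labels."""
    n = len(m)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)  # union the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel: Dict[int, int] = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


if __name__ == "__main__":
    # Three toy clusterings of six samples (label values are arbitrary per member).
    partitions = [[0, 0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 1, 1],
                  [1, 1, 0, 0, 2, 2]]
    print(consensus_labels(weighted_coassociation(partitions, [0.6, 0.25, 0.15])))
    print(consensus_labels(weighted_coassociation(partitions, [1, 1, 1])))
```

With the unequal placeholder weights the consensus follows the most trusted member and yields `[0, 0, 0, 1, 1, 1]`; with equal weights sample 2 is pulled into the second group, giving `[0, 0, 1, 1, 1, 1]`. The toy run only shows that confidence factors can change the consensus; it says nothing about how the project computes them.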


Project number: 7002314
Grant type: SRG
Effective start/end date: 1/04/08 to 1/03/11