Detection of Hyperplanar Co-cluster Patterns in Multidimensional Singular Vector Spaces

Project: Research


Description

A major task in gene expression data analysis is to find coherent patterns, which involve subsets of but not all samples in each dimension. For example, we may want to detect a subset of genes that are co-regulated under a subset of conditions during a subset of time points. This requires co-clustering techniques, which perform clustering in all directions simultaneously in multidimensional data.

We have recently discovered that after an M-th order tensor is factorized using high order singular value decomposition (HOSVD), we can convert an M-dimensional co-clustering problem to the analysis of two-dimensional (2D) data with M singular matrices. The coherence in the tensor data is reflected in the hyperplanar relations among the singular vectors. The 2D data analysis problem can be solved using hyperplane detection algorithms. In this project, we will develop novel methods for co-clustering tensor data based on these useful properties. We will investigate effective procedures to detect different hyperplanar co-clusters, study robust methods to reduce the influence of noise and outliers, design efficient computer algorithms, and apply our techniques to real-world multidimensional data analysis problems.

Our approach will have several advantages in comparison with existing methods. For example, our algorithm will be able to model different types of co-cluster patterns using the same formulation. Some commonly studied co-clusters can be viewed as special cases of our model. In many cases, tensor decomposition provides significant data reduction. In addition, HOSVD decomposes a co-cluster so that we can study it in individual dimensions. Our technique will also establish useful links between co-clustering and linear algebraic relations among singular vectors. In our method, HOSVD can be used to reduce the noise effect by separating tensor data into signal subspace and noise subspace and we only need to consider the singular vectors in the signal subspace. Our model can also incorporate several other powerful noise reduction techniques, such as generalized projections, relaxation labeling and data imputation and restoration procedures.

Our work will lead to a new class of robust methods for co-clustering tensor data, which is an important problem but is difficult and often considered intractable mathematically and computationally. We will provide a systematic strategy to design robust computer algorithms for the detection of hyperplanar co-cluster patterns. The techniques developed in this project will find many practical applications, such as in genomic sequence, gene expression and image data analysis, and disease diagnosis such as cancer type and sub-type identification.
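
The link between a co-cluster and the singular vectors can be illustrated with a small numerical sketch. It is not the project's algorithm: it assumes the simplest special case, a single constant-valued co-cluster planted in a noisy third-order tensor, and it uses a plain threshold as a stand-in for a hyperplane detection step. The tensor shape, the planted index sets, the noise level and the helper names `unfold` and `hosvd_factors` are all hypothetical choices made for illustration. In this case the entries of the leading singular vector of each mode take nearly one common value for cluster members and stay near zero elsewhere, so each dimension can be examined separately.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_factors(tensor):
    """Left singular vectors of each mode unfolding (HOSVD factor matrices)."""
    return [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
            for m in range(tensor.ndim)]

# Hypothetical toy data: weak Gaussian noise plus one constant co-cluster.
rng = np.random.default_rng(0)
shape = (10, 8, 12)
members = [np.array([1, 4, 6, 9]),         # assumed cluster indices, mode 0
           np.array([0, 2, 5]),            # assumed cluster indices, mode 1
           np.array([3, 4, 7, 8, 11])]     # assumed cluster indices, mode 2
tensor = 0.01 * rng.standard_normal(shape)
tensor[np.ix_(*members)] += 5.0

factors = hosvd_factors(tensor)

# Stand-in for hyperplane detection: keep the indices whose entry in the
# leading singular vector is large in magnitude (sign is arbitrary in an SVD).
detected = []
for U in factors:
    lead = np.abs(U[:, 0])
    detected.append(np.flatnonzero(lead > 0.5 * lead.max()))

for mode, idx in enumerate(detected):
    print(f"mode {mode}: detected indices {idx.tolist()}")
```

Only the leading singular vector is kept here, which plays the role of the signal subspace; the remaining vectors are dominated by noise in this toy setting. More general co-cluster types would show up as linear relations among several singular vectors rather than a single constant level, which is where a proper hyperplane detector would be needed.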

Detail(s)

Project number: 9042065
Grant type: GRF
Status: Finished
Effective start/end date: 1/01/15 to 28/05/19

    Research areas

  • Scientific computing
  • Co-cluster patterns
  • Gene expression data analysis
  • Hyperplane