Tensor Factorization and Learning for Object Detection and Recognition in Digital Images

Student thesis: Doctoral Thesis

Abstract

Matrix decompositions and their higher-order generalizations, tensor decompositions, are increasingly prominent in signal and image processing research. Because they can recover latent factors and scale efficiently to large data, matrix and tensor decompositions have been successfully applied to important computer vision problems in areas including health care systems, surveillance, and human-computer interaction.

In the first phase of this work, we investigate the problem of analysing facial expressions from images and propose a novel feature selection strategy using singular value decomposition (SVD) based co-clustering. The low-rank approximation of the feature matrix built from several expression samples is studied in order to group subsets of samples and important features into co-clusters. A discriminative set of features is then selected, based on their non-inclusive information in the co-clusters. The proposed method does not simply combine input features through a transformation, but considers coherent patterns shared by a subset of samples and a subset of features. The selected features retain their original physical meanings and are closely related to the samples that have specific characteristics. Experiments on publicly available image databases validate the existence and effectiveness of these learned facial features.
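As a rough illustration of SVD-based co-clustering, the sketch below uses the well-known bipartite spectral formulation (Dhillon, 2001), which may differ from the procedure developed in the thesis. It assumes a nonnegative sample-by-feature matrix with strictly positive row and column sums, and splits samples and features into two co-clusters using the signs of the second singular vectors. The function name `svd_cocluster` and the toy matrix are hypothetical, and the feature selection step itself is not shown.

```python
import numpy as np

def svd_cocluster(A):
    """Two-way spectral co-clustering of a nonnegative matrix A.

    Rows are samples, columns are features. Assumes every row and
    column sum is strictly positive. Returns 0/1 labels for rows
    and columns; rows and columns sharing a label form a co-cluster.
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)  # row sums
    d_cols = A.sum(axis=0)  # column sums
    # Normalise each entry by the square roots of its row and column sums.
    A_norm = A / np.sqrt(np.outer(d_rows, d_cols))
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    # The second singular vectors carry the two-way partition.
    u2 = U[:, 1] / np.sqrt(d_rows)
    v2 = Vt[1, :] / np.sqrt(d_cols)
    row_labels = (u2 > 0).astype(int)
    col_labels = (v2 > 0).astype(int)
    return row_labels, col_labels

# Toy data: samples 0-1 respond strongly to features 0-2,
# samples 2-3 respond strongly to features 3-5.
strong = 5.0 * np.ones((2, 3))
weak = 0.1 * np.ones((2, 3))
A = np.block([[strong, weak], [weak, strong]])
row_labels, col_labels = svd_cocluster(A)
```

Which cluster receives label 0 or 1 depends on the sign convention of the SVD routine, so only the grouping is meaningful, not the label values.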

The Higher Order Singular Value Decomposition (HOSVD) can be used as a major tool for decomposing a tensor into N-mode singular vectors, as the tensor is built up on the natural N modes. In the second phase, the matrix-based decomposition is extended to its higher-order form, i.e. tensors, and the well-known problem of object tracking is studied using the N-mode incremental SVD. We specifically design a system to track a human face automatically in video as well as recognize its underlying facial expressions. We illustrate the tensor-based modelling of video data along with the incremental tensor learning of mode matrices. Using our self-prepared and benchmark videos, we demonstrate the effectiveness of the system. Tensor-based learning is not limited to multimedia, but can be further extended to build several other applications, such as moving object detection. The problem of background subtraction can be treated as a low-rank tensor representation, where the low-rank content represents the background information. The spatial differences between them can yield better motion estimation. We also investigate this problem using incremental tensor learning and achieve significant improvements in segmenting videos.
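To make the HOSVD concrete, here is a minimal batch sketch in NumPy: each mode matrix is taken from the left singular vectors of the corresponding mode-n unfolding, and the core tensor is obtained by projecting the data onto those matrices. This is the standard batch construction, not the incremental N-mode SVD used in the thesis, and the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are chosen here for illustration only.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis lands at 0
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks=None):
    """Return (core, factors); `ranks` optionally truncates each mode."""
    factors = []
    for n in range(T.ndim):
        U, _, _ = np.linalg.svd(unfold(T, n), full_matrices=False)
        r = U.shape[1] if ranks is None else ranks[n]
        factors.append(U[:, :r])  # leading N-mode singular vectors
    core = T
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)  # project onto each mode basis
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and the mode matrices."""
    T = core
    for n, U in enumerate(factors):
        T = mode_product(T, U, n)
    return T

# Hypothetical stand-in for a small video volume (height x width x frames).
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 5, 6))
core, factors = hosvd(video)
error = np.linalg.norm(reconstruct(core, factors) - video)
```

Without truncation the reconstruction is exact up to floating-point error; passing smaller `ranks` gives a low-rank approximation of the kind used to model a slowly varying background.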

In the later part of this work, CUR decomposition based methods are studied for graph matching problems and are applied to finding image correspondences. The low-rank column and row matrices from CUR can be integrated into the matching framework to make this application feasible for a large number of points in images. The tensor form of the CUR decomposition is introduced to tackle relaxation labelling without accessing all the entries of the compatibility tensor. The proposed method is validated on synthetic and real image datasets in comparison with several state-of-the-art algorithms. Extensive experiments on benchmark image datasets in the presence of cluttered backgrounds indicate the robustness of the proposed method.
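The idea of approximating a matrix from a few of its actual rows and columns can be sketched as follows. This is a minimal matrix-case example of a skeleton-style CUR approximation with uniformly sampled indices, assuming the sampled intersection block has the same rank as the full matrix; it is not the tensor CUR or the matching framework of the thesis, and the function name `cur_skeleton` and the sampling scheme are assumptions made here.

```python
import numpy as np

def cur_skeleton(A, row_idx, col_idx, rcond=1e-10):
    """CUR approximation A ~ C @ U @ R from selected rows and columns.

    Only the chosen columns (C), the chosen rows (R) and their
    intersection block W are read; U is the pseudoinverse of W.
    """
    C = A[:, col_idx]                  # sampled columns
    R = A[row_idx, :]                  # sampled rows
    W = A[np.ix_(row_idx, col_idx)]    # intersection block
    U = np.linalg.pinv(W, rcond=rcond) # discard near-zero singular values
    return C, U, R

# Synthetic rank-2 matrix standing in for a compatibility matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
row_idx = rng.choice(30, size=4, replace=False)
col_idx = rng.choice(20, size=4, replace=False)
C, U, R = cur_skeleton(A, row_idx, col_idx)
relative_error = np.linalg.norm(C @ U @ R - A) / np.linalg.norm(A)
```

When the matrix is only approximately low rank, the quality of the approximation depends heavily on which rows and columns are sampled, so uniform sampling is used here purely for simplicity.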

Through our results, we also demonstrate that tensors serve as natural data structures in many computer vision problems, whereas recasting a tensor problem as a 2-D matrix problem for the sake of computational convenience risks losing attributes of the data. We believe that the findings from this research can foster new research trends and novel applications. We expect that learning with tensors will become more useful and practicable, and can make further progress on challenging big data problems.
Date of Award: 16 Oct 2018
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Hong YAN (Supervisor)

Keywords

  • Tensor factorization, Facial expression recognition, Co-clustering, Object tracking, Graph matching, CUR decomposition
