Nonnegative Matrix Factorization on Manifold with Its Applications in Classification and Clustering

流形正則非負矩陣分解方法及其在分類與聚類中的應用

Student thesis: Doctoral Thesis

Detail(s)

Award date: 14 Nov 2018

Abstract

Nonnegative matrix factorization (NMF) is a well-known paradigm for data representation that expresses data as an additive combination of a set of nonnegative basis vectors. As an effective technique for dimensionality reduction and for discovering the underlying structure of data, NMF has been successfully applied in data mining and machine learning. Improving the discriminability of NMF has been observed to yield better data analysis performance than conventional NMF. This thesis therefore studies the problem of improving the discriminability of NMF for different applications.
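
As a point of reference for the conventional formulation, the following is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius objective. It is not one of the methods proposed in this thesis; the function name `nmf`, the parameter defaults, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative X (features x samples) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank))   # nonnegative basis vectors (columns)
    H = rng.random((rank, n_samples))    # nonnegative coefficients
    for _ in range(n_iter):
        # Multiplicative updates: every factor is nonnegative, so W and H stay nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative data with an exact rank-4 structure (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf(X, rank=4)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each column of `X` is reconstructed as a nonnegative combination of the columns of `W`, which is the additive representation described above.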

First, we consider NMF-based community detection. Community structure is the most significant attribute of networks, and identifying it helps reveal the underlying organization of a network. To take higher-order information among the nodes into account, we propose a novel framework named mixed hypergraph regularized nonnegative matrix factorization (MHGNMF), in which the higher-order information is encoded into NMF through a hypergraph. The hypergraph regularization term forces the nodes within the same hyperedge to be projected onto the same latent subspace, so that a more discriminative representation is obtained. In the proposed framework, we generate a set of hyperedges by mixing two kinds of neighbors for each centroid, which makes full use of both topological connection information and structural similarity information. In tests on two artificial benchmarks and eight real-world networks, the proposed framework achieves better detection results than other state-of-the-art methods.
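
To make the role of the hypergraph regularizer concrete, the sketch below builds one hyperedge per centroid node and forms an unnormalized hypergraph Laplacian L = D_v - H W D_e^{-1} H^T, so that the penalty Tr(V^T L V) is small when nodes sharing a hyperedge have similar representations. The abstract does not specify the construction details, so `mixed_hyperedges` (adjacent nodes plus the k most cosine-similar nodes), the unit hyperedge weights, and the choice of the unnormalized Laplacian are all assumptions, not the MHGNMF definition.

```python
import numpy as np

def mixed_hyperedges(A, k=1):
    """Hypothetical construction: one hyperedge per centroid node, containing the
    node itself, its adjacent nodes, and its k most cosine-similar nodes."""
    n = A.shape[0]
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    S = (A @ A.T) / np.maximum(norms * norms.T, 1e-12)  # cosine similarity of adjacency rows
    np.fill_diagonal(S, -np.inf)                         # never pick the centroid as its own neighbor
    Hinc = np.zeros((n, n))                              # incidence matrix: nodes x hyperedges
    for c in range(n):
        Hinc[c, c] = 1.0                                 # the centroid itself
        Hinc[A[c] > 0, c] = 1.0                          # topological neighbors
        Hinc[np.argsort(S[c])[-k:], c] = 1.0             # structurally similar neighbors
    return Hinc

def hypergraph_laplacian(Hinc, w=None):
    """Unnormalized hypergraph Laplacian D_v - H W D_e^{-1} H^T."""
    n_nodes, n_edges = Hinc.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = Hinc @ w                # weighted vertex degrees
    d_e = Hinc.sum(axis=0)        # hyperedge sizes (at least 1: the centroid)
    return np.diag(d_v) - (Hinc * (w / d_e)) @ Hinc.T

# Toy network: two triangles joined by a single edge (illustrative only).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Hinc = mixed_hyperedges(A, k=1)
L = hypergraph_laplacian(Hinc)
V = np.ones((6, 2))                    # identical representations for all nodes
penalty = np.trace(V.T @ L @ V)        # vanishes when representations agree within hyperedges
```

In a regularized model of this kind, a term such as `lam * np.trace(V.T @ L @ V)` would be added to the NMF reconstruction objective, with `lam` a hypothetical trade-off parameter.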

Second, we design a novel model for NMF-based classification, namely dual embedding regularized NMF (DENMF). Traditional NMF-based classification methods first apply NMF or one of its variants to the input data samples to obtain low-dimensional representations, which are subsequently classified by a typical classifier (e.g., kNN or SVM). Such a stepwise procedure may overlook the dependency between the two processes, which can compromise classification accuracy. In contrast, the basic idea of DENMF is to unify the two processes in a single constrained optimization model, so that its solution yields the low-dimensional representations and the assignment matrix simultaneously. Specifically, the input data samples are projected onto two low-dimensional spaces, i.e., the feature space and the label space, and locally linear embedding is employed to preserve the same local geometric structure in both spaces. Experimental results on five benchmark datasets demonstrate that DENMF achieves higher classification accuracy than state-of-the-art algorithms.
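
The locally linear embedding ingredient can be illustrated as follows. The sketch computes standard LLE reconstruction weights S, where each sample is approximated by an affine combination of its k nearest neighbors, and the matrix M = (I - S)^T (I - S), which can serve as a penalty that preserves local geometry in a low-dimensional space. How DENMF couples this penalty across the feature and label spaces is not given in the abstract, so the function name `lle_weights`, the values of `k` and `reg`, and the random data are assumptions for illustration only.

```python
import numpy as np

def lle_weights(X, k=3, reg=1e-3):
    """X: samples x features. Returns S (n x n) whose rows sum to 1 and are
    supported on the k nearest neighbors of each sample."""
    n = X.shape[0]
    S = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(d2, -1.0)                               # ensures each point sorts first
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                    # skip the point itself
        Z = X[nbrs] - X[i]                                   # neighbors centered on sample i
        G = Z @ Z.T                                          # local Gram matrix
        G += reg * max(np.trace(G), 1e-12) * np.eye(k)       # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        S[i, nbrs] = w / w.sum()                             # enforce the sum-to-one constraint
    return S

rng = np.random.default_rng(0)
X = rng.random((12, 3))                # 12 samples, 3 features (illustrative only)
S = lle_weights(X, k=3)
I = np.eye(X.shape[0])
M = (I - S).T @ (I - S)                # penalty matrix: Tr(V M V^T) for V of shape (rank, n)
```

Using the same `M` to penalize both a feature-space representation and a label-space representation is one way to encourage a shared local structure, in the spirit of the description above.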

Lastly, we propose a novel symmetric NMF (SNMF) based semi-supervised clustering method, namely pairwise constraint propagation induced symmetric nonnegative matrix factorization (PCPSNMF). In contrast to existing SNMF-based clustering methods, which construct the similarity matrix empirically and rigidly impose the supervisory information on the assignment matrix, the proposed PCPSNMF learns the similarity and assignment matrices adaptively and simultaneously by formulating a single constrained optimization problem. Specifically, a small amount of supervisory information in the form of pairwise constraints is introduced in a flexible way to guide the construction of the similarity matrix, and the two matrices interact to refine each other until convergence. Experimental results on several benchmark image datasets demonstrate that PCPSNMF is less sensitive to initialization and achieves better clustering performance than state-of-the-art methods.
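
For context, the sketch below shows plain symmetric NMF on a fixed similarity matrix A, i.e., approximating A by V V^T with V nonnegative and reading cluster labels from the largest entry in each row of V. It uses a commonly cited damped multiplicative update with beta = 0.5. It does not learn the similarity matrix or use pairwise constraints, so it is only the building block that PCPSNMF extends; the function name `snmf`, the parameter defaults, and the toy block-diagonal similarity matrix are assumptions.

```python
import numpy as np

def snmf(A, rank, n_iter=300, beta=0.5, eps=1e-10, seed=0):
    """Approximate a symmetric nonnegative A (n x n) as V @ V.T with V >= 0."""
    rng = np.random.default_rng(seed)
    V = rng.random((A.shape[0], rank))
    for _ in range(n_iter):
        # Damped multiplicative update; the factor is nonnegative, so V stays nonnegative.
        V *= (1.0 - beta) + beta * (A @ V) / (V @ (V.T @ V) + eps)
    return V

# Toy similarity matrix with two groups of four samples each (illustrative only).
A = np.kron(np.eye(2), np.ones((4, 4)))
V0 = snmf(A, rank=2, n_iter=0)         # random initialization only, same seed
V = snmf(A, rank=2)
err0 = np.linalg.norm(A - V0 @ V0.T)
err = np.linalg.norm(A - V @ V.T)
labels = V.argmax(axis=1)              # cluster assignment per sample
```

Because the result of such an update depends on the random starting point, comparing runs with different `seed` values is a simple way to observe the sensitivity to initialization mentioned above.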