Learning Methods for Adaptive Graphs in Data Analysis

數據分析中的自適應圖表示學習方法

Student thesis: Doctoral Thesis

View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date30 Aug 2021

Abstract

The graph is a ubiquitous data structure that provides a natural way to represent the entities and relationships between them. Learning a graph to uncover the underlying structure of the data plays a vital role in data analysis. However, in real-world applications, the data is usually high dimensional, noisy, and even differently distributed. Thus constructing high-quality graphs that can accurately describe the relationships between the data samples is a challenging problem. This thesis focuses on designing learning-based graph construction algorithms. The main contributions are summarized as follows:

First, we propose a novel semi-supervised affinity matrix learning algorithm, i.e., learning an affinity matrix of data samples under the supervision of a small number of pairwise constraints (PCs). By observing that both the matrix encoding pairwise constraints named pairwise constraint matrix (PCM) and the empirically constructed affinity matrix (EAM) express the similarities between samples, we assume that both of them are generated from a latent affinity matrix (LAM) that can depict the ideal pairwise relationships between samples. Specifically, PCM can be thought of as a partial observation of the LAM, while EAM is a fully observed one but corrupted with noise/outliers. To this end, we transform the affinity matrix learning problem as the recovery of the LAM guided by the PCM and EAM, which can be formulated as a convex optimization problem. Extensive experiments on benchmark data sets indicate the proposed model outperforms some state-of-the-art algorithms in constrained clustering and dimensionality reduction.

Secondly, by further considering the imbalance characteristic of PCs, we propose a novel imbalance-aware pairwise constraint propagation (PCP) method to investigate the pairwise relationships between samples. Specifically, PCs are composed of must-link constrains and cannot-link constraints. Unlike the existing methods including the above proposed model that adopt a single representation, we propose using two separate carriers to represent the two types of links. And the propagation is driven by the structure embedded in data samples and the regularization of the local, global, and complementary structures of the two carries. Our method is elegantly cast as a well-posed constrained optimization model, which can be efficiently solved. Experimental results demonstrate that the proposed PCP method can generate more high-fidelity PCs than the recent PCP algorithms. In addition, the augmented PCs by our method produce higher accuracy than state-of-the-art PCP methods when applied to constrained clustering. This is the first PCP method taking the imbalance property of PCs into account to the best of our knowledge.

Thirdly, to further adaptively learn the similarity relationship, we propose a new semi-supervised low-rank graph learning method. To be specific, we linearly approximate each sample with others under the regularization of the low-rankness of the matrix formed by the approximation coefficient vectors of all the samples. In the meanwhile, based on the above proposed PCP method, we enhance the dissimilarity information of the available PCs. We seamlessly combine the two adversarial learning processes to achieve mutual guidance. We cast our method as a well-posed constrained optimization problem and provide an efficient alternating iterative algorithm to solve it. Experimental results on the classification task indicate the effectiveness and efficiency of the proposed method.

Lastly, we extend the affinity matrix learning problem from single view to multi-view to obtain more accurate pairwise relationships between samples. As aforementioned, all the above proposed models are semi-supervised methods that need some supervisory information. However, in real-world applications, sometimes there is no supervisory information available; thus, we propose an unsupervised affinity matrix learning algorithm by fully exploring the raw data. Specifically, from the perspective of the multi-view representation, we explore the complementary information across different views with a structural tensor low-rank norm to improve the confidence of the similarity relationship. We explicitly formulate the method as a constrained optimization problem and provide an efficient algorithm to solve it. We experimentally demonstrate the advantages of our approach over state-of-the-art ones when applying the constructed graphs for data classification and clustering.