Spectral Algorithms for Community Detection in Graphs with Applications in Automatic Text Summarization

基於譜算法的圖社區發現及其在文本摘要中的應用

Student thesis: Doctoral Thesis

View graph of relations

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date6 Sep 2019

Abstract

This thesis proposes three new spectral algorithms for the detection of communities of nodes in large-scale networks.

The majority of existing methods of community detection in networks produce disjoint communities of nodes. However, many real-world networks naturally involve overlapping communities allowing multiple community memberships for each node. On the other hand, most existing algorithms focus on extracting communities from undirected networks in which all connections are reciprocal, making them unable to deal with networks involving unidirectional connections.

The spectral clustering algorithm is a popular method of non-overlapping community detection in undirected networks that is well-known for its computational efficiency and its strong theoretical foundation. This thesis proposes three different community detection algorithms that also rely on spectral approaches but that extend the scope of the traditional spectral clustering. The first and second proposed algorithms tackle the task of community detection in directed networks. Taking the directionality of connections into account, the two proposed spectral algorithms are able to partition nodes in directed networks respectively with a cyclic and an acyclic pattern of connections between communities of nodes. The two proposed methods are shown to extract meaningful communities from real-world networks including the global network of business agreements between Internet Service Providers. The third proposed method is a spectral method for overlapping community detection that extends the theoretical foundation of spectral clustering. The algorithm outperforms state-of-the-art methods of overlapping community detection on real-world social networks with millions of nodes and edges.

As an application of community detection in graphs, this thesis also proposes a new method of automatic text summarization. Sentence clustering is often performed by automatic summarization systems in the process of identifying important sentences in large corpora of documents. The proposed graph-based summarizer relies on the definition of a graph-like structure with sentences as nodes from which overlapping communities of sentences are extracted using an approach similar to our proposed spectral clustering algorithm for overlapping community detection. Key sentences are then extracted by using the concept of hypergraph transversal which allows to identify the sentences belonging to the most relevant communities. The proposed summarization framework outperforms state-of-the-art summarization systems in terms of efficiency and effectiveness on real-world datasets of news articles. Specifically, the combined usage of sentence communities and hypergraph transversals leads to an increase of 6% of ROUGE-SU4 F-measure compared to state-of-the-art summarization systems.

    Research areas

  • Community Detection, Spectral Clustering, Complex Networks, Directed Graphs, Cyclic Graphs, Acyclic Graphs, Overlapping Communities, Normalized Cut, Text Summarization, Hypergraph, Hypergraph Transversals, Submodular Functions