Community Detection in Directed Networks
有向圖中的社區檢測
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors |
|
Award date | 6 Jul 2022 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(53dc42b9-2084-4490-9bbe-544ff38ad051).html |
---|---|
Other link(s) | Links |
Abstract
Network data has attracted long-running and continuing interests and has a broad spectrum of applications in various domains such as social networks and biological networks. One of the fundamental interests in network data is its community structure, where nodes within the same community tend to share similar characteristics. Most existing methods focus on undirected networks, where nodes are linked with undirected edges and nodes in the same community are more tightly linked than those in different communities. Yet many real network data come with directed edges, indicating asymmetric information flow among nodes. In this thesis, we first introduce the background of community detection in network data. Then, we move on to our contribution in community detection in directed networks, including directed community detection with network embedding and the overlapped stochastic co-block model.
To detect communities in directed networks, people try to convert directed networks into undirected ones and then apply existing community detection methods for undirected networks. Yet such methods completely ignore the difference between sending and receiving edges in directed networks and thus often lead to sub-optimal performance. To accommodate the different sending and receiving patterns of nodes in directed networks, a network embedding model is developed. Specifically, the network embedding model decomposes each node into an out-node that sends edges and an in-node that receives edges and introduces two sets of vectors to represent the out- and in-nodes in a low-dimensional space. Thus, it allows for the detection of different out- and in-communities of the same nodes. An efficient alternative updating scheme is developed to tackle the resultant optimization task, and numerical experiments on both simulated and real examples indicate that the proposed method is superior than most existing methods. Moreover, asymptotic consistencies of the proposed method are also established.
Despite the widely reported success, existing community detection methods have been primarily focused on exclusive community structure, while the extension to overlapped community structure remains largely lacking. However, overlapped community structure is prevalent in directed and undirected networks. To further allow nodes in directed networks to belong to multiple communities, we formulate the stochastic co-block model as a generative model, where the unknown community membership is modeled as a multinomial distribution or a joint Bernoulli distribution, corresponding to the exclusive or overlapped community structure. Most importantly, its generic identifiability with exclusive or overlapped community structure is established. Furthermore, an efficient alternative updating scheme based on the variational EM algorithm is developed to solve the resultant optimization task, and numerical experiments on both simulated and real examples demonstrate that the developed model achieves superior performance against some popular competitors.
In the end, a conclusion including a short summary and some future work is provided. In future work, we first introduce a degree-corrected network embedding model, which is a possible extension of the network embedding model. Then, we consider a promising direction to extend the overlapped stochastic co-block model to general bipartite networks.
To detect communities in directed networks, people try to convert directed networks into undirected ones and then apply existing community detection methods for undirected networks. Yet such methods completely ignore the difference between sending and receiving edges in directed networks and thus often lead to sub-optimal performance. To accommodate the different sending and receiving patterns of nodes in directed networks, a network embedding model is developed. Specifically, the network embedding model decomposes each node into an out-node that sends edges and an in-node that receives edges and introduces two sets of vectors to represent the out- and in-nodes in a low-dimensional space. Thus, it allows for the detection of different out- and in-communities of the same nodes. An efficient alternative updating scheme is developed to tackle the resultant optimization task, and numerical experiments on both simulated and real examples indicate that the proposed method is superior than most existing methods. Moreover, asymptotic consistencies of the proposed method are also established.
Despite the widely reported success, existing community detection methods have been primarily focused on exclusive community structure, while the extension to overlapped community structure remains largely lacking. However, overlapped community structure is prevalent in directed and undirected networks. To further allow nodes in directed networks to belong to multiple communities, we formulate the stochastic co-block model as a generative model, where the unknown community membership is modeled as a multinomial distribution or a joint Bernoulli distribution, corresponding to the exclusive or overlapped community structure. Most importantly, its generic identifiability with exclusive or overlapped community structure is established. Furthermore, an efficient alternative updating scheme based on the variational EM algorithm is developed to solve the resultant optimization task, and numerical experiments on both simulated and real examples demonstrate that the developed model achieves superior performance against some popular competitors.
In the end, a conclusion including a short summary and some future work is provided. In future work, we first introduce a degree-corrected network embedding model, which is a possible extension of the network embedding model. Then, we consider a promising direction to extend the overlapped stochastic co-block model to general bipartite networks.