A Structural and Empirical Analysis of Network-based Diffusion of Information and Virus
網絡中信息與病毒傳播的結構和實證分析
Student thesis: Doctoral Thesis
Author(s)
Related Research Unit(s)
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 31 Mar 2022 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(4f055ab9-6b6a-40c3-85be-937c51bea4ec).html |
---|---|
Abstract
Diffusion is a ubiquitous process across disciplines, ranging from the diffusion of information, knowledge, and innovation to the spread of diseases, social behaviors, and norms. The advent and rapid proliferation of social applications and media platforms have dramatically changed the way we communicate and participate in a variety of social processes. Because social ties among individuals provide the primary pathways along which interactions occur, the way we are connected and embedded in networks strongly affects a range of personal and collective social outcomes.
Although many theoretical studies have been dedicated to this field, attempts to find empirical evidence have been largely hampered by a lack of data suitable for such analysis. Recently, the growing availability of large-scale digital traces of human communication, along with the development of computational techniques to analyze them, has provided an unprecedented opportunity for novel investigations of diffusion in networks on a large scale. Leveraging large-scale datasets on human communication, this thesis conducts four studies to investigate the diffusion of information and viruses in networks.
Study 1 formulates a root-aware approach to quantifying the virality of cascades, one that properly accounts for the root node in a diffusion tree. Applied to synthetic and empirical cascade data, the proposed virality measure is shown to have useful properties and potential utility. Based on the preferential attachment mechanism, a cascade growth model is further introduced to mimic the diffusion process. The proposed model interpolates between broadcast and viral spreading during the growth of cascades. Numerical simulations also demonstrate the effectiveness of the model in characterizing the virality of growing cascades.
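The abstract does not give the definition of the root-aware measure or the exact growth rule, so the following is only a minimal sketch of the general idea under stated assumptions. The parameter `alpha`, the attachment weight `children + 1`, and the function names are hypothetical and chosen for illustration; `mean_depth` and `structural_virality` (average pairwise distance in the tree) are standard stand-in statistics, not the measure proposed in the thesis.

```python
import random


def grow_cascade(n_nodes, alpha, seed=None):
    """Grow a cascade tree and return its parent list (parent[0] is None).

    Assumption: with probability `alpha` a new node attaches directly to the
    root (broadcast); otherwise it attaches to an existing node chosen with
    probability proportional to (number of children + 1), a
    preferential-attachment-style rule.
    """
    rng = random.Random(seed)
    parent = [None]   # node 0 is the root
    weights = [1]     # children + 1 for every existing node
    for new in range(1, n_nodes):
        if rng.random() < alpha:
            target = 0
        else:
            target = rng.choices(range(new), weights=weights)[0]
        parent.append(target)
        weights[target] += 1
        weights.append(1)
    return parent


def mean_depth(parent):
    """Average distance from the root to every non-root node."""
    depth = [0] * len(parent)
    for node in range(1, len(parent)):   # parents always have smaller indices
        depth[node] = depth[parent[node]] + 1
    return sum(depth[1:]) / (len(parent) - 1)


def structural_virality(parent):
    """Average shortest-path distance over all node pairs in the tree."""
    n = len(parent)
    size = [1] * n                        # subtree sizes
    for node in range(n - 1, 0, -1):
        size[parent[node]] += size[node]
    # Each edge (node, parent) lies on size * (n - size) shortest paths.
    wiener = sum(size[node] * (n - size[node]) for node in range(1, n))
    return 2 * wiener / (n * (n - 1))


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.0):
        tree = grow_cascade(500, alpha, seed=42)
        print(alpha, mean_depth(tree), structural_virality(tree))
```

With `alpha = 1.0` the sketch produces a pure broadcast star, in which every non-root node sits at depth 1; lowering `alpha` lets attachment spread over existing nodes, which allows deeper, more branching trees.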
Study 2 presents a comparative study of two kinds of information cascades, induced by conspiracy theories and by science news, respectively. The study finds that conspiracy cascades tend to propagate in a multigenerational branching process, whereas science cascades are more likely to grow in a breadth-first manner. Specifically, conspiracy cascades are larger, involve more users and generations, persist longer, and are more viral and bursty than science cascades. Content analysis further reveals that conspiracy cascades are much more concerned with political and controversial topics and contain more negative and emotional words. Moreover, conspiracy cascades are more likely than science cascades to be driven by a broad set of users, which poses challenges for the management of misinformation.
Study 3 presents an exploratory investigation of how to capture personal social influence using the local network structure. To do so, we first consider the number of weakly and strongly connected components in one's contact neighborhood and then take the coexposure network of social neighbors into consideration. Using large-scale datasets collected from a knowledge-sharing platform, the analysis provides empirical evidence that the diversity of the local network structure offers valuable insight for predicting personal online social influence, and that including the coexposure network contributes additional predictive information. After controlling for several possible confounding factors through matching experiments, we present further evidence that social context diversity plays a non-negligible role in elevating personal social influence.
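Counting components in a contact neighborhood can be illustrated with a short sketch. The abstract does not define the neighborhood precisely, so this example assumes that it consists of every node sharing a directed edge with the focal user (in either direction), excluding the focal user itself. The function names and the toy edge list are hypothetical.

```python
def _reachable(start, adjacency):
    """Return the set of nodes reachable from `start` via `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def count_neighborhood_components(edges, ego):
    """Return (weak, strong) component counts among the contacts of `ego`.

    Assumption: contacts are all nodes linked to `ego` in either direction,
    and `ego` is removed before components are counted.
    """
    contacts = set()
    for u, v in edges:
        if u == ego and v != ego:
            contacts.add(v)
        elif v == ego and u != ego:
            contacts.add(u)

    forward, backward, undirected = {}, {}, {}
    for u, v in edges:
        if u in contacts and v in contacts and u != v:
            forward.setdefault(u, set()).add(v)
            backward.setdefault(v, set()).add(u)
            undirected.setdefault(u, set()).add(v)
            undirected.setdefault(v, set()).add(u)

    # Weakly connected components: ignore edge direction.
    weak, unseen = 0, set(contacts)
    while unseen:
        unseen -= _reachable(next(iter(unseen)), undirected)
        weak += 1

    # Strongly connected components: nodes mutually reachable along
    # directed edges (forward-reachable set intersected with backward one).
    strong, unseen = 0, set(contacts)
    while unseen:
        node = next(iter(unseen))
        unseen -= _reachable(node, forward) & _reachable(node, backward)
        strong += 1

    return weak, strong


if __name__ == "__main__":
    toy_edges = [("ego", "a"), ("ego", "b"), ("ego", "c"), ("ego", "d"),
                 ("a", "b"), ("b", "a"), ("c", "d")]
    print(count_neighborhood_components(toy_edges, "ego"))
```

The strong-component count uses a simple reachability intersection rather than a linear-time algorithm such as Tarjan's; that keeps the sketch short and is adequate for small neighborhoods, but a graph library would be preferable at scale.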
Study 4 addresses the spatial spread of COVID-19 and the associated socio-economic outcomes. Using city-level mobility and case data, the analysis shows that the spatial spread of COVID-19 is well explained by a local diffusion process in the mobility network rather than by a global diffusion process, indicating the effectiveness of the implemented disease prevention and control measures. Based on the constructed case prediction model, it is estimated that social outcomes could differ substantially depending on the outbreak area. During the epidemic control period, we observe that human mobility declined substantially and that the mobility network underwent remarkable local and global structural changes that helped contain the spread of COVID-19.
These studies contribute to a further understanding of network-based diffusion processes in the age of big data. They also highlight the role of network structure in characterizing and driving online and offline diffusion processes, and pave the way for applying network analysis to other kinds of social processes.
- Computational Social Science, Computational Communication, Network Analysis, Network Structure, Diffusion