Machine Learning-Enhanced Graph Analytics: Advancements and Applications to Pandemic and Infodemic Analysis
基於機器學習的圖論分析: 大流行病以及信息流行病分析的進展與應用
Student thesis: Doctoral Thesis
Detail(s)
Awarding Institution | |
---|---|
Supervisors/Advisors | |
Award date | 20 May 2024 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(21a32d72-dc74-4aa9-a567-b8ed46e6c97e).html |
Abstract
This dissertation presents a comprehensive investigation into the application of machine learning techniques and graph analytics to enhance pandemic and infodemic analysis. Through this research, we demonstrate how advanced artificial intelligence (AI) techniques can drive societal resilience in the face of global health crises.
We first focus on the critical role of digital contact tracing (DCT) strategies in mitigating the spread of viruses, such as coronavirus disease 2019 (COVID-19), while maintaining societal function. To highlight the importance of machine learning in DCT optimization, we propose a novel taxonomy classifying DCT strategies into forward, backward, and proactive contact tracing. We then categorize various DCT applications developed during the COVID-19 pandemic and propose machine learning techniques to tackle associated computational epidemiology problems for pandemic response. We address the challenges of machine learning-based DCT, offer potential solutions, and present a case study demonstrating how these insights can inform future pandemic responses.
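The distinction between forward and backward tracing can be illustrated with a minimal sketch. This is not the thesis's method: the contact-log format, the function names, and the use of a single estimated infection time are all assumptions made for illustration, and proactive tracing is left out because the abstract does not define it.

```python
from collections import defaultdict


def build_contact_index(contacts):
    """Map each person to a list of (other_person, time) pairs.

    `contacts` is a hypothetical log of undirected contacts (a, b, time).
    """
    index = defaultdict(list)
    for a, b, t in contacts:
        index[a].append((b, t))
        index[b].append((a, t))
    return index


def forward_trace(index, case, infected_at):
    """Contacts at or after the estimated infection time: people the case may have infected."""
    return sorted({p for p, t in index.get(case, []) if t >= infected_at})


def backward_trace(index, case, infected_at):
    """Contacts before the estimated infection time: people who may have infected the case."""
    return sorted({p for p, t in index.get(case, []) if t < infected_at})


# Toy contact log (illustrative data only).
contacts = [("ann", "bob", 1), ("eve", "bob", 2), ("bob", "cat", 4), ("bob", "dan", 5)]
index = build_contact_index(contacts)

print(forward_trace(index, "bob", infected_at=3))
print(backward_trace(index, "bob", infected_at=3))
```

In this toy setting, forward tracing looks for possible onward transmission from the index case, while backward tracing looks for the possible source of the index case's infection.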
We next examine the problem of infodemics: the rapid spread of false or misleading information through online social networks, which intensified during the COVID-19 pandemic. To address this issue, we leverage retweet networks to model social relationships and introduce a novel framework called MEGA (Machine Learning-Enhanced Graph Analytics) to optimize learning performance on massive graphs. By optimizing the hyperparameters of vertex embedding in graph neural networks, we demonstrate how feature-based vertex embeddings preserve important subgraph features. We also illustrate how statistical features of graph datasets can be used to facilitate efficient feature engineering, enhancing the overall learning efficiency on massive graphs. Applying this methodology to infodemic risk analysis, which involves detecting spambots and identifying influential spreaders, demonstrates superior computational efficiency and classification accuracy in computing infodemic risk scores.
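As a rough illustration of statistical vertex features on a retweet network, the sketch below computes two simple per-user counts and ranks users by how often they are retweeted. It is not the MEGA framework: the edge format `(retweeter, author)`, the function names, and the choice of features are assumptions, and a real pipeline would feed such features into a learned model rather than rank on them directly.

```python
from collections import Counter


def vertex_features(retweets):
    """Return {user: (times_retweeted, retweets_made)}.

    `retweets` is a hypothetical list of directed edges (retweeter, author).
    """
    times_retweeted = Counter(author for _, author in retweets)
    retweets_made = Counter(retweeter for retweeter, _ in retweets)
    users = set(times_retweeted) | set(retweets_made)
    # Counter returns 0 for users missing from either count.
    return {u: (times_retweeted[u], retweets_made[u]) for u in users}


def top_spreaders(features, k):
    """Rank users by times retweeted (descending), breaking ties by name."""
    ranked = sorted(features, key=lambda u: (-features[u][0], u))
    return ranked[:k]


# Toy retweet edges (illustrative data only).
retweets = [
    ("u1", "news"), ("u2", "news"), ("u3", "news"),
    ("bot", "u1"), ("bot", "u2"), ("bot", "news"),
]
features = vertex_features(retweets)

print(features["news"])
print(top_spreaders(features, 2))
```

An account that retweets often but is never retweeted itself, like `bot` here, has a distinctive feature pair; a classifier could use that kind of signal, although these two counts alone would not be enough to label an account a spambot.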