Identifying and Evaluating Causal Relations in Social Science Research Publications on Social Media: A Hypothesis-based Knowledge Graph Approach

Student thesis: Doctoral Thesis

Abstract

Synthesizing existing knowledge is an essential task in any research area. A synthesized knowledge base can help researchers detect hidden knowledge and evaluate the contribution of a new research publication. Classical literature review methods, such as review analysis and meta-analysis, are limited to a small set of publications. Newly emerged analysis methods, such as topic modeling and word embedding, cannot extract complex information such as causal relations and cannot provide explainable results. Therefore, I proposed a new literature review method to synthesize knowledge on social media from social science publications and derived a set of important insights from the synthesized knowledge. The main steps of the study are as follows.

First, I clarified the conceptual elements of causal relation knowledge and proposed a hypothesis-based knowledge graph method for collecting such knowledge from social media publications. Causal relations are core knowledge in social science, as most empirical studies aim to explore the causal relations among variables. Causal relation knowledge in publications commonly involves information about variables (such as their conceptual meanings), relations (such as their direction, polarity, and strength), and test results indicating whether the hypothesized relations among variables are confirmed. This heterogeneous information is difficult to collect and represent. The knowledge graph language is powerful owing to its computable syntax and expressive semantics. Building on existing cases of knowledge graph construction in disciplines such as biomedical science, I proposed a hypothesis-based knowledge graph approach for constructing a knowledge base from social science publications. In particular, I built an ontology to describe the complex concept and relation information involved in causal relation knowledge. Moreover, I proposed employing hypothesis statements, which are commonly provided in empirical social science publications, as an advantageous data source for causal information extraction. I also formulated an information extraction pipeline based on this data source.
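
As a rough illustration of the elements named above (variables, relation direction and polarity, and test result), one hypothesis could be stored as a single record and grouped by its cause-effect pair. This is a minimal sketch, not the ontology built in the thesis: the class `Hypothesis`, its field names, and the helpers `build_graph` and `support_rate` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One hypothesized causal relation from one publication (illustrative schema)."""
    cause: str        # concept label of the independent variable
    effect: str       # concept label of the dependent variable
    polarity: str     # "positive" or "negative"
    supported: bool   # whether the publication reports the hypothesis as confirmed
    publication: str  # identifier of the source publication (hypothetical format)


def build_graph(hypotheses):
    """Group hypotheses by their directed (cause, effect) pair."""
    graph = defaultdict(list)
    for h in hypotheses:
        graph[(h.cause, h.effect)].append(h)
    return graph


def support_rate(graph, cause, effect):
    """Share of hypotheses on cause -> effect that were confirmed, or None if absent."""
    tested = graph.get((cause, effect), [])
    if not tested:
        return None
    return sum(h.supported for h in tested) / len(tested)
```

Grouping by the directed pair keeps every publication's test of the same relation together, so replication and confirmation counts can be read directly from the graph.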

Second, I implemented the proposed method and constructed a causal relation graph that represents the causal relation knowledge on social media in social science publications. In particular, I collected 16,185 SSCI publications on social media and selected 3,028 publications judged to contain explanatory hypotheses. I transformed the information extraction steps of the proposed approach into seven language processing tasks of three types. I employed the corresponding natural language processing models and manually coded a sample of publications to train them. The trained models achieved good test accuracy and automatically extracted causal information from the publications. Finally, I constructed a causal graph with 234 leaf concepts and 2,171 hypothesized causal relations.

Third, I analyzed the constructed causal relation graph in terms of its concepts, relations, triads, and publications’ contributions to it, and derived a set of conclusions. The analysis results show that “whom/user” is more popular and important than the other Ws in Lasswell’s model (Lasswell, 1948). That is, users’ online perceptions and behaviors are hypothesized and confirmed more frequently than users’ offline perceptions and behaviors and than variables of the communicator, content, and channel in social media. In addition, the most popular and important causal relations are from “whom/user” to “whom/user.” That is, most hypothesized and confirmed causal relations in publications are among users’ online perceptions and behaviors. The results also indicate that indirect causal relations affect the generation of a direct relation, which is consistent with triadic closure theory. This suggests that indirect causal relations among existing causal relations constitute hidden knowledge. Another important result is that the number of newly added concepts and relations either decreases or remains unchanged despite the yearly increase in the number of publications. At the same time, publication contribution in terms of adding new concepts and relations is decreasing, and more publications merely replicate existing causal relations. Moreover, publication contribution has little correlation with publication citation.
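
The triad result can be made concrete with a small sketch. Assuming the causal graph is available as a collection of directed (cause, effect) pairs, the pairs that are linked only indirectly (A to B and B to C, but no direct A to C) are the candidates that triadic closure would predict to become direct relations. The function name `indirect_only_pairs` and the edge representation are assumptions for illustration, not the analysis procedure used in the thesis.

```python
def indirect_only_pairs(edges):
    """Return (a, c) pairs connected via some b (a -> b -> c) but lacking a direct a -> c edge."""
    edge_set = set(edges)

    # Map each concept to the set of concepts it is hypothesized to affect.
    successors = {}
    for a, b in edge_set:
        successors.setdefault(a, set()).add(b)

    candidates = set()
    for a, b in edge_set:
        for c in successors.get(b, ()):
            # Skip two-step paths that return to the start, and pairs already linked directly.
            if c != a and (a, c) not in edge_set:
                candidates.add((a, c))
    return candidates
```

Comparing such candidate pairs at one point in time with the direct relations that appear in later publications is one way to check whether indirect relations precede direct ones.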

The main contributions of this study are as follows. First, it proposes a methodological framework for synthesizing causal relation knowledge on social media from social science publications. In particular, the framework clarifies what conceptual elements causal relation knowledge contains, how to represent these elements in a knowledge graph ontology, and how to extract these elements from social media publications using natural language processing models with detailed procedures. Second, it constructs a causal relation knowledge graph and derives a set of conclusions by analyzing it. In particular, the analysis identifies which concepts and relations are popular and important, whether indirect relations can affect the generation of direct relations, and how publication contribution to knowledge varies over time.
Date of Award: 6 Jul 2021
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Jian Hua Jonathan ZHU (Supervisor) & Yingcai Wu (External Supervisor)

Keywords

  • knowledge measurement
  • knowledge graph
  • knowledge graph construction
  • social media
