Investigating and Enhancing Information Quality in Social Media: From Content and Network Perspectives

探究和提升社交媒體中的信息品質: 從內容和網絡的角度

Student thesis: Doctoral Thesis

Detail(s)

Award date: 28 Jul 2017

Abstract

The explosive growth of low-quality user-generated content on social media has become a significant concern for both enterprises and users. Such content hampers information retrieval services and users' decision making, undermines users' satisfaction with companies, and can cause substantial losses to both consumers and merchants when the information is deceptive or misleading. Effective methodologies for identifying low- (and high-) quality information in social media should therefore be developed to maintain the integrity and cleanliness of online social media, and the influence of such information on consumers' purchase decisions and users' attention should be examined.
In this dissertation, we conduct three studies to investigate these issues. In the first study, we propose a social media attention model to examine the moderating role of information quality (IQual) and emotion in the effects of users' social network structure and information quantity on social media attention on Twitter. We find that both information quantity and social network structure positively affect social media attention. Social media content, comprising IQual and information emotion, negatively moderates the relationship between information quantity and attention; information emotion, however, positively moderates the relationship between social network and attention. Our findings suggest that low-IQual individuals adopt a radical strategy, attracting attention by frequently releasing low-IQual messages with embedded URLs, whereas high-IQual users adopt a conservative strategy, posting high-IQual messages at a stable rate.
In the second study, we design a novel and comprehensive detection methodology that combines word-, topic-, and user-based features to detect low-quality information (social spam) more effectively in the social publishing zone. The proposed methodology exploits a generative probabilistic topic model, Labeled Latent Dirichlet Allocation (L-LDA), to mine the latent semantics of user-generated comments, and an incremental learning approach to handle the changing feature space.
In the last study, because the aforementioned features are easily manipulated by spammers, we propose a Hybrid Spammer Detection Framework (HSDF) to detect spam in the social community zone. In addition to various semantic and behavioral features, this framework includes many structural features, such as closeness centrality, betweenness centrality, eigenvector centrality, degree centrality, hub and authority scores, proximity prestige, eccentricity, and clustering coefficient.
We find that adding structural features improves average performance, especially in terms of recall. However, semantic and behavioral features, particularly the psychological sentiment features, remain among the most discriminative. Moreover, our newly proposed features, the in(out)-degree clustering coefficient and retweeted (mention) intensity, also prove to be effective and reliable.
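
Several of the structural features named above are standard graph measures. As a rough illustration only, the sketch below computes four of them (degree centrality, clustering coefficient, closeness centrality, and eccentricity) on a small undirected graph. The toy graph, node names, and function names are assumptions made for this example and are not taken from the thesis; the thesis-specific features, such as the in(out)-degree clustering coefficient and retweeted (mention) intensity, are not reproduced here.

```python
from collections import deque

# Hypothetical toy interaction graph (undirected adjacency sets).
# Node names are illustrative only and do not come from the thesis data.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def clustering_coefficient(graph):
    """Share of each node's neighbour pairs that are themselves connected."""
    result = {}
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0  # fewer than two neighbours: no pairs to close
            continue
        # Count each neighbour pair once (u < w) and keep those that share an edge.
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in graph[u])
        result[v] = 2 * links / (k * (k - 1))
    return result


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def closeness_and_eccentricity(graph):
    """Closeness (reachable nodes / total distance) and eccentricity per node.

    Assumes a connected graph; on a disconnected graph the values only
    reflect the component that contains each node.
    """
    closeness, eccentricity = {}, {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        closeness[v] = (len(dist) - 1) / total if total else 0.0
        eccentricity[v] = max(dist.values())
    return closeness, eccentricity
```

In a spam-detection setting, per-user values like these would typically be appended to the semantic and behavioral feature vector before training a classifier; how the thesis actually combines them is not specified in the abstract.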