Who are the spoilers in social media marketing? Incremental learning of latent semantics for social spam detection

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

24 Scopus Citations


Original language: English
Pages (from-to): 51-81
Journal / Publication: Electronic Commerce Research
Issue number: 1
Online published: 8 Oct 2016
Publication status: Published - Mar 2017


With the rise of the social web, concern has grown about the quality of user-generated content on social media sites (SMSs). Deceptive comments harm users' trust in online social media and cause financial loss to firms. Previous studies have used various features and classification algorithms to detect and filter social spam on several social media platforms. However, to the best of our knowledge, previous studies have not exploited both probabilistic topic modeling and incremental learning to detect social spam on SMSs. Thus, the main contribution of this paper is the design of a novel detection methodology that combines topic- and user-based features to improve the effectiveness of social spam detection. The proposed methodology exploits a probabilistic generative model, namely labeled latent Dirichlet allocation (L-LDA), to mine the latent semantics of user-generated comments, and an incremental learning approach to handle the changing feature space. An experiment based on a large dataset extracted from YouTube demonstrates the effectiveness of the proposed methodology, which achieves an average accuracy of 91.17% in social spam detection. Our statistical analysis reveals that topic-based features significantly improve social spam detection, a finding with significant implications for business practice.
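
The idea of learning incrementally over a changing feature space, with topic- and user-based features combined in one vector, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain online logistic regression in place of their classifier, hand-written topic proportions in place of L-LDA output, and hypothetical feature names such as `topic:promo` and `user:links_per_comment`.

```python
import math
from collections import defaultdict


class IncrementalSpamClassifier:
    """Online logistic regression over sparse, open-ended feature dicts.

    Illustrative stand-in only: the paper's actual learner is not
    specified in the abstract.
    """

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # Feature names never seen before implicitly start at weight 0.0,
        # so the feature space can grow as new topics or user signals appear.
        self.weights = defaultdict(float)
        self.bias = 0.0

    def predict_proba(self, features):
        """Return the estimated probability that a comment is spam."""
        score = self.bias + sum(
            self.weights.get(name, 0.0) * value
            for name, value in features.items()
        )
        # Clamp the score so math.exp cannot overflow.
        score = max(-30.0, min(30.0, score))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, is_spam):
        """Apply one stochastic gradient step for a single labeled comment."""
        error = (1.0 if is_spam else 0.0) - self.predict_proba(features)
        self.bias += self.learning_rate * error
        for name, value in features.items():
            self.weights[name] += self.learning_rate * error * value


# Hypothetical stream: topic proportions (assumed to come from a topic
# model) merged with one user-based feature. The third example introduces
# a topic feature that the first two never contained.
stream = [
    ({"topic:promo": 0.9, "user:links_per_comment": 1.0}, True),
    ({"topic:music": 0.8, "user:links_per_comment": 0.0}, False),
    ({"topic:giveaway": 0.9, "user:links_per_comment": 1.0}, True),
]

clf = IncrementalSpamClassifier()
for _ in range(20):  # replay the tiny stream a few times
    for features, is_spam in stream:
        clf.update(features, is_spam)

spam_score = clf.predict_proba({"topic:promo": 0.9, "user:links_per_comment": 1.0})
ham_score = clf.predict_proba({"topic:music": 0.8, "user:links_per_comment": 0.0})
```

Storing weights in a dictionary keyed by feature name, rather than in a fixed-length array, is what lets the model accept a new feature such as `topic:giveaway` without retraining from scratch.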

Research Area(s)

  • Big data, Incremental learning, Machine learning, Social spam, Spam detection, Topic modeling