Big Data Analytics for Detecting Deceptive Product Comments in Online Social Media

Project: Research


While Gartner predicted that around 10-15 percent of online reviews were fake, recent studies reveal that fake reviews have become increasingly prevalent and that up to 30 percent of online reviews could be fake. The Food and Drug Administration of China has just released the names of ten e-Commerce firms that disseminate a large number of deceptive comments about medical products in China. A year ago, the Guardian reported that the attorney general of New York had set up a fake yoghurt shop in Brooklyn to ensnare companies that posted deceptive reviews on Google, Yahoo, Yelp, and other online social media; 19 companies were eventually caught and fined a total of $350K. Deceptive product comments, which are supposedly intended to boost the perpetrators' sales, can bias consumers' purchase decisions and may have a detrimental effect on the sales of other firms. Accordingly, there is a pressing need to develop an effective and efficient methodology to detect and filter deceptive product comments posted to online social media.

However, with the explosive growth of user-contributed product comments in online social media such as Twitter and Sina Weibo, existing deceptive product comment detection methods simply cannot scale to such "big data". For instance, both content-based and behavior-based methods use near-duplicate content as a feature for detecting deceptive reviews. Nevertheless, the computational complexity of near-duplicate comment detection is O(N^2), where N is the number of comments. Given that tens of millions of comments are posted to online social media every day, existing near-duplicate detection methods become impractical. On the other hand, few studies have examined the diffusion patterns of online comments for detecting deceptive reviews in online social networks.
To the best of our knowledge, none of the work reported in the existing literature has examined the problem of scalable feature extraction (e.g., extracting near-duplicate and diffusion pattern features) from big data for detecting deceptive product comments.

Guided by the Design Science research methodology, the goal of the proposed research project is to design novel artifacts (e.g., big data analytics methods and their instantiations) to fill the aforementioned research gaps. In particular, the aims of our research are as follows:

1. To design novel big data analytics methods for scalable feature extraction from big social media data to enhance deceptive product comment detection.
2. To empirically evaluate the effectiveness, efficiency, and usability of the proposed big data analytics methods for detecting deceptive product comments through a series of controlled experiments and a field test involving real-world enterprise users.
3. To reflect on and discover sound design principles related to both the design processes and the design artifacts for building enterprise-class big data analytics services.

The big data analytics service for deceptive product comment detection developed through this project will be adopted by e-Commerce firms and online opinion monitoring agencies to improve the overall quality and hygiene of bilingual online product comments. We plan for technology transfer via our industrial partner, the second largest e-Commerce firm in the Greater China area. The proposed research project will foster fair trading and enhance consumer welfare in the Greater China area. In addition, the design principles of big data analytics services uncovered through this project will enable enterprises in the region to tap into big data through the cost-effective development of various kinds of big data analytics services.
As a result, they can develop more effective business strategies and become more competitive in the global marketplace by leveraging the business intelligence extracted from big data.
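
To make the O(N^2) cost of near-duplicate detection mentioned above concrete, the following is a minimal Python sketch of a naive all-pairs detector. It is an illustration only, not the method proposed in this project: the function names, the use of Jaccard similarity over three-word shingles, and the 0.5 threshold are all assumptions chosen for the example.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short comments yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def naive_near_duplicates(comments, threshold=0.5, k=3):
    """Compare every pair of comments: N * (N - 1) / 2 similarity checks."""
    sets = [shingles(c, k) for c in comments]
    pairs = []
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            pairs.append((i, j))
    return pairs


def pair_count(n):
    """Number of pairwise comparisons the naive approach needs for n comments."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    sample = [
        "this phone is great and the battery lasts long",
        "this phone is great and the battery lasts very long",
        "terrible yoghurt never buying again",
    ]
    print(naive_near_duplicates(sample))
    # Illustrative volume: ten million comments in one day.
    print(pair_count(10_000_000))
```

Under these assumptions, ten million comments would require roughly 5 x 10^13 pairwise comparisons, which is why an all-pairs approach becomes impractical at the daily volumes described above.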


Project number: 9042255
Grant type: GRF
Effective start/end date: 1/01/16 to 24/06/20

Research areas

  • Big Data, Big Data Analytics, Deception Detection, Social Media