Mining and analyzing customer opinions/sentiments of Web 2.0 for business applications

  • Kaiquan XU

Student thesis: Doctoral Thesis

Abstract

With the explosive growth of user-generated review data in the Web 2.0 era (social networking sites, blogs, mini-blogs, discussion forums, online shopping websites), there is a pressing need to develop effective methods and tools to automatically extract valuable business intelligence from these user opinion data. In addition, the large numbers of user reviews often contain information about competitors and have become a new source for mining competitive intelligence. Many companies have begun to use social networking sites (SNS) as an important channel and platform for online marketing and reputation management. Thus, analyzing users' sentiments on SNS has become key for these business applications.
Semantically annotating opinion data is an effective way to mine valuable information from the large number of customer opinions. Although supervised machine learning approaches have been explored for semantically annotating user opinions to facilitate market intelligence generation, such approaches often require numerous manually labelled training examples to produce accurate semantic annotations. In this study, we propose an active learning approach that can train a state-of-the-art, large-margin classifier with substantially fewer labelled training examples, and yet produce accurate semantic annotations of user opinions. In particular, our active learning method is underpinned by a novel query function that can efficiently locate the most informative unlabelled examples such that a large-margin classifier can learn the optimal parameter values based on them. Rigorous evaluation involving a benchmark test and an empirical test with real-world opinion data extracted from Amazon.com shows that the proposed active learning method can train effective classifiers with far fewer training examples and yet achieve performance similar to that of a typical state-of-the-art classifier trained without active learning.
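As an illustration only, the general idea of querying the most informative unlabelled examples can be sketched with the classic margin-based heuristic: select the pool examples whose decision value under a linear classifier is closest to zero. This is not the thesis's novel query function, which the abstract does not specify; the function names, weights, and feature vectors below are hypothetical.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def query_most_uncertain(
    weights: Sequence[float],
    bias: float,
    pool: Sequence[Sequence[float]],
    k: int = 1,
) -> List[int]:
    """Return indices of the k pool examples closest to the decision boundary."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(weights, bias, pool[i])),
    )
    return ranked[:k]


# Hypothetical classifier parameters and unlabelled pool (assumed values).
weights = [1.0, -1.0]
bias = 0.0
pool = [[3.0, 0.5], [1.0, 0.75], [0.25, 2.0], [2.0, 2.5]]

# The two examples with the smallest |w . x + b| would be sent for labelling.
picked = query_most_uncertain(weights, bias, pool, k=2)
print(picked)
```

In a full active learning loop, the selected examples would be labelled by an annotator, added to the training set, and the classifier retrained before the next query round.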
To mine competitive intelligence effectively from customer opinions on Web 2.0, we propose a novel graphical model to extract and visualize comparative relations between products from customer reviews, with the interdependencies among relations taken into consideration, to help companies discover potential risks and design new products and marketing strategies. Our experiments on a corpus of Amazon customer reviews show that our proposed method can extract comparative relations more accurately than the benchmark methods.
To analyze users' sentiments on SNS, a "sentiment community" is proposed as a tool. Sentiment communities with different polarities on SNS usually represent groups of users with certain preferences in common. Thus, discovering sentiment communities helps enterprises with customer segmentation and targeted marketing. A novel method based on an optimization technique is proposed for discovering users' sentiment communities, and comprehensive experimental evaluations are conducted to demonstrate the method's effectiveness.
In summary, this dissertation covers the topics of semantically annotating opinion data, extracting comparative opinions, and analyzing users' sentiments on SNS. This work opens the door to analyzing the rich consumer-generated data of Web 2.0 and SNS for enterprises to use in business applications.
Date of Award: 3 Oct 2011
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Shaoyi Stephen LIAO

Keywords

  • Consumer satisfaction
  • Web 2.0
  • Data mining
  • Evaluation
