Learning Domain Specific Opinion Lexicons for the Context-Sensitive Opinion Retrieval of Bilingual Online Comments
Project: Research
Researcher(s)
- Yiu Keung Raymond LAU (Principal Investigator / Project Coordinator), Department of Information Systems
- Peter D BRUZA (Co-Investigator)
- Kam-Fai Wong (Co-Investigator)
Description
Do you want to know people’s opinions about “Kowloon Shangri-La hotel” or “the Hong Kong-made electric MyCar”? The management of Kowloon Shangri-La or the manufacturer of MyCar is keen to learn about customers’ perceptions of its services and products for more customer-centric promotion and marketing. In the Web 2.0 era, user-contributed data is the norm, and there is explosive growth in the number of user-contributed comments on the Web. Manually browsing through all the online comments has become impractical. There is a pressing need to develop automated opinion retrieval systems that organizations or individuals can use to more efficiently retrieve and analyze the online comments about various entities. For example, if the Hong Kong firm that exported Aqua Dot toys to the U.S. had been able to utilize an opinion retrieval system to monitor consumers’ online comments about its toys on an ongoing basis, it might have recalled its products, which were found to be contaminated, much earlier, and hence minimized both the financial loss suffered and the damage to the company’s reputation.
Opinion retrieval draws on multiple research disciplines, including information retrieval, text mining, and computational linguistics. In the field of information retrieval (IR), opinion retrieval is seen as a special kind of document retrieval and ranking process that aims at retrieving views on certain entities, such as products, people, and organizations, rather than simply retrieving topical information on those entities. One sub-task commonly set in opinion retrieval is to determine the orientation (or polarity) of an opinionated expression, that is, whether it is positive or negative. For research on opinion retrieval, a Blog Track of the annual TREC conference has been established to benchmark the performance of state-of-the-art opinion retrieval systems.
Commercial opinion retrieval systems such as Reuters NewsScope Sentiment Engine can extract English-language sentiments related to a target company according to basic linguistic cues and a set of pre-defined sentiment indicators. To enable an automated opinion retrieval process to be applied to a wide range of business domains and languages, it is desirable that the opinion retrieval system can automatically learn domain-specific sentiment indicators (i.e., an opinion lexicon), because constructing sentiment indicators manually is very labor-intensive and the expertise required for their construction may not even be available for certain domains.
Nevertheless, automated opinion lexicon construction involves several fundamental research challenges, as does opinion retrieval in general. First, there is inevitably a degree of uncertainty in identifying targeted entities and the associated sentiments expressed in natural language. Second, it is difficult to accurately determine the polarity of a sentiment across various domains. For example, the sentiment “unpredictable” has a negative orientation in the context of “automotive”. However, it has a positive orientation in the context of “movie”, as in an “unpredictable plot”.
Finally, opinion retrieval applies not only to an entity but also to the finer-grained level of entity features (e.g., whether the “gearbox” of MyCar is good or not).
The aim of the proposed research project is to build on our team’s successful research in the areas of automatic domain ontology extraction (as related to the extraction of entities, entity features, and their relationships), context-sensitive information retrieval (as related to predicting the context-dependent polarity of sentiments), informational inference (as related to inferring the implicit relationships between entities and sentiments), and bilingual information processing (as related to the issue of bilingual opinion retrieval) to develop a novel context-sensitive opinion retrieval methodology that can be applied to a variety of problem domains. In particular, the automated construction of domain-specific opinion lexicons will be explored to support context-sensitive opinion retrieval. The practical implication of our proposed research is that a more effective bilingual opinion retrieval technology will be developed to support Hong Kong organizations in extracting business intelligence (BI) from online comments, so that they can improve the quality of their products and services and enhance their competitiveness in the global market. According to the 2009 Gartner executive programs survey, BI applications have been seen as the top technology priority by chief information officers around the world for the fourth year in a row.
Detail(s)
Project number | 9041569 |
---|---|
Grant type | GRF |
Status | Finished |
Effective start/end date | 1/08/10 → 2/04/14 |