A weakly-supervised extractive framework for sentiment-preserving document summarization

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

9 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 1401–1425
Journal / Publication: World Wide Web
Volume: 22
Issue number: 4
Online published: 30 May 2018
Publication status: Published - Jul 2019

Abstract

The popularity of social media sites provides new ways for people to share their experiences and convey their opinions, leading to explosive growth in user-generated content. Text data, owing to the expressiveness of natural language, is of great value for exploring various kinds of knowledge. However, much user-generated text is longer than readers expect, making automatic document summarization necessary to facilitate knowledge digestion. In this paper, we focus on review-like, sentiment-oriented textual data. We propose the concept of Sentiment-preserving Document Summarization (SDS), which aims to summarize a long textual document into a shorter version while preserving its main sentiments and without sacrificing readability. To tackle this problem, we devise an end-to-end, weakly-supervised extractive framework based on deep neural networks, consisting of a hierarchical document encoder, a sentence extractor, a sentiment classifier, and a discriminator that distinguishes the extracted summaries from natural short reviews. The framework is weakly supervised in that no ground-truth summaries are used for training; instead, the sentiment labels supervise the generated summary so that it preserves the sentiments of the original document. In particular, the sentence extractor is trained to generate summaries that i) make the sentiment classifier predict the same sentiment category as for the original longer document, and ii) fool the discriminator into recognizing them as human-written short reviews. Experimental results on two public datasets validate the effectiveness of our framework.
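
The two training signals for the sentence extractor can be illustrated with a toy sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the independent per-sentence Bernoulli sampling, the weighted-sum reward, the REINFORCE-style surrogate loss, and all function and parameter names (`sample_extraction`, `combined_reward`, `reinforce_loss`, `weight`, `baseline`). The classifier and discriminator outputs are stand-in numbers, not model predictions.

```python
import math
import random


def sample_extraction(probs, rng):
    """Sample a 0/1 keep decision per sentence from independent Bernoulli(probs)."""
    return [1 if rng.random() < p else 0 for p in probs]


def log_prob_of_selection(probs, mask):
    """Log-probability of the sampled mask under the extractor's probabilities."""
    return sum(math.log(p) if m else math.log(1.0 - p) for p, m in zip(probs, mask))


def combined_reward(p_same_sentiment, p_looks_human, weight=0.5):
    """Hypothetical reward: weighted mix of the two signals named in the abstract.

    p_same_sentiment: classifier probability of the original document's label.
    p_looks_human:    discriminator probability that the summary is a natural review.
    """
    return weight * p_same_sentiment + (1.0 - weight) * p_looks_human


def reinforce_loss(probs, mask, reward, baseline=0.0):
    """REINFORCE surrogate (an assumed choice): minimising it makes
    selections with above-baseline reward more likely."""
    return -(reward - baseline) * log_prob_of_selection(probs, mask)


rng = random.Random(0)
probs = [0.9, 0.2, 0.7, 0.1]   # stand-in extractor outputs for four sentences
mask = sample_extraction(probs, rng)
summary_reward = combined_reward(p_same_sentiment=0.8, p_looks_human=0.6)
loss = reinforce_loss(probs, mask, summary_reward)
```

In a real system the probabilities, classifier score, and discriminator score would all come from neural networks, and the loss would be differentiated with respect to the extractor's parameters; the sketch only shows how the two signals could be combined into one scalar objective.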

Research Area(s)

  • document summarization, end-to-end neural network, reinforcement learning, sentiment-preserving