A weakly-supervised extractive framework for sentiment-preserving document summarization
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Ma, Yun; Li, Qing
Detail(s)
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1401–1425 |
Journal / Publication | World Wide Web |
Volume | 22 |
Issue number | 4 |
Online published | 30 May 2018 |
Publication status | Published - Jul 2019 |
Abstract
The popularity of social media sites provides new ways for people to share their experiences and convey their opinions, leading to an explosive growth of user-generated content. Owing to the rich expressiveness of natural language, text data is of great value for people exploring various kinds of knowledge. However, much user-generated text is longer than readers expect, which makes automatic document summarization necessary to facilitate knowledge digestion. In this paper, we focus on review-like, sentiment-oriented textual data. We propose the concept of Sentiment-preserving Document Summarization (SDS), which aims to summarize a long textual document into a shorter version while preserving its main sentiments without sacrificing readability. To tackle this problem with deep neural network-based models, we devise an end-to-end weakly-supervised extractive framework consisting of a hierarchical document encoder, a sentence extractor, a sentiment classifier, and a discriminator that distinguishes the extracted summaries from natural short reviews. The framework is weakly supervised in that no ground-truth summaries are used for training; instead, the available sentiment labels supervise the generated summaries so that they preserve the sentiments of the original documents. In particular, the sentence extractor is trained to generate summaries that i) make the sentiment classifier predict the same sentiment category as for the original, longer document, and ii) fool the discriminator into recognizing them as human-written short reviews. Experimental results on two public datasets validate the effectiveness of our framework.
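The abstract describes a sentence extractor trained without reference summaries to satisfy two signals: the sentiment classifier should recover the original document's label from the summary, and the discriminator should mistake the summary for a human-written short review. Since the research areas list reinforcement learning, the following is a minimal sketch of how such training could look, under assumptions that are not taken from the paper: each sentence is kept or dropped by an independent Bernoulli choice over extractor logits, the reward is log p(original label | summary) plus a weight `alpha` times log D(summary), and the logits are updated with plain REINFORCE. All names (`train_step`, `reward`, `alpha`, `toy_classifier`, `toy_discriminator`) are hypothetical, and the hierarchical encoder is omitted; the raw logits stand in for its output.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: each takes the indices of the extracted sentences
# and returns a probability in (0, 1].
ScoreFn = Callable[[List[int]], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_actions(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Sample keep (1) or drop (0) for each sentence from extractor logits."""
    return [1 if rng.random() < sigmoid(s) else 0 for s in scores]


def reward(p_same_sentiment: float, p_human: float, alpha: float = 0.5) -> float:
    """Assumed reward: log p(original label | summary) + alpha * log D(summary)."""
    return math.log(p_same_sentiment) + alpha * math.log(p_human)


def reinforce_grad(scores: Sequence[float], actions: Sequence[int],
                   r: float, baseline: float = 0.0) -> List[float]:
    """REINFORCE estimate for independent Bernoulli(sigmoid(s)) choices.

    d/ds log pi(a | s) = a - sigmoid(s), scaled by the advantage (r - baseline).
    """
    return [(r - baseline) * (a - sigmoid(s)) for s, a in zip(scores, actions)]


def train_step(scores: Sequence[float], classifier: ScoreFn,
               discriminator: ScoreFn, rng: random.Random,
               lr: float = 0.1, alpha: float = 0.5,
               baseline: float = 0.0) -> Tuple[List[float], float]:
    """One policy-gradient ascent step on the sentence-extractor logits."""
    actions = sample_actions(scores, rng)
    selected = [i for i, a in enumerate(actions) if a == 1]
    r = reward(classifier(selected), discriminator(selected), alpha)
    grad = reinforce_grad(scores, actions, r, baseline)
    return [s + lr * g for s, g in zip(scores, grad)], r


if __name__ == "__main__":
    # Hypothetical toy document: one polarity value per sentence; label is positive.
    polarity = [2.0, -1.5, 1.0, -0.5]

    def toy_classifier(selected: List[int]) -> float:
        # Stand-in for p(positive | summary).
        return sigmoid(sum(polarity[i] for i in selected))

    def toy_discriminator(selected: List[int]) -> float:
        # Stand-in for D(summary): simply prefers two-sentence summaries.
        return math.exp(-abs(len(selected) - 2))

    rng = random.Random(0)
    scores = [0.0] * len(polarity)
    baseline = 0.0
    for _ in range(500):
        scores, r = train_step(scores, toy_classifier, toy_discriminator,
                               rng, baseline=baseline)
        baseline = 0.9 * baseline + 0.1 * r  # running-mean reward baseline
    print("keep probabilities:", [round(sigmoid(s), 2) for s in scores])
```

In the full framework the logits would come from the hierarchical encoder and the classifier and discriminator would be trained networks; the toy stubs only make the update rule executable. The running-mean baseline is a common variance-reduction choice for REINFORCE, not a detail reported in the abstract.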
Research Area(s)
- document summarization, end-to-end neural network, reinforcement learning, sentiment-preserving
Citation Format(s)
A weakly-supervised extractive framework for sentiment-preserving document summarization. / Ma, Yun; Li, Qing.
In: World Wide Web, Vol. 22, No. 4, 07.2019, p. 1401–1425.