TY - JOUR
T1 - Review Authorship Attribution in a Similarity Space
AU - Qian, Tie-Yun
AU - Liu, Bing
AU - Li, Qing
AU - Si, Jianfeng
PY - 2015
Y1 - 2015
N2 - Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier using supervised learning. This approach has several issues that hurt its applicability. First, supervised learning needs a large set of documents from each author to serve as the training data. This can be difficult in practice. For example, in the online review domain, most reviewers (authors) only write a few reviews, which are not enough to serve as the training data. Second, the learned classifier cannot be applied to authors whose documents have not been used in training. In this article, we propose a novel solution to deal with the two problems. The core idea is that instead of learning in the original document space, we transform it to a similarity space. In the similarity space, the learning is able to naturally tackle the issues. Our experimental results based on online reviews and reviewers show that the proposed method significantly outperforms the state-of-the-art supervised and unsupervised baseline methods.
AB - Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier using supervised learning. This approach has several issues that hurt its applicability. First, supervised learning needs a large set of documents from each author to serve as the training data. This can be difficult in practice. For example, in the online review domain, most reviewers (authors) only write a few reviews, which are not enough to serve as the training data. Second, the learned classifier cannot be applied to authors whose documents have not been used in training. In this article, we propose a novel solution to deal with the two problems. The core idea is that instead of learning in the original document space, we transform it to a similarity space. In the similarity space, the learning is able to naturally tackle the issues. Our experimental results based on online reviews and reviewers show that the proposed method significantly outperforms the state-of-the-art supervised and unsupervised baseline methods.
KW - authorship attribution
KW - similarity space
KW - supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84921381510&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84921381510&origin=recordpage
U2 - 10.1007/s11390-015-1513-6
DO - 10.1007/s11390-015-1513-6
M3 - RGC 21 - Publication in refereed journal
SN - 1000-9000
VL - 30
SP - 200
EP - 213
JO - Journal of Computer Science and Technology
JF - Journal of Computer Science and Technology
IS - 1
ER -