An Effective and Scalable Framework for Authorship Attribution Query Processing

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

5 Scopus Citations

Author(s)

  • Raheem Sarwar
  • Chenyun Yu
  • Ninad Tungare
  • Kanatip Chitavisutthivong
  • Sukrit Sriratanawilai
  • Yaohai Xu
  • Dickson Chow
  • Thanawin Rakthanmanon
  • Sarana Nutanong

Detail(s)

Original language: English
Pages (from-to): 50030-50048
Journal / Publication: IEEE Access
Volume: 6
Online published: 10 Sep 2018
Publication status: Online published - 10 Sep 2018

Abstract

Authorship attribution aims to identify the original author of an anonymous text from a given set of candidate authors and has a wide range of applications. The main challenge in the authorship attribution problem is that real-world applications tend to involve hundreds of authors, while each author may have only a small number of text samples, e.g., 5 to 10 texts per author. As a result, building a predictive model that can accurately identify the author of an anonymous text is a challenging task. In fact, existing authorship attribution solutions based on long texts focus on application scenarios where the number of candidate authors is limited to 50. These solutions generally report a significant performance reduction as the number of authors increases. To overcome this challenge, we propose a novel data representation model that captures stylistic variations within each document, which transforms the problem of authorship attribution into a similarity search problem. Based on this data representation model, we also propose a similarity query processing technique that can effectively handle outliers. We assess the accuracy of our proposed method against state-of-the-art authorship attribution methods using real-world datasets extracted from Project Gutenberg. Our dataset contains 3000 novels from 500 authors. Experimental results from our study show that our method significantly outperforms all competitors. Specifically, for both the closed-set and open-set authorship attribution problems, our method achieves higher than 95% accuracy.

Research Area(s)

  • Data models, Entropy, Feature extraction, large scale database, Query processing, similarity search, stylometry, Syntactics, Task analysis, Writing

Citation Format(s)

An Effective and Scalable Framework for Authorship Attribution Query Processing. / Sarwar, Raheem; Yu, Chenyun; Tungare, Ninad; Chitavisutthivong, Kanatip; Sriratanawilai, Sukrit; Xu, Yaohai; Chow, Dickson; Rakthanmanon, Thanawin; Nutanong, Sarana.

In: IEEE Access, Vol. 6, 10.09.2018, p. 50030-50048.
