Enhanced genre classification through linguistically fine-grained POS tags

    Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

    5 Citations (Scopus)

    Abstract

    We propose the use of fine-grained part-of-speech (POS) tags as discriminatory attributes for automatic genre classification and report empirical results from an experiment that indicate a substantial accuracy gain from such features over the conventional bag-of-words approach based on word unigrams. In particular, this paper reports our research investigating the performance of a fine-grained tag set when tested with the British component of the International Corpus of English. Ten different genre classification tasks were identified and the performance of the tags was evaluated in terms of F-score. Our results show that the use of linguistically fine-grained POS tags produces superior accuracy when compared with word unigrams, particularly for a rich set of 32 different genres with the Naïve Bayes Multinomial classifier. Through a comparison with an impoverished tag set, our results further demonstrate that the superior performance is due to the rich linguistic information embodied in the 400-strong set of POS tags. © 2010 by Alex Chengyu Fang and Jing Cao.
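The approach summarised in the abstract (POS tags as features, a multinomial Naïve Bayes classifier, evaluation by F-score) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes documents have already been tagged, and the tag labels, genre names, and toy data below are invented purely for illustration. The macro-averaged F-score is also an assumption, since the abstract does not say how the F-score was averaged.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on token lists with Laplace smoothing."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        feature_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_docs_in_class in class_counts.items():
        denom = sum(feature_counts[label].values()) + alpha * len(vocab)
        log_likelihood = {
            f: math.log((feature_counts[label][f] + alpha) / denom) for f in vocab
        }
        model[label] = (math.log(n_docs_in_class / len(docs)), log_likelihood)
    return model


def predict(model, tokens):
    """Return the most probable class; features unseen in training are ignored."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)
    return max(sorted(model), key=score)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F-scores."""
    scores = []
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: each document is a sequence of fine-grained POS tags.
# The tag labels and genre names are illustrative assumptions, not corpus data.
train_docs = [
    ["PRON(pers)", "V(past)", "N(com,sing)", "PRON(pers)"],
    ["PRON(pers)", "V(pres)", "PRON(pers)", "INTERJ"],
    ["N(com,plu)", "V(pass)", "PREP", "N(com,sing)"],
    ["ART(def)", "N(com,sing)", "V(pass)", "PREP"],
]
train_labels = ["conversation", "conversation", "academic", "academic"]

test_docs = [
    ["PRON(pers)", "INTERJ", "V(pres)"],
    ["N(com,plu)", "PREP", "V(pass)"],
]
gold = ["conversation", "academic"]

model = train_multinomial_nb(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
print(predictions, macro_f1(gold, predictions))
```

To compare against a word-unigram baseline in the same sketch, the tag sequences would simply be replaced by the word tokens of each document; the classifier and the scoring function stay unchanged.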
    Original language: English
    Title of host publication: PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
    Pages: 85-94
    Publication status: Published - 2010
    Event: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24 - Sendai, Japan
    Duration: 4 Nov 2010 to 7 Nov 2010

    Conference

    Conference: 24th Pacific Asia Conference on Language, Information and Computation, PACLIC 24
    Country/Territory: Japan
    City: Sendai
    Period: 4/11/10 to 7/11/10

    Research Keywords

    • AUTASYS
    • Automatic genre classification
    • Fine-grained POS tag
    • ICE-GB
    • Linguistic granularity
