Creating an interoperable language resource for interoperable linguistic studies

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 327-340
Journal / Publication: Language Resources and Evaluation
Volume: 46
Issue number: 2
Publication status: Published - 2012

Abstract

There are two different levels of interoperability for language resources: operational interoperability and conceptual interoperability. The former refers to the standardization of the formal aspects of language resources so that different resources can work together. The latter refers to the standardization of the notional representation of the semantic content of the analysis. This article addresses both issues but focuses on the latter through a description of the annotation and analysis of the International Corpus of English, which is a corpus for the study of English as a global language. The project is parameterised by component, regional sub-corpora and a set of pre-defined textual categories. The one-million-word British component has been constructed, grammatically tagged, and syntactically parsed. This article is first of all a description of steps taken to ensure conformity within the project. These include corpus design, part-of-speech tagging, and syntactic parsing. The article will then present a study that examines the use of adverbial clauses across speech and writing, illustrating the imminent necessity for interoperable analysis of linguistic data. © 2012 Springer Science+Business Media B.V.

Research Area(s)

  • Adverbial clause, Conceptual interoperability, Operational interoperability, Parsing, Speech, Tagging, The International Corpus of English, Writing