Narrowing the gap between termbases and corpora in commercial environments

  • Kara Cordelia WARBURTON

Student thesis: Doctoral Thesis

Abstract

This research investigates the terminological data in terminology databases (termbases) and in the corresponding corpora from commercial sources, using four companies in the information technology (IT) sector as case studies. Our broad objective is to raise awareness of some of the issues and challenges that terminologists face in commercial settings. We demonstrate that there are significant gaps between the termbases and the corresponding corpora, that these gaps reduce the effectiveness of the termbases, and that they can be minimised by adopting a corpus-based approach to term identification. We begin by establishing that the language used in a company contains terminology. After reviewing the conventional theories and methodologies of the field of terminology, we challenge the suitability of some of their precepts for companies that require terminological resources that are both repurposable and production-oriented. Next, we reveal features of the termbases that depart from established norms. Using a batch concordance technique, we quantify the gap between the termbase terms and the corpora, and we then attempt to explain this gap by examining termbase terms that occur in various frequency ranges within the corpora. From these empirical observations, we formulate guiding principles for selecting terms for termbases with respect to features including term length, part of speech, term variation, and the use of certain types of modifiers. We find that keywords can help identify multi-word terms that, if documented in termbases, would significantly increase the correspondence between termbases and corpora. We conclude that termbases developed in companies would increase in value if corpus-based approaches to term identification were adopted.
This challenges the conventional understanding of what a term is: to open the field of terminology to commercial applications and environments, termhood needs to be established on the basis of the communicative purpose and end use of terminological resources, in addition to purely semantic criteria.
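The abstract mentions quantifying the termbase–corpus gap with a batch concordance technique and then examining terms by corpus frequency range. The following is a minimal sketch of that kind of measurement, assuming a plain-text corpus and a simple list of termbase terms. The function names and the frequency bands (0, 1-9, 10-99, 100+) are hypothetical illustrations, not the tools or ranges used in the thesis.

```python
import re
from collections import Counter


def count_term_occurrences(terms, corpus_text):
    """Count case-insensitive whole-word occurrences of each termbase term in a corpus.

    Hypothetical helper. Assumes each term starts and ends with a word character,
    so the \\b anchors behave as expected (e.g. "file" will not match inside "profile").
    """
    text = corpus_text.lower()
    counts = {}
    for term in terms:
        # Multi-word terms: allow any run of whitespace between the component words.
        words = (re.escape(w) for w in term.lower().split())
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts


def frequency_band(n):
    """Assign a raw frequency to an illustrative (hypothetical) frequency range."""
    if n == 0:
        return "0"
    if n < 10:
        return "1-9"
    if n < 100:
        return "10-99"
    return "100+"


def summarize_gap(counts):
    """Return the number of terms per frequency band and the share of terms absent from the corpus."""
    bands = Counter(frequency_band(n) for n in counts.values())
    missing_share = bands["0"] / len(counts) if counts else 0.0
    return bands, missing_share


if __name__ == "__main__":
    termbase = ["file system", "hard disk", "cloud storage", "mouse"]
    corpus = "The file system stores files on the hard disk. A file  system check runs nightly."
    counts = count_term_occurrences(termbase, corpus)
    bands, missing = summarize_gap(counts)
    print(counts)
    print(dict(bands), f"{missing:.0%} of termbase terms not found in corpus")
```

Counting terms that never occur in the corpus gives one simple measure of the gap; the per-band counts then indicate which terms to inspect more closely (for example, zero-frequency or very low-frequency entries).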
Date of Award: 15 Jul 2014
Original language: English
Awarding Institution: City University of Hong Kong
Supervisor: Chengyu Alex FANG

Keywords

  • Commerce
  • Terminology
  • Corpora (Linguistics)
