Evaluation of machine translation via parameterized quantification of closeness in word choice and position

  • Tak Ming WONG

    Student thesis: Doctoral Thesis

    Abstract

    To evaluate the translation quality of machine translation (MT) systems, various automatic metrics have been developed, aiming at fast, objective and replicable estimates that approximate human evaluation. Most evaluation metrics in use so far rely on text similarity measures that compute the closeness of MT output to corresponding human reference translations through string-based matching. This mainstream methodology has been criticized for poor discriminative power and disregard of important linguistic features. This thesis develops novel methodologies to address these deficiencies by formulating and testing a lexically oriented evaluation metric. The metric quantifies word choice and word position, two fundamental aspects of text similarity. Words are treated as the basic operable text units, providing the basis for an extensive range of features characterizing the multiple dimensions of MT output. At the word level, words in an output candidate are compared with those in its reference, covering the structural and phonological aspects of word form, knowledge- and corpus-based similarity of word sense, and word informativeness. At the sentence level, two language-independent distance measures are formulated to account for word position within a sentence and for the overall word sequence, respectively. At the document level, inter-sentence relationships are captured by a measure of lexical cohesion, an important factor reflecting text coherence. This approach provides comprehensive evaluation coverage, addressing different aspects of MT output quality. The validity and effectiveness of the methodologies of this novel approach are verified by the performance of the proposed metric. In open evaluation campaigns for MT metrics, our metric consistently outperforms the standard metrics in the field and proves highly comparable to several state-of-the-art metrics for English and other European languages, in terms of correlation with human assessment. The practical impact of the metric, owing to its strong performance, is further illustrated by using it to rank MT systems, in comparison with other metrics. Experimental findings reveal substantial variation in system rankings depending on the metric chosen, underscoring how important the right choice of metric is for reliable MT evaluation.
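
    As a rough illustration of the general idea, the following Python sketch scores a candidate sentence against a reference by crediting exact word matches and discounting each match by its positional displacement. It is a minimal, hypothetical toy, not the thesis's actual formulation: the function names, the linear position penalty and the F-measure combination are all illustrative assumptions, and the thesis's richer word-level features (word form, word sense, informativeness) are omitted.

        # A minimal, hypothetical sketch (not the thesis's actual metric):
        # score a candidate sentence against a reference by crediting exact
        # word matches, discounted by positional displacement.

        def position_penalty(i, j, n_cand, n_ref):
            # Penalty in [0, 1): distance between the positions of a matched
            # word pair, normalized by the respective sentence lengths.
            return abs(i / max(n_cand, 1) - j / max(n_ref, 1))

        def sentence_score(candidate, reference):
            # Greedily match each candidate word to the first unused reference
            # word (exact match only here; the thesis also credits similarity
            # of word form and word sense), discounted by positional distance.
            used, total = set(), 0.0
            for i, w in enumerate(candidate):
                for j, r in enumerate(reference):
                    if j not in used and w == r:
                        used.add(j)
                        total += 1.0 - position_penalty(
                            i, j, len(candidate), len(reference))
                        break
            if not candidate or not reference:
                return 0.0
            # Combine precision and recall as an F-measure-style mean.
            p, r = total / len(candidate), total / len(reference)
            return 0.0 if p + r == 0 else 2 * p * r / (p + r)

        reference = "the cat sat on the mat".split()
        candidate = "the cat sat on a mat".split()
        print(round(sentence_score(candidate, reference), 3))  # 0.833
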
    Date of Award: 3 Oct 2012
    Original language: English
    Awarding Institution
    • City University of Hong Kong
    Supervisor: Chun Yu KIT

    Keywords

    • Machine translating
