To evaluate the translation quality of machine translation (MT) systems, various
automatic metrics have been devised to provide fast, objective and replicable
estimates that approximate human evaluation. Most metrics in use so far rely on
text similarity measures that compute the closeness of MT outputs to corresponding
human reference translations through string-based matching. This mainstream
methodology has been criticized for its poor discriminative power and its disregard
of important linguistic features.
This thesis develops novel methodologies to address these deficiencies by formulating and testing a lexically oriented evaluation metric. The metric quantifies word choice and word position, two fundamental aspects of text
similarity. Words are treated as the basic operable text units, providing the basis for
an extensive range of features that characterize the multiple dimensions of MT output.
At the word level, each word of an output candidate is compared with its reference
counterpart, covering the structural and phonological aspects of word form, the
knowledge- and corpus-based similarity of word sense, and word informativeness.
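As an illustration of the word-level comparison, the sketch below derives a surface-form similarity from a normalized edit distance. This is only a minimal stand-in for the structural aspect of word form; the phonological, sense-based and informativeness features mentioned above are not reproduced, and the normalization by the longer word is an assumption rather than the thesis's actual formulation.

```python
# Minimal sketch of a word-form similarity score based on normalized edit
# distance. Only the surface-form (structural) signal is illustrated here;
# the thesis also uses phonological form, word-sense similarity and
# informativeness, which this sketch does not cover.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def word_form_similarity(candidate: str, reference: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical forms, lower for divergent ones."""
    if not candidate and not reference:
        return 1.0
    return 1.0 - edit_distance(candidate, reference) / max(len(candidate), len(reference))

print(word_form_similarity("colour", "color"))  # ~0.83
```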
At the sentence level, two language-independent distance measures are formulated
to account for word position within a sentence and for the overall word sequence, respectively.
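The abstract does not give the exact form of these distance measures, so the following is a hedged sketch of one plausible position-based distance: each word shared by the candidate and the reference contributes the difference between its length-normalized positions in the two sentences. The handling of repeated words (last occurrence wins) and the simple averaging are simplifications assumed for illustration, not the thesis's definitions.

```python
# Hedged sketch of a word-position distance: for each word shared by the
# candidate and the reference, compare its relative (length-normalized)
# position in the two sentences and average the absolute differences.
# The thesis defines two language-independent measures (position and overall
# word sequence); this single-occurrence version only illustrates the idea.

def position_distance(candidate: list[str], reference: list[str]) -> float:
    cand_pos = {w: i / max(len(candidate) - 1, 1) for i, w in enumerate(candidate)}
    ref_pos = {w: i / max(len(reference) - 1, 1) for i, w in enumerate(reference)}
    shared = cand_pos.keys() & ref_pos.keys()
    if not shared:
        return 1.0  # no lexical overlap: maximal distance by convention
    return sum(abs(cand_pos[w] - ref_pos[w]) for w in shared) / len(shared)

print(position_distance("the cat sat on the mat".split(),
                        "on the mat the cat sat".split()))  # ~0.52
```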
At the document level, inter-sentence relationships are captured by a measure
of lexical cohesion, an important factor reflecting text coherence. Together, these
measures provide comprehensive evaluation coverage addressing different aspects of
the quality of MT output.
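As a rough illustration of a lexical-cohesion signal, the sketch below scores each sentence by the proportion of its content words that recur elsewhere in the document, with repetition standing in for cohesive devices in general. The stopword list and the repetition-only view are assumptions; the thesis's actual measure of lexical cohesion is not specified in the abstract.

```python
# Minimal sketch of a lexical-cohesion signal: the proportion of content
# words in each sentence that also appear in the rest of the document
# (lexical repetition as one simple cohesive device). The stopword list and
# the repetition-only view are illustrative assumptions.

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "on", "was"}

def lexical_cohesion(sentences: list[list[str]]) -> float:
    content = [{w.lower() for w in s if w.lower() not in STOPWORDS} for s in sentences]
    scores = []
    for i, words in enumerate(content):
        if not words:
            continue
        others = set().union(*content[:i], *content[i + 1:])
        scores.append(len(words & others) / len(words))
    return sum(scores) / len(scores) if scores else 0.0

doc = [s.split() for s in ["The translator revised the draft",
                           "The revised draft was shorter",
                           "It read more naturally"]]
print(lexical_cohesion(doc))  # ~0.44: higher values mean more lexical repetition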
The validity and effectiveness of these methodologies are verified by the performance of the proposed metric. In open
evaluations of MT metrology, the metric performs consistently better than
the standard metrics in the field and proves highly comparable to several
state-of-the-art ones for English and other European languages, in terms of its
correlation with human assessment. The practical impact of the metric, owing to its strong performance, is further illustrated by ranking MT
systems with it and with other metrics. The experimental findings reveal substantial
variation in system rankings depending on the choice of metric, underscoring how
important the right choice of metric is for reliable MT evaluation.
| Date of Award | 3 Oct 2012 |
|---|---|
| Original language | English |
| Awarding Institution | City University of Hong Kong |
| Supervisor | Chun Yu KIT (Supervisor) |
Evaluation of machine translation via parameterized quantification of closeness in word choice and position
WONG, T. M. (Author). 3 Oct 2012
Student thesis: Doctoral Thesis