Towards a Professional Platform for Chinese Character Conversion

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 22_Publication in policy or professional journal

Author(s)

Related Research Unit(s)

Detail(s)

Original language: English
Pages (from-to): 1 - 22
Journal / Publication: ACM Transactions on Asian Language Information Processing (Print)
Volume: 12
Issue number: 1
Publication status: Published - 2013

Abstract

Increasing communication among Chinese-speaking regions that use the traditional and simplified Chinese character systems, respectively, has highlighted the subtle yet extensive differences between the two systems, which can cause unexpected difficulties when converting characters from one to the other. This article proposes a new priority-based model for managing multiple data resources, together with a new algorithm called the Fused Conversion algorithm from Multi-Data resources (FCMD), to ensure conversions that are more context-sensitive, human-controllable, and thus more reliable. The algorithm draws on reverse maximum matching, an n-gram-based statistical model, and pattern-based learning and matching. After parameter training on the Tagged Chinese Gigaword corpus, its conversion precision reaches 91.5% in context-sensitive cases, the most difficult part of the conversion, with an overall precision of 99.8%, a significant improvement over state-of-the-art models. The conversion platform based on the model offers extra features such as data resource selection and self-learning of n-grams, providing a more sophisticated tool especially suited to high-end professional use.
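
As a rough illustration of one ingredient named in the abstract, reverse maximum matching, the sketch below scans a simplified Chinese string from right to left and, at each position, converts the longest phrase found in a lookup table, falling back to a per-character table. It is a generic textbook version of the technique and not the FCMD algorithm itself. The tables `PHRASES` and `CHARS`, their entries, and the function name `rmm_convert` are hypothetical and chosen only for this example.

```python
# Toy simplified -> traditional phrase table (illustrative entries only,
# not taken from the article). The character "发" is ambiguous: it maps
# to "發" or "髮" depending on the word it appears in.
PHRASES = {
    "头发": "頭髮",
    "理发": "理髮",
    "发展": "發展",
}

# Fallback per-character table; "发" defaults to "發" when no phrase matches.
CHARS = {"头": "頭", "发": "發"}

# Longest phrase length bounds the matching window.
MAX_LEN = max(len(p) for p in PHRASES)


def rmm_convert(text):
    """Convert text using reverse (right-to-left) maximum matching."""
    pieces = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, down to length 2.
        for size in range(min(MAX_LEN, end), 1, -1):
            chunk = text[end - size:end]
            if chunk in PHRASES:
                pieces.append(PHRASES[chunk])
                break
        else:
            # No phrase matched: convert a single character, or keep it as-is.
            size = 1
            ch = text[end - 1]
            pieces.append(CHARS.get(ch, ch))
        end -= size
    # Pieces were collected right to left, so restore the original order.
    return "".join(reversed(pieces))


print(rmm_convert("头发"))    # 頭髮
print(rmm_convert("发展"))    # 發展
print(rmm_convert("我理发"))  # 我理髮
```

Dictionary matching of this kind resolves an ambiguous character only when the surrounding phrase is in the table. The abstract indicates that FCMD additionally draws on an n-gram statistical model and learned patterns, which this sketch does not attempt to reproduce.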

Research Area(s)

  • Chinese character conversion, multi-data resources, FCMD algorithm, reverse maximum matching, pattern learning, n-gram