
Entropy-based training data selection for domain adaptation

Research output: Conference Papers › RGC 32 - Refereed conference paper (without host publication) › peer-review

Abstract

Training data selection is a common method for domain adaptation, the goal of which is to choose a subset of training data that works well for a given test set. It has been shown to be effective for tasks such as machine translation and parsing. In this paper, we propose several entropy-based measures for training data selection and test their effectiveness on two tasks: Chinese word segmentation and part-of-speech tagging. The experimental results on the Chinese Penn Treebank indicate that some of the measures provide a statistically significant improvement over random selection for both tasks.
Original language: English
Pages: 1191-1200
Publication status: Published - 8 Dec 2012
Event: 24th International Conference on Computational Linguistics / Proceedings of COLING 2012: Posters - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Conference

Conference: 24th International Conference on Computational Linguistics / Proceedings of COLING 2012: Posters
Place: India
City: Mumbai
Period: 8/12/12 to 15/12/12
