Abstract
Training data selection is a common method for domain adaptation, the goal of which is to choose a subset of training data that works well for a given test set. It has been shown to be effective for tasks such as machine translation and parsing. In this paper, we propose several entropy-based measures for training data selection and test their effectiveness on two tasks: Chinese word segmentation and part-of-speech tagging. The experimental results on the Chinese Penn Treebank indicate that some of the measures provide a statistically significant improvement over random selection for both tasks.
| Original language | English |
|---|---|
| Pages | 1191-1200 |
| Publication status | Published - 8 Dec 2012 |
| Event | 24th International Conference on Computational Linguistics / Proceedings of COLING 2012: Posters - Mumbai, India; Duration: 8 Dec 2012 → 15 Dec 2012 |
Conference
| Conference | 24th International Conference on Computational Linguistics / Proceedings of COLING 2012: Posters |
|---|---|
| Place | India |
| City | Mumbai |
| Period | 8 Dec 2012 → 15 Dec 2012 |