
Scaling conditional random field with application to Chinese word segmentation

Hai Zhao, Chunyu Kit

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

As a powerful sequence labeling model, the conditional random field (CRF) has been successfully applied to a number of natural language processing (NLP) tasks. However, the high complexity of CRF training restricts it to a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. A predicted tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all the sub-CRFs involved. To test its effectiveness, the approach is applied to Chinese word segmentation (CWS), formulated as a character tagging problem. Our evaluation shows that it reduces time and memory cost by 20-39% and 44-50%, respectively, without significant performance loss on various large-scale data sets. © 2007 IEEE.
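The abstract does not spell out how the sub-CRF outputs are combined. The following is a minimal sketch of one plausible joint decoding step, not the authors' algorithm. It rests on several assumptions:

- Each binary sub-CRF for tag t yields a per-character marginal P(y_i = t | x).
- The tag set is the common 4-tag B/M/E/S scheme for CWS.
- The joint decoder is a Viterbi search that sums log marginals and allows only legal tag transitions.

The names `joint_decode`, `TAGS`, and `ALLOWED` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical 4-tag CWS scheme (assumption): B = word begin, M = word middle,
# E = word end, S = single-character word.
TAGS = ["B", "M", "E", "S"]

# Legal tag bigrams under the B/M/E/S scheme.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START_TAGS = ("B", "S")  # a sentence must start a word
END_TAGS = ("E", "S")    # a sentence must close its last word


def joint_decode(marginals: Dict[str, List[float]]) -> List[str]:
    """Combine binary sub-CRF outputs into one legal tag sequence.

    marginals[t][i] is assumed to be P(y_i = t | x) as produced by the
    binary sub-CRF trained for tag t. The path score is the sum of log
    marginals, maximized by Viterbi over legal transitions only.
    """
    n = len(next(iter(marginals.values())))
    if n == 0:
        return []
    eps = 1e-12  # guard against log(0)
    score = [{t: -math.inf for t in TAGS} for _ in range(n)]
    back = [{t: None for t in TAGS} for _ in range(n)]

    for t in START_TAGS:
        score[0][t] = math.log(marginals[t][0] + eps)

    for i in range(1, n):
        for t in TAGS:
            emit = math.log(marginals[t][i] + eps)
            for prev in TAGS:
                if t in ALLOWED[prev] and score[i - 1][prev] > -math.inf:
                    cand = score[i - 1][prev] + emit
                    if cand > score[i][t]:
                        score[i][t] = cand
                        back[i][t] = prev

    # Choose the best legal final tag, then follow back-pointers.
    last = max(END_TAGS, key=lambda t: score[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Usage: three characters, where the first two form a word.
marginals = {
    "B": [0.9, 0.1, 0.2],
    "M": [0.05, 0.1, 0.1],
    "E": [0.05, 0.8, 0.1],
    "S": [0.1, 0.1, 0.7],
}
print(joint_decode(marginals))  # ['B', 'E', 'S']
```

Because each sub-CRF is trained separately, nothing guarantees that its per-position decisions agree with the other sub-CRFs. In this sketch, the transition constraints in the Viterbi search are what enforce a consistent segmentation.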
Original language: English
Title of host publication: Proceedings - Third International Conference on Natural Computation, ICNC 2007
Publisher: IEEE Computer Society
Pages: 95-99
Volume: 5
ISBN (Print): 0769528759, 9780769528755
Publication status: Published - 24 Aug 2007
Event: 3rd International Conference on Natural Computation, ICNC 2007 - Haikou, Hainan, China
Duration: 24 Aug 2007 to 27 Aug 2007

Conference

Conference: 3rd International Conference on Natural Computation, ICNC 2007
Place: China
City: Haikou, Hainan
Period: 24/08/07 to 27/08/07
