New Word Extraction from Chinese Financial Documents
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 7891570 |
Pages (from-to) | 770-773 |
Journal / Publication | IEEE Signal Processing Letters |
Volume | 24 |
Issue number | 6 |
Publication status | Published - 1 Jun 2017 |
Externally published | Yes |
Abstract
With the tremendous development of data science, using unstructured documents to analyze marketing dynamics is attracting a great deal of attention. In this letter, we propose an iterative scheme to extract new words, a task that is often a bottleneck for Chinese natural language processing (NLP) in financial market analysis. In contrast to existing static features, the key novelty is the proposed dynamic features, which characterize the similarity of context patterns. Via iteration, distinguishable seed context patterns are extracted. Tested on a 203 MB corpus, the scheme extracted 19,291 words representing emerging industries, entities, projects, and products with a precision of 89.8% and a recall of 88.9%, outperforming most competing methods.
Research Area(s)
- Chinese new word extraction, iterative algorithm, natural language processing, static/dynamic features, support vector machine
Bibliographic Note
Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].
Citation Format(s)
New Word Extraction from Chinese Financial Documents. / Yan, Liwei; Bai, Bo; Chen, Wei et al.
In: IEEE Signal Processing Letters, Vol. 24, No. 6, 7891570, 01.06.2017, p. 770-773.