New Word Extraction from Chinese Financial Documents

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

10 Scopus Citations

Author(s)

Yan, Liwei; Bai, Bo; Chen, Wei et al.

Detail(s)

Original language: English
Article number: 7891570
Pages (from-to): 770-773
Journal / Publication: IEEE Signal Processing Letters
Volume: 24
Issue number: 6
Publication status: Published - 1 Jun 2017
Externally published: Yes

Abstract

With the rapid development of data science, the use of unstructured documents to analyze marketing dynamics is attracting a great deal of attention. In this letter, we propose an iterative scheme for extracting new words, a task that is often a bottleneck for Chinese natural language processing (NLP) in financial market analysis. In contrast to existing static features, the key novelty is a set of proposed dynamic features that characterize the similarity of context patterns. Through iteration, distinguishable seed context patterns are extracted. Tested on a 203 MB corpus, the scheme extracted 19 291 words representing emerging industries, entities, projects, and products with a precision of 89.8% and a recall of 88.9%, outperforming most competing methods.

Research Area(s)

  • Chinese new word extraction, iterative algorithm, natural language processing, static/dynamic features, support vector machine

Bibliographic Note

Publication details (e.g. title, author(s), publication statuses and dates) are captured on an “AS IS” and “AS AVAILABLE” basis at the time of record harvesting from the data source. Suggestions for further amendments or supplementary information can be sent to [email protected].

Citation Format(s)

New Word Extraction from Chinese Financial Documents. / Yan, Liwei; Bai, Bo; Chen, Wei et al.
In: IEEE Signal Processing Letters, Vol. 24, No. 6, 7891570, 01.06.2017, p. 770-773.
