Abstract
De-identification of clinical text, a prerequisite for the reuse of electronic clinical data, is a typical named entity recognition (NER) problem. A number of state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long-short-term-memory conditional random fields), have been applied to de-identification. Neural language models used for language representation bring large improvements in many NLP tasks when they are integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text, and evaluate it on the de-identification datasets of the i2b2 2014 and the CEGS N-GRID 2016 challenges. Four neural language models of three types, each individually integrated with Bi-LSTM-CRF, are compared in this study. Bi-LSTM-CRF with neural language models achieves the highest “strict” micro-averaged F1-score of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, setting new benchmark results on both datasets. © 2019 AMIA.
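For readers unfamiliar with the reported metric, the following is a minimal sketch of how a "strict" micro-averaged F1-score could be computed. It is not the official challenge scorer. It assumes that "strict" means a predicted entity counts as correct only when its document, character offsets, and category all match a gold entity exactly, and that micro-averaging pools the counts over all categories and documents. The `Span` layout, the function name `strict_micro_f1`, and the sample annotations are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span layout: (document id, start offset, end offset, PHI category)
Span = Tuple[str, int, int, str]


def strict_micro_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with counts pooled over all categories."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)

    # Strict matching (assumed): a prediction is a true positive only if
    # document, boundaries, and category are all identical to a gold span.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations, for illustration only.
gold = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "HOSPITAL"),
]
pred = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 34, "DATE"),  # end offset differs by one: no strict match
    ("doc2", 5, 12, "HOSPITAL"),
]

precision, recall, f1 = strict_micro_f1(gold, pred)
```

In this toy case two of the three predictions match exactly, so the near-miss on the date boundary counts as both a false positive and a false negative. A relaxed variant would instead tolerate small boundary differences.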
| Original language | English |
|---|---|
| Title of host publication | AMIA 2019 Annual Symposium Proceedings |
| Pages | 857-863 |
| Number of pages | 7 |
| Publication status | Published - Nov 2019 |
| Externally published | Yes |
| Event | 2019 American Medical Informatics Association Annual Symposium (AMIA 2019), Washington, United States. Duration: 16 Nov 2019 → 20 Nov 2019. https://knowledge.amia.org/change |
Conference
| Conference | 2019 American Medical Informatics Association Annual Symposium (AMIA 2019) |
|---|---|
| Place | United States |
| City | Washington |
| Period | 16/11/19 → 20/11/19 |
Funding
This paper is supported in part by the following grants: National Natural Science Foundation of China (NSFC) (U1813215, 61876052 and 61573118), Special Foundation for Technology Research Program of Guangdong Province (2015B010131010), Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20160531192358466), and Innovation Fund of Harbin Institute of Technology (HIT.NSRIF.2017052).
Research Keywords
- De-identification
- Named entity recognition
- Bidirectional long-short-term-memory
- Conditional random fields
- Neural language models
Fingerprint
Dive into the research topics of 'De-identification of Clinical Text via Bi-LSTM-CRF with Neural Language Models'. Together they form a unique fingerprint.