De-identification of Clinical Text via Bi-LSTM-CRF with Neural Language Models

Buzhou Tang*, Dehuan Jiang, Qingcai Chen, Xiaolong Wang, Jun Yan, Ying Shen

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

De-identification of clinical text, a prerequisite for reusing electronic clinical data, is a typical named entity recognition (NER) problem. A number of state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long short-term memory conditional random fields), have been applied to de-identification. Neural language models used for language representation bring large improvements in many NLP tasks when they are integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text, and evaluate it on the de-identification datasets of the i2b2 2014 and the CEGS N-GRID 2016 challenges. Four neural language models of three types, each individually integrated with Bi-LSTM-CRF, are compared in this study. Bi-LSTM-CRF with neural language models achieves the highest “strict” micro-averaged F1-score of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, setting new benchmark results on both datasets. © 2019 AMIA.
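
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a "strict" micro-averaged F1-score could be computed. It assumes that "strict" means a predicted entity counts as correct only when its character offsets and its entity type both match a gold entity exactly, and that micro-averaging pools the counts over all documents and entity types before computing precision and recall. The span representation, the function name `strict_micro_f1`, and the toy data are hypothetical and are not taken from the paper or from the official challenge evaluation scripts.

```python
from typing import Iterable, List, Tuple

# Hypothetical span representation: (start_offset, end_offset, entity_type).
Span = Tuple[int, int, str]


def strict_micro_f1(
    gold_docs: Iterable[List[Span]],
    pred_docs: Iterable[List[Span]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all documents.

    Assumption: a prediction is a true positive only if its offsets
    and its type are identical to those of a gold span.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # exact matches
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: the DATE prediction is off by one character, so under the
# strict criterion it is both a false positive and a false negative.
gold = [[(0, 4, "NAME"), (10, 20, "DATE")], [(5, 9, "CITY")]]
pred = [[(0, 4, "NAME"), (10, 19, "DATE")], [(5, 9, "CITY")]]
precision, recall, f1 = strict_micro_f1(gold, pred)
```

Because the counts are pooled before the ratio is taken, frequent entity types weigh more heavily in the final score than rare ones, which is the usual distinction between micro- and macro-averaging.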
Original language: English
Title of host publication: AMIA 2019 Annual Symposium Proceedings
Pages: 857-863
Number of pages: 7
Publication status: Published - Nov 2019
Externally published: Yes
Event: 2019 American Medical Informatics Association Annual Symposium (AMIA 2019) - Washington, United States
Duration: 16 Nov 2019 to 20 Nov 2019
https://knowledge.amia.org/change

Conference

Conference: 2019 American Medical Informatics Association Annual Symposium (AMIA 2019)
Place: United States
City: Washington
Period: 16/11/19 to 20/11/19

Funding

This work was supported in part by the following grants: National Natural Science Foundation of China (NSFC) (U1813215, 61876052 and 61573118), Special Foundation for Technology Research Program of Guangdong Province (2015B010131010), Strategic Emerging Industry Development Special Funds of Shenzhen (JCYJ20160531192358466), and Innovation Fund of Harbin Institute of Technology (HIT.NSRIF.2017052).

Research Keywords

  • De-identification
  • Named entity recognition
  • Bidirectional long-short-term-memory
  • Conditional random fields
  • Neural language models
