Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 105079 |
Journal / Publication | iScience |
Volume | 25 |
Issue number | 10 |
Online published | 5 Sept 2022 |
Publication status | Published - 21 Oct 2022 |
Link(s)
DOI | DOI |
---|---|
Attachment(s) | Documents; Publisher's Copyright Statement |
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85138083781&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(45c33aa8-1bec-49eb-80b5-c140c35c5fb6).html |
Abstract
Although open-access data are increasingly common and useful to epidemiological research, the curation of such datasets is resource-intensive and time-consuming. Regularly disclosed case reports are a major source of COVID-19 data, yet they were often written in natural language in an unstructured format. Here, we propose a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model developed using deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to the COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted by our approach is highly consistent with that obtained from the gold-standard manual coding, with a matching rate of 80%. To disseminate our algorithm, we provide an open-access online platform that can estimate key epidemiological statistics in real time, with much less effort for data curation.
Research Area(s)
- Artificial intelligence, Health sciences, Machine learning, Virology
Citation Format(s)
Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model. / Wang, Zhizheng; Liu, Xiao Fan; Du, Zhanwei et al.
In: iScience, Vol. 25, No. 10, 105079, 21.10.2022.