Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 8789 |
Journal / Publication | International Journal of Environmental Research and Public Health |
Volume | 18 |
Issue number | 16 |
Online published | 20 Aug 2021 |
Publication status | Published - Aug 2021 |
Link(s)
DOI | DOI |
---|---|
Attachment(s) | Documents: Publisher's Copyright Statement |
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85113173053&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(1ab02b07-b434-4fe9-987a-75b7e5259cdf).html |
Abstract
We aimed to develop machine learning classifiers as a risk-prevention mechanism for medical professionals with little or no knowledge of their patients' languages. The classifiers predict the likelihood that machine translation (MT) outputs contain clinically significant mistakes or are incomprehensible, using features of the English source texts given as input to the MT systems. A multinomial naïve Bayes (MNB) classifier was developed to provide intuitive probabilistic predictions of erroneous health translation outputs, based on computational modelling of a small number of optimised features of the original English source texts. The best-performing MNB classifier, which used 8 optimised features (MNB (8)), achieved a statistically higher area under the receiver operating characteristic curve (AUC) (M = 0.760, SD = 0.03) than both the full-feature classifier using 135 high-dimensional natural features (135) (M = 0.631, SD = 0.006, p < 0.0001, SE = 0.004) and the automatically optimised classifier using 22 features (22) (M = 0.7231, SD = 0.0084, p < 0.0001, SE = 0.004). Furthermore, MNB (8) had statistically higher sensitivity (M = 0.885, SD = 0.100) than the full-feature classifier (135) (M = 0.577, SD = 0.155, p < 0.0001, SE = 0.005) and the automatically optimised classifier (22) (M = 0.731, SD = 0.139, p < 0.0001, SE = 0.0023). Finally, MNB (8) reached statistically higher specificity (M = 0.667, SD = 0.138) than the full-feature classifier (135) (M = 0.567, SD = 0.139, p = 0.0002, SE = 0.026) and the automatically optimised classifier (22) (M = 0.633, SD = 0.141, p = 0.0133, SE = 0.026).
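The abstract does not give implementation details. As a rough illustration only, the following pure-Python sketch shows how a multinomial naïve Bayes classifier turns non-negative count features into a probability of a risky translation, and how AUC, sensitivity and specificity can be computed from such predictions. The toy feature matrix, labels, smoothing constant `alpha`, the 0.5 decision threshold and all function names are hypothetical assumptions, not the authors' features, data or pipeline.

```python
import math

# Hypothetical sketch: not the authors' pipeline, features or data.


class MultinomialNB:
    """Multinomial naive Bayes over non-negative count features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed total count of each feature within class c.
            totals = [sum(r[j] for r in rows) + self.alpha for j in range(n_features)]
            denom = sum(totals)
            self.log_lik[c] = [math.log(t / denom) for t in totals]
        return self

    def predict_proba(self, x):
        # Log posterior up to a constant: log P(c) + sum_j x_j * log P(feature j | c).
        scores = {
            c: self.log_prior[c] + sum(xj * lp for xj, lp in zip(x, self.log_lik[c]))
            for c in self.classes
        }
        # Subtract the maximum score before exponentiating to avoid underflow.
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def sensitivity_specificity(y_true, y_pred, positive=1):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores, positive=1):
    # AUC as the probability that a random positive scores above a random negative
    # (Mann-Whitney formulation; ties count as one half).
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three count features per English source text;
# label 1 = risky MT output, 0 = acceptable MT output.
X = [[5, 1, 0], [4, 2, 1], [6, 0, 1], [1, 4, 3], [0, 5, 2], [1, 3, 4]]
y = [1, 1, 1, 0, 0, 0]

model = MultinomialNB(alpha=1.0).fit(X, y)
risk = [model.predict_proba(x)[1] for x in X]  # P(risky | features)
pred = [1 if p >= 0.5 else 0 for p in risk]     # assumed 0.5 threshold
sens, spec = sensitivity_specificity(y, pred)
print(f"AUC={auc(y, risk):.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Here the metrics are computed on the six training rows purely to show the mechanics; a genuine evaluation would score held-out source texts instead. Multinomial naïve Bayes suits count-valued text features because each feature contributes additively to the log posterior, which keeps the resulting probabilities easy to inspect.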
Research Area(s)
- multinomial naïve Bayes classifier, public health education and promotion, machine learning, digital vulnerability
Citation Format(s)
Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers. / Xie, Wenxiu; Ji, Meng; Huang, Riliu et al.
In: International Journal of Environmental Research and Public Health, Vol. 18, No. 16, 8789, 08.2021.