Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 172466 |
Journal / Publication | Science of the Total Environment |
Volume | 931 |
Online published | 16 Apr 2024 |
Publication status | Published - 25 Jun 2024 |
Abstract
The growing problem of plasmid-mediated dissemination of antibiotic resistance genes (ARGs) poses a significant threat to environmental integrity. However, the prediction of ARG prevalence has been overlooked, especially for emerging ARGs that may be evolving into gene exchange hotspots. Here, we used DNABERT to classify sequences as plasmid or chromosome and to detect resistance gene prevalence. Initially, DNABERT fine-tuned on plasmid and chromosome sequences, followed by a multilayer perceptron (MLP) classifier, achieved an AUC (area under the curve) of 0.764 on external datasets spanning 23 genera, outperforming a traditional statistics-based model by 0.02 AUC. Furthermore, single-genus models for Escherichia and Pseudomonas were trained to explore their predictive performance in detecting ARG prevalence. Integrating k-mer frequency attributes improved the prediction of ARG prevalence on an external dataset by 0.0281–0.0615 AUC for Escherichia and by 0.0196–0.0928 AUC for Pseudomonas. Finally, drawing on data from the existing literature, we established a random forest model that forecasts the relative conjugation transfer rate of plasmids with an AUC of 0.7956. The model identifies the plasmid's repression status, cellular density, and temperature as the most important factors influencing transfer frequency. Combined, these two models provide a useful reference for quick, low-cost integrated evaluation of resistance gene transfer, accelerating computer-assisted quantitative risk assessment of ARG transfer in the environmental field. © 2024
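The abstract mentions k-mer frequency attributes without giving the value of k or how the features are combined with the DNABERT output. As an illustration only, the following minimal Python sketch shows one common way to turn a DNA sequence into a fixed-length k-mer frequency vector. The function name `kmer_frequencies`, the default `k = 3`, and the handling of ambiguous bases are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every DNA k-mer over A/C/G/T.

    Hypothetical helper for illustration; k and the normalisation are
    assumptions, not the settings used in the paper.
    """
    seq = sequence.upper()
    # Count overlapping windows, skipping any that contain ambiguous bases (e.g. N).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers in a fixed order so vectors are comparable across sequences.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTAC", k=2)
    print(freqs["AC"], len(freqs))
```

Because every sequence yields a vector of the same length (4^k entries), such vectors could in principle be concatenated with other per-sequence features before a downstream classifier; whether the authors did so in this way is not stated in the abstract.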
Research Area(s)
- ARGs prevalence prediction, BERT, Deep learning, Large language model, Plasmid conjugation rate
Citation Format(s)
Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation. / Zhang, Jiabin; Zhao, Lei; Wang, Wei et al.
In: Science of the Total Environment, Vol. 931, 172466, 25.06.2024.