Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Author(s)

  • Jiabin Zhang
  • Lei Zhao
  • Wei Wang
  • Quan Zhang
  • Xue-Ting Wang
  • De-Feng Xing
  • Nan-Qi Ren
  • Chuan Chen

Detail(s)

Original language: English
Article number: 172466
Journal / Publication: Science of the Total Environment
Volume: 931
Online published: 16 Apr 2024
Publication status: Published - 25 Jun 2024

Abstract

The growing dissemination of plasmid-mediated antibiotic resistance genes (ARGs) poses a significant threat to environmental integrity. However, prediction of ARG prevalence has been overlooked, especially for emerging ARGs that are potentially evolving into gene exchange hotspots. Here, we used DNABERT to classify sequences as plasmid or chromosome and to detect resistance gene prevalence. First, DNABERT fine-tuned on plasmid and chromosome sequences and followed by a multilayer perceptron (MLP) classifier achieved an AUC (area under the curve) of 0.764 on external datasets across 23 genera, outperforming a traditional statistics-based model by 0.02 AUC. Furthermore, single-genus models for Escherichia and Pseudomonas were also trained to explore their performance in detecting ARG prevalence. By integrating k-mer frequency attributes, our model improved ARG prevalence prediction on an external dataset by 0.0281–0.0615 AUC for Escherichia and by 0.0196–0.0928 AUC for Pseudomonas. Finally, drawing on data from the existing literature, we established a random forest model for forecasting the relative conjugation transfer rate of plasmids, which achieved an AUC of 0.7956. It identifies the plasmid's repression status, cellular density, and temperature as the most important factors influencing transfer frequency. Combined, these two models provide a useful reference for quick, low-cost, integrated evaluation of resistance gene transfer, accelerating computer-assisted quantitative risk assessment of ARG transfer in the environmental field. © 2024
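
The abstract mentions integrating k-mer frequency attributes but does not state the value of k or the normalization used. As an illustration only, the sketch below shows one common way to turn a DNA sequence into a fixed-length k-mer frequency vector. The function name `kmer_frequencies`, the default `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    Hypothetical helper: k and the normalization are assumptions,
    not the settings used in the paper.
    """
    seq = sequence.upper()
    # Fixed feature order: all 4**k possible k-mers, lexicographically sorted
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous bases (skips e.g. 'N')
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return {kmer: 0.0 for kmer in vocab}
    return {kmer: counts[kmer] / total for kmer in vocab}
```

A vector built this way has 4**k entries in a fixed order, so it could be concatenated with other per-sequence features before being passed to a downstream classifier.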

Research Area(s)

  • ARGs prevalence prediction, BERT, Deep learning, Large language model, Plasmid conjugation rate

Citation Format(s)

Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation. / Zhang, Jiabin; Zhao, Lei; Wang, Wei et al.
In: Science of the Total Environment, Vol. 931, 172466, 25.06.2024.
