TY - JOUR
T1 - New trend on chemical structure representation learning in toxicology
T2 - In reviews of machine learning model methodology
AU - Zhang, Jiabin
AU - Zhao, Lei
AU - Wang, Wei
AU - Xing, De-Feng
AU - Wang, Zhen-Xing
AU - Ma, Jun
AU - Wang, Aijie
AU - Ren, Nan-Qi
AU - Lee, Duu-Jong
AU - Chen, Chuan
PY - 2025/3/10
Y1 - 2025/3/10
N2 - Computer-assisted virtual screening using quantitative structure-activity relationship (QSAR) models is a surrogate method for reducing the need for costly animal experiments. However, traditional QSAR models face significant challenges, such as the ‘activity cliff’ phenomenon and small datasets, which limit their ability to generalize and predict toxicity. This review examines the transition in the digital encoding of molecules and the corresponding models, from molecular descriptors to three advanced types of molecular representation based on deep learning techniques. We highlight the importance of deep learning models that can not only capture molecular similarity in chemical space to address the ‘activity cliff’ problem but also improve model performance through feature fusion. As alternative solutions that may reduce reliance on feature engineering, graph neural networks, convolutional neural networks, and large language models, together with related training paradigms such as transfer learning, offer further opportunities for building toxicity models, for example when data are insufficient. This work could help prospective deep learning modelers build robust models, setting the stage for groundbreaking advancements in the further development and application of toxicity prediction models. © 2025 Taylor & Francis Group, LLC.
AB - Computer-assisted virtual screening using quantitative structure-activity relationship (QSAR) models is a surrogate method for reducing the need for costly animal experiments. However, traditional QSAR models face significant challenges, such as the ‘activity cliff’ phenomenon and small datasets, which limit their ability to generalize and predict toxicity. This review examines the transition in the digital encoding of molecules and the corresponding models, from molecular descriptors to three advanced types of molecular representation based on deep learning techniques. We highlight the importance of deep learning models that can not only capture molecular similarity in chemical space to address the ‘activity cliff’ problem but also improve model performance through feature fusion. As alternative solutions that may reduce reliance on feature engineering, graph neural networks, convolutional neural networks, and large language models, together with related training paradigms such as transfer learning, offer further opportunities for building toxicity models, for example when data are insufficient. This work could help prospective deep learning modelers build robust models, setting the stage for groundbreaking advancements in the further development and application of toxicity prediction models. © 2025 Taylor & Francis Group, LLC.
KW - chemical structure
KW - Deep representative learning
KW - Peng Gao
KW - QSAR
KW - Risk management
KW - toxicity prediction
UR - http://www.scopus.com/inward/record.url?scp=86000504131&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-86000504131&origin=recordpage
U2 - 10.1080/10643389.2025.2469868
DO - 10.1080/10643389.2025.2469868
M3 - RGC 21 - Publication in refereed journal
SN - 1064-3389
JO - Critical Reviews in Environmental Science and Technology
JF - Critical Reviews in Environmental Science and Technology
ER -