TY - JOUR
T1 - A machine learning approach using frequency descriptor for molecular property predictions
AU - Chen, Jialu
AU - Xu, Wenjun
AU - Zhang, Ruiqin
PY - 2021/11/28
Y1 - 2021/11/28
AB - Machine learning algorithms have proven effective in predicting the properties of molecules and materials. Recently, Δ-machine learning, a strategy that uses low-level calculations as a baseline to predict properties at a high level of theory, has been proposed to further reduce computational costs. It has been successfully applied to the prediction of potential energy surfaces, bandgaps and chemical shieldings. Here we introduce a new descriptor for molecular property prediction, the frequency descriptor (FD), which is built from harmonic vibrational frequencies. Specifically, we used the harmonic vibrational frequencies computed with several semiempirical methods (PM6, PM7 and GFN2-xTB) as the descriptor in Δ-machine learning. The energies, enthalpies and HOMO-LUMO gaps of 6095 C7H10O2 isomers, computed at high levels of theory, served as the target properties for testing the descriptor. We found that the FD generated with the GFN2-xTB method performs best among the semiempirical methods tested. Chemical accuracy can be achieved with a small training set when the FD is combined with single-point calculations at density functional theory (DFT) levels. In addition, we incorporated infrared intensities into the FD to form the FD-II, with which chemical accuracy for energies can be achieved using a training set of only 3%, the smallest sample size used with the current dataset (C7H10O2 isomers). We expect that the FD and FD-II can also be used to accelerate the prediction of other properties.
KW - ZETA VALENCE QUALITY
KW - NONCOVALENT INTERACTIONS
KW - NDDO APPROXIMATIONS
KW - BASIS-SETS
KW - OPTIMIZATION
KW - PARAMETERS
KW - ELEMENTS
KW - MODELS
UR - http://www.scopus.com/inward/record.url?scp=85119718585&partnerID=8YFLogxK
UR - http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=LinksAMR&SrcApp=PARTNER_APP&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000711546700001
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85119718585&origin=recordpage
U2 - 10.1039/d1nj04739f
DO - 10.1039/d1nj04739f
M3 - RGC 21 - Publication in refereed journal
SN - 1144-0546
VL - 45
SP - 20672
EP - 20680
JO - New Journal of Chemistry
JF - New Journal of Chemistry
IS - 44
ER -