A machine learning approach using frequency descriptor for molecular property predictions

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

2 Scopus Citations
Detail(s)

Original language: English
Pages (from-to): 20672-20680
Journal / Publication: New Journal of Chemistry
Volume: 45
Issue number: 44
Online published: 20 Oct 2021
Publication status: Published - 28 Nov 2021

Abstract

Machine learning algorithms have proven effective in predicting the properties of molecules and materials. Recently, Δ-machine learning, a strategy that uses low-level calculations as a baseline for predicting properties at the level of high-level methods, was proposed to further reduce computational costs. It has been successfully applied to predictions of potential energy surfaces, band gaps and chemical shieldings. Here we introduce a new descriptor, the frequency descriptor (FD), which uses harmonic vibrational frequencies to predict molecular properties. Specifically, we used harmonic vibrational frequencies computed with several semiempirical methods (PM6, PM7 and GFN2-xTB) as the descriptor in Δ-machine learning. The energies, enthalpies and HOMO-LUMO gaps of 6095 C7H10O2 isomers, computed at a high level of theory, served as target properties to test the descriptor. Among the semiempirical methods tested, the FD generated by GFN2-xTB stood out for its excellent performance. Chemical accuracy can be achieved with a small training set when the FD is combined with single-point calculations at the density functional theory level. In addition, we extended the FD with infrared intensities to obtain the FD-II, with which chemical accuracy for energies can be achieved with a small training set (3%), the smallest sample size for the current dataset (C7H10O2 isomers). We expect that the FD and FD-II can also be used to accelerate other property predictions.
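
The Δ-machine learning idea described in the abstract can be illustrated with a minimal sketch: a model learns only the correction between a cheap baseline value and an expensive target value, using a per-molecule descriptor vector as input. Everything in the example is an assumption made for illustration. The abstract does not state which regressor was used, so kernel ridge regression with a Gaussian kernel is swapped in here; the "descriptors" are random three-component vectors standing in for vibrational frequencies, and the "energies" are synthetic numbers, not results from PM6, PM7, GFN2-xTB or density functional theory. The function names (`fit_delta_model`, `gaussian_kernel`, `solve_linear`) are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_delta_model(descriptors, e_low, e_high, sigma=1.0, lam=1e-4):
    """Fit kernel ridge regression to the correction e_high - e_low.

    Returns a function that predicts the high-level value from a
    descriptor and the corresponding low-level value.
    """
    deltas = [hi - lo for hi, lo in zip(e_high, e_low)]
    n = len(descriptors)
    kernel = [
        [gaussian_kernel(descriptors[i], descriptors[j], sigma)
         + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alphas = solve_linear(kernel, deltas)

    def predict(descriptor, e_low_new):
        correction = sum(
            a * gaussian_kernel(descriptor, d, sigma)
            for a, d in zip(alphas, descriptors)
        )
        return e_low_new + correction

    return predict


# Synthetic stand-in data (assumption: NOT real frequencies or energies).
rng = random.Random(0)
descriptors = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(60)]
e_low = [10.0 * sum(d) for d in descriptors]
e_high = [lo + math.sin(d[0]) + 0.5 * d[1] ** 2
          for lo, d in zip(e_low, descriptors)]

n_train = 40
predict = fit_delta_model(descriptors[:n_train], e_low[:n_train], e_high[:n_train])

test = range(n_train, 60)
mae_baseline = sum(abs(e_high[i] - e_low[i]) for i in test) / len(test)
mae_delta = sum(abs(e_high[i] - predict(descriptors[i], e_low[i]))
                for i in test) / len(test)
print(f"baseline MAE: {mae_baseline:.4f}, delta-model MAE: {mae_delta:.4f}")
```

In this toy setting the learned correction should bring the held-out error well below that of the uncorrected baseline. In a real workflow, the descriptor vectors would instead come from a semiempirical frequency calculation, and the kernel width `sigma` and regularization `lam` would need to be tuned.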

Research Area(s)

  • ZETA VALENCE QUALITY, NONCOVALENT INTERACTIONS, NDDO APPROXIMATIONS, BASIS-SETS, OPTIMIZATION, PARAMETERS, ELEMENTS, MODELS