Rethinking the Masking Strategy for Pretraining Molecular Graphs from a Data-Centric View

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review


Detail(s)

Original language: English
Journal / Publication: ACS Omega
Online published: 3 May 2024
Publication status: Online published - 3 May 2024

Abstract

Node-level self-supervised learning has been widely applied to pretraining molecular graphs. Attribute Masking (AttrMask) is a pioneering method in this field, and its successors focus on enhancing the capacity of the backbone models by incorporating additional modules. However, these methods overlook the imbalanced atom distribution in molecules because they rely solely on a random masking strategy to select atoms for pretraining. Based on the properties of molecules, we propose a weighted masking strategy that enhances the capacity of pretrained models through more effective use of molecular information during pretraining. Our experimental results demonstrate that AttrMask combined with our weighted masking strategy outperforms the random masking strategy, and even surpasses model-centric improvement methods without increasing the number of parameters. Additionally, our weighted masking strategy can be extended to other pretraining methods to achieve further performance gains. © 2024 The Authors. Published by American Chemical Society.
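To make the idea concrete, the minimal Python sketch below selects atom indices to mask with probabilities weighted by inverse atom-type frequency, so rare atom types (e.g., N, O, S) are masked more often than the dominant carbons. The abstract does not specify the paper's actual weighting scheme, so the inverse-frequency weights and the helper name weighted_mask_indices are illustrative assumptions, not the authors' implementation.

    import random
    from collections import Counter

    def weighted_mask_indices(atom_types, mask_ratio=0.15, freq=None):
        """Pick atom indices to mask, favoring rare atom types.

        atom_types: element symbols of one molecular graph, e.g. ["C", "C", "N", "O"].
        freq: corpus-level atom-type counts; if None, estimated from this
              graph alone (an illustrative fallback, not the assumed setup).
        """
        if freq is None:
            freq = Counter(atom_types)
        # Inverse-frequency weights: rare atoms receive a larger masking
        # probability than the dominant carbon atoms.
        weights = [1.0 / freq[a] for a in atom_types]
        k = max(1, int(len(atom_types) * mask_ratio))
        # Weighted sampling without replacement (Efraimidis-Spirakis keys:
        # an index with key u ** (1 / w) ranks higher when its weight w is larger).
        keyed = sorted(range(len(atom_types)),
                       key=lambda i: random.random() ** (1.0 / weights[i]),
                       reverse=True)
        return sorted(keyed[:k])

    # Example: carbons dominate, so N and O are masked more often on average.
    print(weighted_mask_indices(["C", "C", "C", "C", "C", "N", "O"], mask_ratio=0.3))

In practice, the frequency counts would presumably be gathered over the whole pretraining corpus rather than a single molecule, and the selected indices would have their atom features replaced by a mask token before the model predicts the original attributes, as in AttrMask.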