Model-Based Offline Reinforcement Learning With Adversarial Data Augmentation

Hongye Cao, Fan Feng, Jing Huo*, Shangdong Yang*, Meng Fang, Tianpei Yang, Yang Gao

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Model-based offline reinforcement learning (RL) constructs environment models from offline datasets to perform conservative policy optimization. Existing approaches learn state transitions with ensemble models and roll out conservative estimates to mitigate extrapolation errors. However, static data makes it difficult to learn a robust policy, and offline agents cannot interact with the environment to gather new data. To address these challenges, we introduce Model-based Offline Reinforcement learning with AdversariaL data augmentation (MORAL). MORAL replaces the fixed-horizon rollout with adversarial data augmentation, performing alternating sampling with ensemble models to enrich the training data. Specifically, the adversarial process dynamically selects ensemble models against the policy for biased sampling, mitigating the optimistic estimates of fixed models and thus robustly expanding the training data for policy optimization. Moreover, a differential factor (DF) is integrated into the adversarial process as a regularizer, keeping extrapolation errors small. This data-augmented optimization adapts to diverse offline tasks without rollout-horizon tuning, showing broad applicability. Extensive experiments on the D4RL benchmark demonstrate that MORAL outperforms other model-based offline RL methods in policy learning and sample efficiency. © 2025 IEEE.
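The adversarial sampling idea in the abstract can be illustrated with a minimal sketch: at each rollout step, instead of averaging the ensemble, the dynamics model whose prediction the current policy values *least* is selected, biasing the augmented data against optimistic model errors. This is an illustrative sketch only, not the paper's implementation; all names (`adversarial_rollout`, `value_fn`, etc.) are hypothetical.

```python
def adversarial_rollout(models, policy, value_fn, start_state, max_steps):
    """Collect one adversarially augmented rollout (illustrative sketch).

    models:   list of learned dynamics models, each mapping
              (state, action) -> (next_state, reward)
    policy:   maps state -> action
    value_fn: estimated state value under the current policy
    """
    state, transitions = start_state, []
    for _ in range(max_steps):
        action = policy(state)
        # Each ensemble member proposes a next state and reward.
        proposals = [m(state, action) for m in models]
        # Adversarial selection: take the proposal the policy values least,
        # counteracting optimistic estimates from any fixed model.
        next_state, reward = min(proposals, key=lambda sr: value_fn(sr[0]))
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions
```

A regularizer such as the paper's differential factor would additionally penalize disagreement among the ensemble proposals before they enter the replay buffer; that term is omitted here for brevity.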
Original language: English
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Online published: 2 Dec 2025
DOIs
Publication status: Online published - 2 Dec 2025

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62506157, Grant 62276128, Grant 62192783, and Grant 62206133; in part by Jiangsu Science and Technology Major Project under Grant BG2024031; in part by the Natural Science Foundation of Jiangsu Province under Grant BK20243051; in part by the Fundamental Research Funds for the Central Universities under Grant 14380128; and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Research Keywords

  • Adversarial data augmentation
  • differential factor (DF) regularization
  • model-based offline reinforcement learning (RL)
  • two-player game
