Abstract
Model-based offline reinforcement learning (RL) constructs environment models from offline datasets to perform conservative policy optimization. Existing approaches focus on learning state transitions through ensemble models and rolling out conservative estimates to mitigate extrapolation errors. However, the static data makes it challenging to develop a robust policy, and offline agents cannot access the environment to gather new data. To address these challenges, we introduce Model-based Offline Reinforcement learning with AdversariaL data augmentation (MORAL). In MORAL, we replace the fixed-horizon rollout by employing adversarial data augmentation to execute alternating sampling with ensemble models, enriching the training data. Specifically, this adversarial process dynamically selects ensemble models against the policy for biased sampling, mitigating the optimistic estimation of fixed models and thus robustly expanding the training data for policy optimization. Moreover, a differential factor (DF) is integrated into the adversarial process for regularization, ensuring error minimization in extrapolations. This data-augmented optimization adapts to diverse offline tasks without rollout horizon tuning, showing remarkable applicability. Extensive experiments on the D4RL benchmark demonstrate that MORAL outperforms other model-based offline RL methods in terms of policy learning and sample efficiency.
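The abstract describes an adversarial process that, at each sampling step, selects the ensemble member working against the policy, biasing rollouts pessimistically rather than trusting a single fixed model. The paper's exact formulation is not reproduced here; the following is a minimal sketch of that selection idea under illustrative assumptions (`adversarial_step`, the 1-D toy state, and the constant policy are hypothetical names and stand-ins, not MORAL's actual interfaces):

```python
import numpy as np

def adversarial_step(models, value_fn, policy, state):
    """One rollout step with adversarial ensemble-member selection.

    Each ensemble member predicts a candidate next state; the member
    whose prediction minimizes the policy's estimated value is chosen,
    so sampling is biased against optimistic model error.
    """
    action = policy(state)
    # Candidate next states, one per ensemble member.
    candidates = [m(state, action) for m in models]
    # Adversarial choice: the least favorable prediction for the policy.
    idx = int(np.argmin([value_fn(s) for s in candidates]))
    return candidates[idx], idx

# Toy instantiation: 1-D state, three members with different biases.
models = [lambda s, a, b=b: s + a + b for b in (-0.5, 0.0, 0.5)]
policy = lambda s: 1.0          # constant action
value_fn = lambda s: float(s)   # value grows with the state

next_state, chosen = adversarial_step(models, value_fn, policy, 2.0)
# The most pessimistic member (bias -0.5) is selected: next_state == 2.5
```

In a full method this selection would alternate with policy optimization on the augmented data, and the differential-factor regularization described above would temper how adversarial the selection is; neither is modeled in this sketch.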
| Original language | English |
|---|---|
| Number of pages | 15 |
| Journal | IEEE Transactions on Neural Networks and Learning Systems |
| Online published | 2 Dec 2025 |
| DOIs | |
| Publication status | Online published - 2 Dec 2025 |
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62506157, Grant 62276128, Grant 62192783, and Grant 62206133; in part by Jiangsu Science and Technology Major Project under Grant BG2024031; in part by the Natural Science Foundation of Jiangsu Province under Grant BK20243051; in part by the Fundamental Research Funds for the Central Universities under Grant 14380128; and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Research Keywords
- Adversarial data augmentation
- differential factor (DF) regularization
- model-based offline reinforcement learning (RL)
- two-player game