Abstract
Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making while avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, which poses a significant challenge for existing offline RL methods, particularly when the data is limited. Our study reveals that prior work, which adapts the predominant temporal-difference-based offline RL methods, still falls short under data corruption when the dataset is limited. In contrast, we find that vanilla sequence modeling methods, such as Decision Transformer, are robust to data corruption even without specialized modifications. To unlock the full potential of sequence modeling, we propose Robust Decision Transformer (RDT), which incorporates three simple yet effective robustness techniques: embedding dropout to improve the model's robustness against erroneous inputs, Gaussian-weighted learning to mitigate the effects of corrupted labels, and iterative data correction to eliminate corrupted data at the source. Extensive experiments on MuJoCo, Kitchen, and Adroit tasks demonstrate that RDT outperforms prior methods under various data corruption scenarios. Furthermore, RDT remains robust in a more challenging setting that combines training-time data corruption with test-time observation perturbations. These results highlight the potential of sequence modeling for learning from noisy or corrupted offline datasets, thereby promoting the reliable application of offline RL in real-world scenarios. Our code is available at https://github.com/jiawei415/RobustDecisionTransformer.
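As a rough illustration of the Gaussian-weighted learning idea named in the abstract, the sketch below down-weights samples whose prediction error is large, so that a corrupted label contributes little to the loss. The abstract does not give the weighting formula; the form `exp(-beta * error**2)`, the parameter `beta`, and the function names are assumptions made for illustration and may differ from the RDT implementation in the linked repository.

```python
import math


def gaussian_weights(errors, beta=1.0):
    """Map per-sample errors to weights in (0, 1].

    Assumed form: w = exp(-beta * error^2). A zero error gets weight 1,
    and the weight decays quickly as the error grows.
    """
    return [math.exp(-beta * e * e) for e in errors]


def plain_mse(predictions, targets):
    """Ordinary mean squared error, for comparison."""
    errors = [p - t for p, t in zip(predictions, targets)]
    return sum(e * e for e in errors) / len(errors)


def gaussian_weighted_mse(predictions, targets, beta=1.0):
    """Mean squared error with each sample scaled by its Gaussian weight.

    The weights are computed from the current errors and used as plain
    constants (in a gradient-based setup they would be detached).
    """
    errors = [p - t for p, t in zip(predictions, targets)]
    weights = gaussian_weights(errors, beta)
    return sum(w * e * e for w, e in zip(weights, errors)) / len(errors)


# Hypothetical toy data: the last target is corrupted (3.0 recorded as 13.0).
predictions = [1.1, 1.9, 3.0]
corrupted_targets = [1.0, 2.0, 13.0]

loss_plain = plain_mse(predictions, corrupted_targets)
loss_weighted = gaussian_weighted_mse(predictions, corrupted_targets, beta=1.0)
```

On this toy data the corrupted sample dominates the plain loss but receives a weight close to zero in the weighted loss. A larger `beta` suppresses outliers more aggressively, at the cost of also discounting clean samples that the model has not yet fit well.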
| Original language | English |
|---|---|
| Title of host publication | International Conference on Representation Learning 2025 (ICLR 2025) |
| Editors | Y. Yue, A. Garg, N. Peng, F. Sha, R. Yu |
| Publisher | International Conference on Learning Representations, ICLR |
| Number of pages | 29 |
| ISBN (Electronic) | 9798331320850 |
| Publication status | Published - Apr 2025 |
| Event | 13th International Conference on Learning Representations (ICLR 2025), Singapore EXPO, Singapore, Singapore. Duration: 24 Apr 2025 → 28 Apr 2025. https://iclr.cc/Conferences/2025 |
Conference
| Conference | 13th International Conference on Learning Representations (ICLR 2025) |
|---|---|
| Abbreviated title | ICLR 2025 |
| Place | Singapore |
| City | Singapore |
| Period | 24/04/25 → 28/04/25 |
| Internet address | https://iclr.cc/Conferences/2025 |
Funding
Baoxiang Wang is partially supported by the National Natural Science Foundation of China (62106213, 72394361) and an extended support project from the Shenzhen Science and Technology Program. Shuang Qiu acknowledges the support of GRF 16209124.
RGC Funding Information
- RGC-funded
Projects related to 'Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling'
GRF: Provably Efficient Multi-Agent Reinforcement Learning: Algorithms and Theory
QIU, S. (Principal Investigator / Project Coordinator)
1/08/24 → …
Project: Research