
CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space

  • Bingyi Liu
  • Jinbo He
  • Haiyong Shi
  • Enshu Wang*
  • Weizhen Han*
  • Jingxiang Hao
  • Peixi Wang
  • Zhuangzhuang Zhang

  *Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Hybrid action space, which combines discrete choices and continuous parameters, is prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing hybrid discrete-continuous action space remains a fundamental challenge, mainly due to limited policy expressiveness and poor scalability in high-dimensional settings. To address this challenge, we view the hybrid action space problem as a fully cooperative game and propose a Cooperative Hybrid Diffusion Policies (CHDP) framework to solve it. CHDP employs two cooperative agents that leverage a discrete and a continuous diffusion policy, respectively. The continuous policy is conditioned on the discrete action’s representation, explicitly modeling the dependency between them. This cooperative design allows the diffusion policies to leverage their expressiveness to capture complex distributions in their respective action spaces. To mitigate the update conflicts arising from simultaneous policy updates in this cooperative setting, we employ a sequential update scheme that fosters co-adaptation. Moreover, to improve scalability when learning in high-dimensional discrete action space, we construct a codebook that embeds the action space into a low-dimensional latent space. This mapping enables the discrete policy to learn in a compact, structured space. Finally, we design a Q-function-based guidance mechanism to align the codebook’s embeddings with the discrete policy’s representation during training. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.

© 2026, Association for the Advancement of Artificial Intelligence. All rights reserved.
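
The data flow described in the abstract (a discrete policy acting in a codebook's latent space, and a continuous policy conditioned on the chosen discrete action's representation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the nearest-neighbour decoding of the latent vector, and the plain callables standing in for the two diffusion policies are hypothetical and are not taken from the paper, which may decode and condition differently.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def nearest_action(codebook: Sequence[Vector], latent: Vector) -> int:
    """Return the index of the codebook embedding closest to `latent`.

    Assumption: squared Euclidean nearest-neighbour decoding. The source
    only says the codebook embeds discrete actions in a latent space.
    """
    def sq_dist(embedding: Vector) -> float:
        return sum((e - z) ** 2 for e, z in zip(embedding, latent))

    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def select_hybrid_action(
    state: Vector,
    codebook: Sequence[Vector],
    discrete_policy: Callable[[Vector], Vector],
    continuous_policy: Callable[[Vector, Vector], Vector],
) -> Tuple[int, Vector]:
    """Compose a (discrete action, continuous parameters) pair.

    `discrete_policy` and `continuous_policy` are hypothetical stand-ins
    for the two diffusion policies; here they are ordinary functions.
    """
    latent = discrete_policy(state)                 # point in latent space
    k = nearest_action(codebook, latent)            # decode to a discrete action
    params = continuous_policy(state, codebook[k])  # conditioned on its embedding
    return k, params


if __name__ == "__main__":
    # Toy codebook: three discrete actions embedded in a 2-D latent space.
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    k, params = select_hybrid_action(
        state=[0.5],
        codebook=toy_codebook,
        discrete_policy=lambda s: [0.9, 0.8],
        continuous_policy=lambda s, e: [e[0] + s[0], e[1] * 2.0],
    )
    print(k, params)
```

The sketch covers action selection only. The sequential update scheme and the Q-function-based guidance mentioned in the abstract are training-time mechanisms and are not modeled here.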
Original language: English
Title of host publication: Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence
Editors: Sven Koenig, Chad Jenkins, Matthew Taylor
Publisher: AAAI Press
Pages: 23640-23648
Number of pages: 9
ISBN (Print): 1-57735-906-2, 978-1-57735-906-7
Publication status: Published - 2026
Event: 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26), Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
Conference number: 26
https://aaai.org/conference/aaai/aaai-26/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 28
Volume: 40
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)
Abbreviated title: AAAI-26
Place: Singapore
Period: 20/01/26 to 27/01/26

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62272357 and 62302326; in part by the Wuhan Science and Technology Joint Project for Building a Strong Transportation Country under Grant 2024-2-7; in part by the State Key Laboratory of Integrated Services Networks under Grant ISN25-09; and in part by the Wuhan Science and Technology Project for Key Research and Development under Grant No. 2024050702030090.
