Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Jiamin Liu, Heng Lian*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

2 Citations (Scopus)

Abstract

We investigate the decentralized nonparametric policy evaluation problem within reinforcement learning (RL), focusing on scenarios where multiple agents collaborate to learn the state-value function using sampled state transitions and privately observed rewards. Our approach centers on a regression-based multistage iteration technique employing infinite-dimensional gradient descent (GD) within a reproducing kernel Hilbert space (RKHS). To make computation and communication more feasible, we employ Nyström approximation to project this space into a finite-dimensional one. We establish statistical error bounds that describe the convergence of value function estimation, marking the first instance of such analysis within a fully decentralized nonparametric framework. We compare the regression-based method with the kernel temporal difference (TD) method in numerical studies. © 2024 IEEE.
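
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the Gaussian kernel, the landmark set, the mixing matrix W, the step size, and all function names are assumptions introduced here for illustration. The consensus-plus-gradient update is the standard decentralized GD scheme, swapped in because the abstract does not spell out the exact update. States are mapped to Nyström features, and each agent then repeatedly regresses its private reward plus the discounted previous-stage value onto those features while averaging coefficients with its neighbors.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=0.2):
    """Gaussian kernel matrix between rows of X and rows of Y (assumed kernel)."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def nystrom_features(X, landmarks, bandwidth=0.2):
    """Finite-dimensional Nystrom features phi(x) = K_mm^{-1/2} k(Z, x)."""
    K_mm = rbf_kernel(landmarks, landmarks, bandwidth)
    evals, evecs = np.linalg.eigh(K_mm)
    evals = np.maximum(evals, 1e-12)  # guard against tiny eigenvalues
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T  # K_mm^{-1/2}
    return rbf_kernel(X, landmarks, bandwidth) @ inv_sqrt


def decentralized_policy_evaluation(phi_s, phi_next, rewards, W,
                                    gamma=0.9, stages=30, gd_steps=200, step=0.5):
    """Multistage regression with consensus-based gradient descent.

    phi_s, phi_next : (n, m) features of sampled states and their successors,
                      assumed to be available to every agent.
    rewards         : (J, n) array; row j holds the rewards seen only by agent j.
    W               : (J, J) doubly stochastic mixing matrix (assumed network).
    Returns a (J, m) array of per-agent coefficient vectors.
    """
    num_agents, n = rewards.shape
    theta = np.zeros((num_agents, phi_s.shape[1]))
    for _ in range(stages):
        # Stage targets: private reward plus discounted previous-stage value.
        targets = rewards + gamma * (theta @ phi_next.T)
        for _ in range(gd_steps):
            residual = theta @ phi_s.T - targets   # (J, n) least-squares residuals
            grad = residual @ phi_s / n            # (J, m) local gradients
            # Average with neighbors, then take a local gradient step.
            theta = W @ theta - step * grad
    return theta
```

With a fixed step size, this update keeps the agents' coefficients close to each other rather than exactly equal, so in practice one might average the rows of the returned array or use a decaying step. Only the m-dimensional coefficient vectors are exchanged between agents, which is the communication saving that the finite-dimensional projection is meant to provide.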
Original language: English
Journal: IEEE Transactions on Neural Networks and Learning Systems
Online published: 17 Sept 2024
DOIs
Publication status: Online published - 17 Sept 2024

Funding

The work of Jiamin Liu was supported in part by the NSFC at University of Science and Technology Beijing under Grant 12401332. The work of Heng Lian was supported in part by the NSFC at CityU Shenzhen Research Institute under Grant 12371297; in part by the NSF of Jiangxi Province under Grant 20223BCJ25017; in part by the Hong Kong RGC General Research Fund under Grant 11300424, Grant 11300721, and Grant 11311822; and in part by the CityU Internal under Grant 7006014.

Research Keywords

  • Gradient descent (GD)
  • multiagent reinforcement learning (MARL)
  • policy iteration
  • reinforcement learning (RL)
  • reproducing kernel Hilbert space (RKHS)
  • state-value function
