Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Research output: Journal Publications and Reviews · RGC 21 - Publication in refereed journal · Peer-review

1 Scopus Citation

Detail(s)

Original language: English
Journal / Publication: IEEE Transactions on Neural Networks and Learning Systems
Online published: 17 Sept 2024
Publication status: Online published - 17 Sept 2024

Abstract

We investigate the decentralized nonparametric policy evaluation problem within reinforcement learning (RL), focusing on scenarios where multiple agents collaborate to learn the state-value function using sampled state transitions and privately observed rewards. Our approach centers on a regression-based multistage iteration technique employing infinite-dimensional gradient descent (GD) within a reproducing kernel Hilbert space (RKHS). To keep computation and communication tractable, we employ a Nyström approximation to project this space onto a finite-dimensional one. We establish statistical error bounds describing the convergence of the value function estimate, marking the first such analysis within a fully decentralized nonparametric framework. We compare the regression-based method with the kernel temporal difference (TD) method in numerical studies. © 2024 IEEE.
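To make the ingredients of the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how the pieces could fit together: Gaussian-kernel Nyström features as the finite-dimensional projection of the RKHS, per-agent least-squares gradient steps on privately observed rewards, multistage fitted targets, and a gossip-averaging step over a doubly stochastic mixing matrix. All names and hyperparameters (e.g., `mix_matrix`, `landmarks`, `stages`, `lr`) are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of regression-based decentralized policy evaluation
# with Nystrom-approximated kernel features (illustrative only).
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def nystrom_features(states, landmarks, bandwidth=1.0, jitter=1e-8):
    """Finite-dimensional features phi(s) = K_mm^{-1/2} k_m(s) from landmark points."""
    K_mm = gaussian_kernel(landmarks, landmarks, bandwidth)
    evals, evecs = np.linalg.eigh(K_mm + jitter * np.eye(len(landmarks)))
    K_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return gaussian_kernel(states, landmarks, bandwidth) @ K_inv_sqrt

def decentralized_policy_evaluation(states, next_states, rewards, landmarks,
                                    mix_matrix, gamma=0.9, stages=20,
                                    gd_steps=50, lr=0.5, reg=1e-3):
    """
    rewards:    (n_agents, n_samples), each agent sees only its own private rewards.
    mix_matrix: (n_agents, n_agents) doubly stochastic averaging weights.
    Returns per-agent coefficient vectors over the Nystrom features.
    """
    n_agents, n = rewards.shape
    Phi = nystrom_features(states, landmarks)        # (n, m)
    Phi_next = nystrom_features(next_states, landmarks)
    theta = np.zeros((n_agents, Phi.shape[1]))       # current value-function coefficients
    for _ in range(stages):
        theta_prev = theta.copy()                    # previous-stage estimate, frozen
        for _ in range(gd_steps):
            new_theta = np.empty_like(theta)
            for i in range(n_agents):
                # Multistage regression target: private reward plus discounted
                # previous-stage value at the next state.
                target = rewards[i] + gamma * Phi_next @ theta_prev[i]
                resid = Phi @ theta[i] - target
                grad = Phi.T @ resid / n + reg * theta[i]
                new_theta[i] = theta[i] - lr * grad
            # Gossip step: each agent averages coefficients with its neighbors.
            theta = mix_matrix @ new_theta
    return theta

# Toy usage with synthetic data (purely illustrative).
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(200, 2))
next_states = states + 0.1 * rng.normal(size=states.shape)
rewards = rng.normal(size=(4, 200))                  # 4 agents, private rewards
landmarks = states[rng.choice(200, 30, replace=False)]
W = np.full((4, 4), 0.25)                            # fully connected averaging
theta = decentralized_policy_evaluation(states, next_states, rewards, landmarks, W)
```

In this sketch the inner loop is the decentralized GD phase (local gradient step followed by mixing), while the outer loop refreshes the regression target, mirroring the multistage iteration described in the abstract; the actual algorithm, step sizes, and error analysis are given in the paper itself.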

Research Area(s)

  • Gradient descent (GD), multiagent reinforcement learning (MARL), policy iteration, reinforcement learning (RL), reproducing kernel Hilbert space (RKHS), state-value function