Preconditioned temporal difference learning

Hengshuai Yao, Zhi-Qiang Liu

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

14 Citations (Scopus)

Abstract

This paper extends many of the recent popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE) and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning technique that solves a stochastic model equation. This paper also studies three significant issues of the new framework: it presents a new rule of step-size that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of related algorithms to near that of temporal difference (TD) learning. Copyright 2008 by the author(s)/owner(s).
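To illustrate the idea the abstract describes, a minimal sketch of preconditioned policy evaluation follows. It assumes the common formulation in which the model equation is A θ = b, with A accumulating φ(s)(φ(s) − γ φ(s′))ᵀ and b accumulating r φ(s); choosing the preconditioner C = A recovers the LSTD solution, while other choices yield LSPE-like updates. The function name, signature, and regularization term are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def preconditioned_td(transitions, n_features, phi, gamma=0.99, reg=1e-3):
    """Sketch of policy evaluation via a preconditioned model equation
    A theta = b (hypothetical implementation, not the paper's code).

    A accumulates phi(s) (phi(s) - gamma * phi(s'))^T over transitions,
    b accumulates r * phi(s). With preconditioner C = A (plus a small
    regularizer for invertibility) the solve reduces to LSTD.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for (s, r, s_next) in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # model matrix accumulation
        b += r * f                            # reward-weighted features
    C = A + reg * np.eye(n_features)          # preconditioner; C = A -> LSTD
    theta = np.linalg.solve(C, b)             # one preconditioned solve
    return theta
```

In this sketch a single direct solve is used; the framework in the paper instead applies the preconditioning iteratively and with an online step-size rule, which is what brings the per-step cost near that of plain TD.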
Original language: English
Title of host publication: Proceedings of the 25th International Conference on Machine Learning
Pages: 1208-1215
Publication status: Published - 2008
Event: 25th International Conference on Machine Learning - Helsinki, Finland
Duration: 5 Jul 2008 - 9 Jul 2008

Conference

Conference: 25th International Conference on Machine Learning
Place: Finland
City: Helsinki
Period: 5/07/08 - 9/07/08

