Abstract
This paper extends several recently popular policy evaluation algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The extension is based on a preconditioning technique that solves a stochastic model equation. The paper also studies three significant issues in the new framework: it presents a new step-size rule that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of the related algorithms to nearly that of temporal difference (TD) learning. Copyright 2008 by the author(s)/owner(s).
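To make the idea of a preconditioned framework concrete, here is a minimal sketch. It assumes the common form in which linear TD-style methods estimate A = mean(φ(s)(φ(s) - γφ(s'))ᵀ) and b = mean(φ(s) r) from samples and then repeat θ ← θ + α C⁻¹(b - Aθ) for some preconditioner C. This update form, the helper names, the three-state chain, and its features are all illustrative assumptions, not the paper's exact algorithms or its online step-size rule. Under these assumptions, C = A with α = 1 reaches the LSTD solution in one step, C = mean(φφᵀ) gives an LSPE(0)-style iteration, and C = I gives the expected (batch) TD(0) update.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(C: Matrix, v: Vector) -> Vector:
    """Solve C x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(C)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def lstd_statistics(transitions: List[Tuple[int, float, int]], phi: Matrix, gamma: float):
    """Sample averages A = mean(phi(s)(phi(s) - gamma*phi(s'))^T), b = mean(phi(s)*r)."""
    n = len(phi[0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, r, s_next in transitions:
        f, f_next = phi[s], phi[s_next]
        for i in range(n):
            b[i] += f[i] * r
            for j in range(n):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    t = len(transitions)
    return [[a / t for a in row] for row in A], [x / t for x in b]


def feature_second_moment(transitions: List[Tuple[int, float, int]], phi: Matrix) -> Matrix:
    """M = mean(phi(s) phi(s)^T), used here as an LSPE-style preconditioner."""
    n = len(phi[0])
    M = [[0.0] * n for _ in range(n)]
    for s, _, _ in transitions:
        f = phi[s]
        for i in range(n):
            for j in range(n):
                M[i][j] += f[i] * f[j]
    t = len(transitions)
    return [[m / t for m in row] for row in M]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def preconditioned_iteration(A: Matrix, b: Vector, C: Matrix,
                             alpha: float, theta: Vector, iters: int) -> Vector:
    """Repeat theta <- theta + alpha * C^{-1} (b - A theta)."""
    for _ in range(iters):
        residual = [b[i] - sum(A[i][j] * theta[j] for j in range(len(theta)))
                    for i in range(len(b))]
        step = solve(C, residual)
        theta = [th + alpha * d for th, d in zip(theta, step)]
    return theta


# Hypothetical example: 3 states, 2-d features, deterministic cycle 0 -> 1 -> 2 -> 0.
phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
transitions = [(0, 1.0, 1), (1, 0.0, 2), (2, -1.0, 0)] * 100  # (s, r, s')
gamma = 0.9

A, b = lstd_statistics(transitions, phi, gamma)
M = feature_second_moment(transitions, phi)

theta_lstd = preconditioned_iteration(A, b, A, 1.0, [0.0, 0.0], 1)      # C = A
theta_lspe = preconditioned_iteration(A, b, M, 1.0, [0.0, 0.0], 100)    # C = M
theta_td = preconditioned_iteration(A, b, identity(2), 0.5, [0.0, 0.0], 300)  # C = I
print(theta_lstd, theta_lspe, theta_td)
```

In this toy setting all three choices converge to the same fixed point A⁻¹b; the preconditioner only changes how many iterations are needed and how expensive each one is. The trade-off between per-step cost (solving with C) and convergence speed is the kind of issue the abstract refers to when it discusses applying preconditioning iteratively and reducing complexity toward that of TD.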
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 25th International Conference on Machine Learning |
| Pages | 1208-1215 |
| Publication status | Published - 2008 |
| Event | 25th International Conference on Machine Learning, Helsinki, Finland; duration: 5 Jul 2008 → 9 Jul 2008 |
Conference
| Conference | 25th International Conference on Machine Learning |
|---|---|
| Country | Finland |
| City | Helsinki |
| Period | 5/07/08 → 9/07/08 |