
Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures

Umit Köse, Andrzej Ruszczyński; 22(38):1−34, 2021.

Abstract

We propose a novel reinforcement learning methodology where the system performance is evaluated by a Markov coherent dynamic risk measure with the use of linear value function approximations. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
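As a rough illustration of the kind of method the abstract describes (not the paper's exact algorithm), the sketch below runs a risk-averse TD(0)-style update with linear value function approximation on a small Markov cost chain. The Markov coherent risk measure is instantiated as mean-upper-semideviation, rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+]; the chain, costs, batch-based risk estimate, and all parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state Markov chain with per-state costs (assumed for illustration).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
c = np.array([1.0, 2.0, 0.5])
gamma, kappa, n = 0.5, 0.4, 3

def semidev_risk(values, probs, kappa):
    """Mean-upper-semideviation: rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+]."""
    m = probs @ values
    return m + kappa * probs @ np.maximum(values - m, 0.0)

# Benchmark: exact risk-averse value iteration on the known model.
v_star = np.zeros(n)
for _ in range(200):
    v_star = np.array([c[s] + gamma * semidev_risk(v_star, P[s], kappa)
                       for s in range(n)])

# Risk-averse TD with linear features (one-hot here, so theta[s] = v(s)).
phi = np.eye(n)
theta = np.zeros(n)
alpha, batch = 0.05, 32
s = 0
for t in range(20000):
    # Sample a mini-batch of next states to estimate the risk of v(s').
    nxt = rng.choice(n, size=batch, p=P[s])
    z = phi[nxt] @ theta                              # sampled next-state values
    m = z.mean()
    rho = m + kappa * np.maximum(z - m, 0.0).mean()   # empirical semideviation risk
    delta = c[s] + gamma * rho - phi[s] @ theta       # risk-averse TD error
    theta += alpha * delta * phi[s]
    s = rng.choice(n, p=P[s])                         # follow the chain

print(np.round(theta, 2), np.round(v_star, 2))
```

With one-hot features this reduces to a tabular scheme; the same update applies to any feature matrix `phi`, which is where the paper's projected risk-averse dynamic programming equations become relevant.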

© JMLR 2021.