Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning

K. Lakshmanan, Ronald Ortner, Daniil Ryabko
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:524-532, 2015.

Abstract

We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of O(T^{3/4}) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state space.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-lakshmanan15,
  title     = {Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning},
  author    = {Lakshmanan, K. and Ortner, Ronald and Ryabko, Daniil},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {524--532},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/lakshmanan15.pdf},
  url       = {https://proceedings.mlr.press/v37/lakshmanan15.html},
  abstract  = {We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of $O(T^{3/4})$ after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be $O(T^{2/3})$ asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.}
}
Endnote
%0 Conference Paper
%T Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
%A K. Lakshmanan
%A Ronald Ortner
%A Daniil Ryabko
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-lakshmanan15
%I PMLR
%P 524--532
%U https://proceedings.mlr.press/v37/lakshmanan15.html
%V 37
%X We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of O(T^{3/4}) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.
RIS
TY  - CPAPER
TI  - Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
AU  - K. Lakshmanan
AU  - Ronald Ortner
AU  - Daniil Ryabko
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-lakshmanan15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 524
EP  - 532
L1  - http://proceedings.mlr.press/v37/lakshmanan15.pdf
UR  - https://proceedings.mlr.press/v37/lakshmanan15.html
AB  - We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of O(T^{3/4}) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.
ER  -
APA
Lakshmanan, K., Ortner, R. & Ryabko, D. (2015). Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:524-532. Available from https://proceedings.mlr.press/v37/lakshmanan15.html.
