An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward

Matthew Hoffman; Nando Freitas; Arnaud Doucet; Jan Peters

Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:232-239, 2009.

Abstract

We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.

Cite this Paper

BibTeX


@InProceedings{pmlr-v5-hoffman09a,
  title = 	 {An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward},
  author = 	 {Hoffman, Matthew and Freitas, Nando and Doucet, Arnaud and Peters, Jan},
  booktitle = 	 {Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {232--239},
  year = 	 {2009},
  editor = 	 {van Dyk, David and Welling, Max},
  volume = 	 {5},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v5/hoffman09a/hoffman09a.pdf},
  url = 	 {https://proceedings.mlr.press/v5/hoffman09a.html},
  abstract = 	 {We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.}
}

Endnote

%0 Conference Paper
%T An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
%A Matthew Hoffman
%A Nando Freitas
%A Arnaud Doucet
%A Jan Peters
%B Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling
%F pmlr-v5-hoffman09a
%I PMLR
%P 232--239
%U https://proceedings.mlr.press/v5/hoffman09a.html
%V 5
%X We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.

RIS


TY  - CPAPER
TI  - An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
AU  - Matthew Hoffman
AU  - Nando Freitas
AU  - Arnaud Doucet
AU  - Jan Peters
BT  - Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling
ID  - pmlr-v5-hoffman09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 232
EP  - 239
L1  - http://proceedings.mlr.press/v5/hoffman09a/hoffman09a.pdf
UR  - https://proceedings.mlr.press/v5/hoffman09a.html
AB  - We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.
ER  -

APA


Hoffman, M., Freitas, N., Doucet, A. & Peters, J. (2009). An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:232-239. Available from https://proceedings.mlr.press/v5/hoffman09a.html.
