Learning Complex Neural Network Policies with Trajectory Optimization

Sergey Levine, Vladlen Koltun
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):829-837, 2014.

Abstract

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
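Since the abstract compresses the method into a single sentence, a small illustration of the alternating structure may help. The following is a hypothetical sketch, not the paper's algorithm: it replaces the paper's optimization over trajectory distributions with a crude finite-difference descent on an invented linear point-mass task, and every name, constant, and the toy dynamics in it are assumptions made purely for illustration.

# A minimal conceptual sketch (not the authors' implementation) of the
# alternating scheme described above, on an invented linear point-mass
# task: the trajectory descends task cost plus a penalty for deviating
# from the current policy, then the network is regressed onto the
# trajectory's state-action pairs.
import numpy as np

rng = np.random.default_rng(0)
T, x_dim, u_dim, h = 20, 2, 1, 16           # horizon, dims, hidden units
A = np.array([[1.0, 0.1], [0.0, 1.0]])      # toy dynamics: x' = Ax + Bu
B = np.array([[0.0], [0.1]])
Q, R = np.eye(x_dim), 0.01 * np.eye(u_dim)  # quadratic state/action costs
x0 = np.array([1.0, 0.0])
nu = 1.0                                    # policy-agreement penalty weight

# One-hidden-layer policy: u = W2 tanh(W1 x + b1) + b2
W1, b1 = 0.1 * rng.standard_normal((h, x_dim)), np.zeros(h)
W2, b2 = 0.1 * rng.standard_normal((u_dim, h)), np.zeros(u_dim)

def policy(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def rollout(U):
    X = [x0]
    for u in U:
        X.append(A @ X[-1] + B @ u)
    return np.array(X[:-1])                 # states paired with the actions

def total_cost(U):
    X = rollout(U)
    c = sum(x @ Q @ x + u @ R @ u for x, u in zip(X, U))
    return c + nu * sum(np.sum((u - policy(x)) ** 2) for x, u in zip(X, U))

U = np.zeros((T, u_dim))                    # trajectory actions

for outer in range(30):
    # (1) Trajectory step: finite-difference descent keeps the sketch
    #     short; the paper uses a real trajectory optimizer instead.
    g, eps = np.zeros_like(U), 1e-4
    base = total_cost(U)
    for t in range(T):
        for k in range(u_dim):
            Up = U.copy()
            Up[t, k] += eps
            g[t, k] = (total_cost(Up) - base) / eps
    U -= 0.02 * g

    # (2) Policy step: regress the network onto the trajectory's
    #     state-action pairs by plain gradient descent on squared error.
    X = rollout(U)
    for _ in range(100):
        Hact = np.tanh(W1 @ X.T + b1[:, None])      # h x T activations
        err = (W2 @ Hact + b2[:, None]) - U.T       # prediction error
        dH = (W2.T @ err) * (1.0 - Hact ** 2)       # backprop through tanh
        for p, gp in ((W2, err @ Hact.T / T), (b2, err.mean(axis=1)),
                      (W1, dH @ X / T), (b1, dH.mean(axis=1))):
            p -= 0.1 * gp                           # in-place weight update

print("final penalized cost:", total_cost(U))

The nu penalty is what couples the two phases: it pulls the trajectory toward actions the network can actually reproduce, while the regression pulls the network toward the improved trajectory, mirroring the alternation the abstract describes.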

Cite this Paper

BibTeX
@InProceedings{pmlr-v32-levine14,
  title     = {Learning Complex Neural Network Policies with Trajectory Optimization},
  author    = {Levine, Sergey and Koltun, Vladlen},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {829--837},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/levine14.pdf},
  url       = {https://proceedings.mlr.press/v32/levine14.html}
}
Endnote
%0 Conference Paper
%T Learning Complex Neural Network Policies with Trajectory Optimization
%A Sergey Levine
%A Vladlen Koltun
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-levine14
%I PMLR
%P 829--837
%U https://proceedings.mlr.press/v32/levine14.html
%V 32
%N 2
RIS
TY  - CPAPER
TI  - Learning Complex Neural Network Policies with Trajectory Optimization
AU  - Sergey Levine
AU  - Vladlen Koltun
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-levine14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 829
EP  - 837
L1  - http://proceedings.mlr.press/v32/levine14.pdf
UR  - https://proceedings.mlr.press/v32/levine14.html
ER  -
APA
Levine, S. & Koltun, V. (2014). Learning Complex Neural Network Policies with Trajectory Optimization. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):829-837. Available from https://proceedings.mlr.press/v32/levine14.html.
