Opponent Modeling in Deep Reinforcement Learning

He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1804-1813, 2016.

Abstract

Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact in complex ways and are non-stationary. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observations of the opponent into a deep Q-Network (DQN), while retaining explicit modeling under a multitasking framework. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
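The abstract's core idea, conditioning a deep Q-Network's value estimates on an encoding of the opponent's observed behavior through a Mixture-of-Experts head, can be illustrated with a short sketch. The following is a minimal PyTorch illustration, assuming simple feed-forward encoders; the class name DRONMoE, layer sizes, and number of experts are illustrative assumptions, not the authors' actual architecture or code.

import torch
import torch.nn as nn


class DRONMoE(nn.Module):
    """Sketch of an opponent-aware Mixture-of-Experts Q-network.

    The agent's state and the opponent's observed features are encoded
    separately. K expert heads each map the state encoding to a Q-value
    vector, and a gating network over the opponent encoding mixes them,
    so different opponent strategy patterns can select different experts.
    """

    def __init__(self, state_dim, opp_dim, n_actions, n_experts=4, hidden=64):
        super().__init__()
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.opp_enc = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())
        # One linear Q head per expert, applied to the state encoding.
        self.experts = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_experts)]
        )
        # Gating weights over experts come from the opponent encoding only.
        self.gate = nn.Linear(hidden, n_experts)

    def forward(self, state, opp_obs):
        hs = self.state_enc(state)                    # (batch, hidden)
        ho = self.opp_enc(opp_obs)                    # (batch, hidden)
        w = torch.softmax(self.gate(ho), dim=-1)      # (batch, K) expert weights
        q_experts = torch.stack([e(hs) for e in self.experts], dim=1)  # (batch, K, actions)
        # Combine expert Q-values, weighted by the opponent-dependent gate.
        return (w.unsqueeze(-1) * q_experts).sum(dim=1)                # (batch, actions)


# Usage: Q-values for a batch of joint (state, opponent-observation) inputs.
q_net = DRONMoE(state_dim=10, opp_dim=6, n_actions=5)
q_values = q_net(torch.randn(8, 10), torch.randn(8, 6))
print(q_values.shape)  # torch.Size([8, 5])

Such a network can be trained with a standard DQN loss, with the gate learning to separate opponent strategies without extra supervision.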

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-he16,
  title     = {Opponent Modeling in Deep Reinforcement Learning},
  author    = {He, He and Boyd-Graber, Jordan and Kwok, Kevin and Daum\'e, III, Hal},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {1804--1813},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/he16.pdf},
  url       = {https://proceedings.mlr.press/v48/he16.html},
  abstract  = {Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact in complex ways and are non-stationary. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observations of the opponent into a deep Q-Network (DQN), while retaining explicit modeling under a multitasking framework. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.}
}
Endnote
%0 Conference Paper
%T Opponent Modeling in Deep Reinforcement Learning
%A He He
%A Jordan Boyd-Graber
%A Kevin Kwok
%A Hal Daumé, III
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-he16
%I PMLR
%P 1804--1813
%U https://proceedings.mlr.press/v48/he16.html
%V 48
%X Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact in complex ways and are non-stationary. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observations of the opponent into a deep Q-Network (DQN), while retaining explicit modeling under a multitasking framework. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
RIS
TY  - CPAPER
TI  - Opponent Modeling in Deep Reinforcement Learning
AU  - He He
AU  - Jordan Boyd-Graber
AU  - Kevin Kwok
AU  - Hal Daumé, III
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-he16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1804
EP  - 1813
L1  - http://proceedings.mlr.press/v48/he16.pdf
UR  - https://proceedings.mlr.press/v48/he16.html
AB  - Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact in complex ways and are non-stationary. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observations of the opponent into a deep Q-Network (DQN), while retaining explicit modeling under a multitasking framework. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
ER  -
APA
He, H., Boyd-Graber, J., Kwok, K. & Daumé, III, H. (2016). Opponent Modeling in Deep Reinforcement Learning. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1804-1813. Available from https://proceedings.mlr.press/v48/he16.html.