Coherent Inference on Optimal Play in Game Trees

Philipp Hennig, David Stern, Thore Graepel
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:326-333, 2010.

Abstract

Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
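The paper's own EP scheme is not reproduced on this page. As a loose, hypothetical illustration of the kind of Gaussian projection step that such a message-passing scheme relies on when the value of a MAX node is the maximum of its children's Gaussian-distributed latent scores, the sketch below moment-matches max(X, Y) for two independent Gaussians (Clark's formulas). The function names and the two-child restriction are illustrative assumptions, not the authors' algorithm or API.

    # Minimal sketch (assumption-laden, not the authors' implementation):
    # approximate the maximum of two independent Gaussian beliefs by a Gaussian
    # with matched mean and variance -- the projection an EP-style scheme
    # performs at a MAX node whose children carry Gaussian value beliefs.
    import math

    def std_normal_pdf(x: float) -> float:
        """Density of the standard normal distribution."""
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def std_normal_cdf(x: float) -> float:
        """Cumulative distribution function of the standard normal."""
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def max_of_gaussians(mu1: float, var1: float, mu2: float, var2: float):
        """Mean and variance matching the first two moments of max(X, Y)
        for independent X ~ N(mu1, var1) and Y ~ N(mu2, var2)."""
        theta = math.sqrt(var1 + var2)
        alpha = (mu1 - mu2) / theta
        mean = (mu1 * std_normal_cdf(alpha)
                + mu2 * std_normal_cdf(-alpha)
                + theta * std_normal_pdf(alpha))
        second_moment = ((mu1 ** 2 + var1) * std_normal_cdf(alpha)
                         + (mu2 ** 2 + var2) * std_normal_cdf(-alpha)
                         + (mu1 + mu2) * theta * std_normal_pdf(alpha))
        return mean, second_moment - mean ** 2

    # Example: Gaussian beliefs over the values of two sibling moves; the
    # parent MAX node's belief is approximated by moment matching.
    if __name__ == "__main__":
        print(max_of_gaussians(0.0, 1.0, 0.5, 2.0))

A MIN node can be handled symmetrically by negating the inputs and the resulting mean; chaining such projections up the tree is what keeps the per-node cost linear in the tree depth rather than in the (exponential) number of leaves.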

Cite this Paper


BibTeX
@InProceedings{pmlr-v9-hennig10a,
  title     = {Coherent Inference on Optimal Play in Game Trees},
  author    = {Hennig, Philipp and Stern, David and Graepel, Thore},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {326--333},
  year      = {2010},
  editor    = {Teh, Yee Whye and Titterington, Mike},
  volume    = {9},
  series    = {Proceedings of Machine Learning Research},
  address   = {Chia Laguna Resort, Sardinia, Italy},
  month     = {13--15 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf},
  url       = {https://proceedings.mlr.press/v9/hennig10a.html},
  abstract  = {Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.}
}
Endnote
%0 Conference Paper
%T Coherent Inference on Optimal Play in Game Trees
%A Philipp Hennig
%A David Stern
%A Thore Graepel
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-hennig10a
%I PMLR
%P 326--333
%U https://proceedings.mlr.press/v9/hennig10a.html
%V 9
%X Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
RIS
TY  - CPAPER
TI  - Coherent Inference on Optimal Play in Game Trees
AU  - Philipp Hennig
AU  - David Stern
AU  - Thore Graepel
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-hennig10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 326
EP  - 333
L1  - http://proceedings.mlr.press/v9/hennig10a/hennig10a.pdf
UR  - https://proceedings.mlr.press/v9/hennig10a.html
AB  - Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
ER  -
APA
Hennig, P., Stern, D. & Graepel, T. (2010). Coherent Inference on Optimal Play in Game Trees. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:326-333. Available from https://proceedings.mlr.press/v9/hennig10a.html.
