Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

Gábor Bartók; Dávid Pál; Csaba Szepesvári

Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:133-154, 2011.

Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
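The protocol described in the abstract can be made concrete with a short simulation. The sketch below is illustrative only and is not the paper's algorithm: the 2-action, 2-outcome game (`LOSS`, `FEEDBACK`), the helper names `play`, `regret` and `follow_the_leader`, and the choice of follow-the-leader as the learner are all hypothetical. It shows the key asymmetry of partial monitoring: the learner suffers `LOSS[action][outcome]` but only ever observes `FEEDBACK[action][outcome]`. It also computes the realized regret of a single run, as the abstract words it; the paper's formal definition may differ, for example by taking expectations over the i.i.d. outcomes.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical 2-action, 2-outcome game chosen for illustration; not taken from the paper.
# LOSS[i][j] is the loss of action i under outcome j.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# FEEDBACK[i][j] is the symbol the learner observes. Here it reveals the outcome j
# whichever action is played, so this particular game happens to have full information.
FEEDBACK = [[0, 1],
            [0, 1]]

History = List[Tuple[int, int]]  # (action played, feedback symbol observed) pairs
Policy = Callable[[History], int]


def play(policy: Policy, loss: Sequence[Sequence[float]],
         feedback: Sequence[Sequence[int]], outcome_probs: Sequence[float],
         T: int, seed: int = 0) -> Tuple[float, List[float]]:
    """Run T rounds with outcomes drawn i.i.d. from outcome_probs.

    Returns the learner's cumulative loss and the cumulative loss each fixed
    action would have suffered on the same outcome sequence.
    """
    rng = random.Random(seed)
    history: History = []
    learner_loss = 0.0
    action_losses = [0.0] * len(loss)
    for _ in range(T):
        action = policy(history)  # the learner sees only past (action, feedback) pairs
        outcome = rng.choices(range(len(outcome_probs)), weights=outcome_probs)[0]
        learner_loss += loss[action][outcome]  # suffered, but never shown to the learner
        history.append((action, feedback[action][outcome]))  # the only new information
        for i, row in enumerate(loss):  # simulator-side bookkeeping for the hindsight comparison
            action_losses[i] += row[outcome]
    return learner_loss, action_losses


def regret(learner_loss: float, action_losses: Sequence[float]) -> float:
    """Cumulative loss minus the loss of the best fixed action in hindsight."""
    return learner_loss - min(action_losses)


def follow_the_leader(history: History) -> int:
    """Play the action with the smallest estimated cumulative loss (ties go to action 0).

    Valid only because FEEDBACK above reveals the outcome; in a genuine
    partial-monitoring game the feedback need not identify the outcome.
    """
    counts = [0] * len(LOSS[0])
    for _, symbol in history:
        counts[symbol] += 1
    est = [sum(l * c for l, c in zip(row, counts)) for row in LOSS]
    return min(range(len(LOSS)), key=lambda i: est[i])
```

A possible run, with an arbitrarily chosen outcome distribution:

```python
learner_loss, action_losses = play(follow_the_leader, LOSS, FEEDBACK, [0.7, 0.3], T=1000, seed=1)
print(regret(learner_loss, action_losses))
```

The hindsight comparison needs every action's loss on every outcome, which only the simulator knows; the learner itself has to infer which action is best from the feedback symbols alone. How well the feedback matrix allows this inference is what separates the regret classes listed in the abstract.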

Cite this Paper

BibTeX


@InProceedings{pmlr-v19-bartok11a,
  title = 	 {Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments},
  author = 	 {Bartók, Gábor and Pál, Dávid and Szepesvári, Csaba},
  booktitle = 	 {Proceedings of the 24th Annual Conference on Learning Theory},
  pages = 	 {133--154},
  year = 	 {2011},
  editor = 	 {Kakade, Sham M. and von Luxburg, Ulrike},
  volume = 	 {19},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Budapest, Hungary},
  month = 	 {09--11 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v19/bartok11a/bartok11a.pdf},
  url = 	 {https://proceedings.mlr.press/v19/bartok11a.html},
  abstract = 	 {In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T}), \Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within logarithmic factor for any game.}
}

Endnote

%0 Conference Paper
%T Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
%A Gábor Bartók
%A Dávid Pál
%A Csaba Szepesvári
%B Proceedings of the 24th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2011
%E Sham M. Kakade
%E Ulrike von Luxburg
%F pmlr-v19-bartok11a
%I PMLR
%P 133--154
%U https://proceedings.mlr.press/v19/bartok11a.html
%V 19
%X In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T}), \Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within logarithmic factor for any game.

RIS


TY  - CPAPER
TI  - Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
AU  - Gábor Bartók
AU  - Dávid Pál
AU  - Csaba Szepesvári
BT  - Proceedings of the 24th Annual Conference on Learning Theory
DA  - 2011/12/21
ED  - Sham M. Kakade
ED  - Ulrike von Luxburg
ID  - pmlr-v19-bartok11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 19
SP  - 133
EP  - 154
L1  - http://proceedings.mlr.press/v19/bartok11a/bartok11a.pdf
UR  - https://proceedings.mlr.press/v19/bartok11a.html
AB  - In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T}), \Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within logarithmic factor for any game.
ER  -

APA


Bartók, G., Pál, D. & Szepesvári, C. (2011). Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:133-154. Available from https://proceedings.mlr.press/v19/bartok11a.html.
