Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

Gábor Bartók, Dávid Pál, Csaba Szepesvári; JMLR W&CP 19:133-154, 2011.


In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
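The protocol described in the abstract can be sketched in code. The following is a minimal illustrative simulation, not the paper's algorithm: the game (its loss matrix `L` and feedback matrix `H`), the outcome distribution, and the uniformly random learner policy are all hypothetical choices made here to show how losses, feedback signals, and regret interact.

```python
import random

# Toy game with 2 actions and 2 outcomes.
# L[a][o] is the loss and H[a][o] the feedback signal when the learner
# plays action a and the environment produces outcome o.
# These matrices are illustrative, not taken from the paper.
L = [[0.0, 1.0],
     [1.0, 0.0]]
H = [["a", "b"],   # action 0 reveals the outcome
     ["a", "a"]]   # action 1 gives uninformative feedback

def play(T, outcome_dist=(0.3, 0.7), seed=0):
    """Run T rounds of the partial-monitoring protocol and return the regret
    of a (naive) uniformly random learner."""
    rng = random.Random(seed)
    total_loss = 0.0
    per_action_loss = [0.0, 0.0]  # hindsight loss of each fixed action
    for _ in range(T):
        # Learner picks an action; here uniformly at random, for illustration.
        a = rng.randrange(2)
        # Environment draws an outcome i.i.d. from a fixed, unknown distribution.
        o = 0 if rng.random() < outcome_dist[0] else 1
        # Learner suffers L[a][o] but only observes the feedback H[a][o].
        total_loss += L[a][o]
        feedback = H[a][o]  # all the learner ever sees
        # Bookkeeping (invisible to the learner): loss of each fixed action.
        for i in range(2):
            per_action_loss[i] += L[i][o]
    # Regret: cumulative loss minus the loss of the best fixed action in hindsight.
    return total_loss - min(per_action_loss)

print(play(1000))
```

Since the random learner ignores feedback entirely, its regret grows linearly in `T`; the paper's classification concerns how much better an algorithm that exploits the feedback structure can do.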
