A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz ; JMLR W&CP 19:497-514, 2011.


We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports(not necessarily known beforehand), whose asymptotic regret matches the lower bound of \citet{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm;we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

Page last modified on Sat Dec 17 01:07 CET 2011.