Home Page

Papers

Submissions

News

Scope

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Login



RSS Feed

Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards

Varun Kanade, H. Brendan McMahan, Brent Bryan; JMLR W&CP 5:272-279, 2009.

Abstract

We consider algorithms for selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochastically. We present the first polynomial-time no-regret algorithms for this setting. In the full-observation (experts) version of the problem, we present an exponential-weights algorithm that achieves regret O(\sqrt{T log n}), which is the best possible. For the bandit setting (where the algorithm only observes the reward of the action selected), we present a no-regret algorithm based on follow-the-perturbed-leader. This algorithm runs in polynomial time, unlike the EXP4 algorithm which can also be applied to this setting. Our algorithm has the interesting interpretation of solving a geometric experts problem where the embedding in which rewards are linear is never explicitly constructed. We argue that this adversarial-reward, stochastic availability formulation is important in practice, as assuming stationary stochastic rewards is unrealistic in many domains.



Home Page

Papers

Submissions

News

Scope

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Login



RSS Feed

Page last modified on Fri Apr 3 20:30:46 BST 2009.

Copyright @ JMLR 2000. All rights reserved.