Greedy algorithm almost dominates in smoothed contextual bandits

M Raghavan, A Slivkins, JW Vaughan, ZS Wu - SIAM Journal on Computing, 2023 - SIAM
Abstract
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that always “exploits” by choosing an action that currently looks optimal. We determine under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model. We improve on prior results to show that the greedy algorithm almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold. The key technical finding is that data collected by the greedy algorithm suffices to simulate a run of any other algorithm. Further, we prove that under a particular smoothness assumption, the Bayesian regret of the greedy algorithm is sublinear in the time horizon T even in the worst case.
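The greedy rule discussed in the abstract is simple to state: fit a least-squares estimate of the linear reward model on all data gathered so far and always pull the arm whose context looks best under that estimate, with no explicit exploration. The Python sketch below illustrates this idea under an assumed simulation setup (Gaussian contexts and noise; the dimension d, arm count K, horizon T, and noise level are placeholders); it is an illustration of the general greedy strategy, not the paper's construction or analysis.

```python
# Minimal sketch of the greedy algorithm in a linear contextual bandit.
# Assumed setup (not from the paper): K arms, d-dimensional Gaussian
# contexts per arm, rewards linear in an unknown parameter plus noise.
import numpy as np

rng = np.random.default_rng(0)
d, K, T, noise_std = 5, 10, 2000, 0.1
theta = rng.normal(size=d)             # unknown reward parameter
A = np.eye(d)                          # ridge-regularized Gram matrix
b = np.zeros(d)                        # sum of reward-weighted contexts

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))           # diverse contexts for the K arms
    theta_hat = np.linalg.solve(A, b)            # least-squares estimate from history
    arm = int(np.argmax(contexts @ theta_hat))   # greedy: exploit the current estimate
    reward = contexts[arm] @ theta + noise_std * rng.normal()
    A += np.outer(contexts[arm], contexts[arm])  # update sufficient statistics
    b += reward * contexts[arm]
    regret += np.max(contexts @ theta) - contexts[arm] @ theta

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

In this sketch the randomness of the contexts plays the role of the paper's diversity conditions: because the greedy arm choices still see varied contexts, the least-squares estimate keeps improving without any deliberate exploration.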