An approach for handling risk and uncertainty in multiarmed bandit problems
Abstract
An approach is presented to deal with risk in multiarmed bandit problems.
Specifically, the well-known exploration-exploitation dilemma is solved
from the point of view of maximizing a utility function that measures the
decision maker's attitude towards risk and uncertain outcomes. A link with
preference theory is thus established. Simulation results are provided
to support the main ideas and to compare the approach with existing
methods, with emphasis on the short term (small sample size) behavior of the
proposed method.