An approach for handling risk and uncertainty in multiarmed bandit problems
In EGC 2009, vol. RNTI-E-15, pp. 115-126
An approach is presented to deal with risk in multiarmed bandit problems. Specifically, the well-known exploration-exploitation dilemma is solved from the point of view of maximizing a utility function that measures the decision maker's attitude towards risk and uncertain outcomes. A link with preference theory is thus established. Simulation results are provided to support the main ideas and to compare the approach with existing methods, with emphasis on the short-term (small sample size) behavior of the proposed method.
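To make the idea of utility-based arm selection concrete, the following is a minimal sketch and not the method of the paper. It assumes an exponential (risk-averse) utility and a plain epsilon-greedy rule applied to the empirical mean utility of each arm. The names `exponential_utility`, `run_bandit`, `safe_arm`, and `risky_arm`, as well as the parameters `risk_aversion` and `epsilon`, are hypothetical and chosen only for illustration.

```python
import math
import random


def exponential_utility(reward, risk_aversion=1.0):
    """Concave utility u(x) = 1 - exp(-a * x); an assumed, risk-averse choice."""
    return 1.0 - math.exp(-risk_aversion * reward)


def run_bandit(arms, horizon, utility, epsilon=0.1, seed=0):
    """Epsilon-greedy play on the empirical mean utility (illustrative only).

    arms: list of functions taking a random.Random and returning a reward.
    Returns the pull counts and the empirical mean utility of each arm.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    mean_utility = [0.0] * len(arms)
    for t in range(horizon):
        if t < len(arms):
            arm = t  # pull every arm once to initialize the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(len(arms))  # explore uniformly at random
        else:
            # exploit: pick the arm with the highest estimated utility
            arm = max(range(len(arms)), key=lambda i: mean_utility[i])
        reward = arms[arm](rng)
        counts[arm] += 1
        # incremental update of the running mean of observed utilities
        mean_utility[arm] += (utility(reward) - mean_utility[arm]) / counts[arm]
    return counts, mean_utility


# Two hypothetical arms with the same expected reward (1.0) but different risk.
def safe_arm(rng):
    return 1.0


def risky_arm(rng):
    return 2.0 if rng.random() < 0.5 else 0.0


counts, mean_utility = run_bandit([safe_arm, risky_arm], 500, exponential_utility)
print(counts, mean_utility)
```

Both arms have the same mean reward, so a criterion based only on the expected reward cannot separate them. Under the assumed concave utility, the constant arm has the higher expected utility, and a risk-averse decision maker should come to prefer it.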