Computing a Deterministic Policy in an MDP with Imprecise Rewards
Abstract
In some real-world applications of sequential decision making under uncertainty, a stochastic
policy is not easily interpretable for the system users. This might be due to the nature of the
problem or to the system requirements. In these contexts, it is more convenient, or even necessary,
to provide a deterministic policy to the user. We propose an approach for computing a deterministic
policy for a Markov Decision Process with Imprecise Rewards. To motivate the use of an
exact procedure for finding a deterministic policy, we exhibit cases where the intuitive idea
of "determinising" (rounding) the optimal stochastic policy leads to a deterministic policy
that differs from the optimal one.
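To make the "determinising" (rounding) heuristic mentioned above concrete, here is a minimal sketch: in each state, it keeps the action to which the stochastic policy assigns the highest probability. The states, actions, and probabilities are invented for illustration; the paper's point is precisely that this rounding can fail to recover the optimal deterministic policy.

```python
# Hypothetical illustration of "determinising" a stochastic policy by
# picking, in each state, the most probable action. All names and
# numbers below are made up for the example.

def determinise(stochastic_policy):
    """Map each state to the action with the highest probability."""
    return {
        state: max(action_probs, key=action_probs.get)
        for state, action_probs in stochastic_policy.items()
    }

# An example stochastic policy over two states and two actions:
pi = {
    "s0": {"a0": 0.6, "a1": 0.4},
    "s1": {"a0": 0.1, "a1": 0.9},
}

print(determinise(pi))  # {'s0': 'a0', 's1': 'a1'}
```

Note that this rounding discards the probability mass placed on the non-selected actions, which is why the resulting deterministic policy can differ from the one an exact procedure would find.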