RNTI

Computing a deterministic policy in an MDP with imprecise rewards
In EGC 2019, vol. RNTI-E-35, pp. 45-56
Abstract
In some real-world applications of sequential decision making under uncertainty, a stochastic policy is not easily interpretable for the system users. This may be due to the nature of the problem or to the system requirements. In these contexts, it is more convenient, or even inevitable, to provide a deterministic policy to the user. We propose an approach for computing a deterministic policy for a Markov Decision Process with Imprecise Rewards. To motivate the use of an exact procedure for finding a deterministic policy, we show some cases where the intuitive idea of using a deterministic policy obtained by "determinising" (rounding) the optimal stochastic policy leads to a deterministic policy different from the optimal one.
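
As a rough illustration of the rounding heuristic the abstract refers to, the sketch below (a minimal, hypothetical example, not the paper's exact procedure) shows how a stochastic policy, assumed here to be given as a |S| x |A| matrix of action probabilities, can be "determinised" by picking the most probable action in each state. The helper name determinise and the example numbers are illustrative assumptions only; the paper's point is that this rounded policy can differ from the optimal deterministic one.

    # Minimal sketch (assumption: tabular MDP, stochastic policy as a
    # |S| x |A| matrix of action probabilities; not the paper's algorithm).
    import numpy as np

    def determinise(stochastic_policy: np.ndarray) -> np.ndarray:
        """Round a stochastic policy to a deterministic one by selecting,
        in each state, the action with the highest probability.
        Returns a vector of length |S| giving the chosen action per state."""
        return np.argmax(stochastic_policy, axis=1)

    # Hypothetical example: 3 states, 2 actions.
    pi = np.array([
        [0.55, 0.45],   # state 0: action 0 barely preferred
        [0.10, 0.90],   # state 1: action 1 clearly preferred
        [0.50, 0.50],   # state 2: tie, argmax breaks it arbitrarily (action 0)
    ])
    print(determinise(pi))  # -> [0 1 0]

Such rounding keeps only the per-state mode of the action distribution and ignores how the choices interact across states, which is why, as the abstract notes, the resulting deterministic policy need not be the optimal one.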