A Contextual Regression Approach for A/B Testing
Abstract
In this work, we devise a principled approach that combines the contextual bandit framework
with the learning of a stratification procedure. The proposed algorithm balances contextual
exploration and exploitation more efficiently than state-of-the-art bandit algorithms over a
finite horizon, at the cost of a controlled probability of incurring linear regret. Finally, the
learned structure is easily interpretable by a human.
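To make the general idea concrete, the following is a minimal, purely illustrative sketch (not the paper's algorithm, whose details are not given in the abstract): the context space is stratified by a shallow regression tree fitted on logged (context, reward) data, and an independent UCB1 bandit over the A/B variants runs within each stratum. All names, parameters, and the simulated environment are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_arms = 2                       # A/B variants
n_log, n_rounds, d = 500, 2000, 3

# --- Offline step: learn a stratification of the context space -------------
X_log = rng.normal(size=(n_log, d))
# Hypothetical logged rewards whose mean depends on the context
r_log = 0.6 * (X_log[:, 0] > 0) + rng.normal(0.0, 0.1, n_log)
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50).fit(X_log, r_log)

leaves = np.unique(tree.apply(X_log))           # stratum identifiers (tree leaves)
counts = {l: np.zeros(n_arms) for l in leaves}  # pulls per arm, per stratum
sums = {l: np.zeros(n_arms) for l in leaves}    # reward sums per arm, per stratum

def ucb_arm(stratum, t):
    """UCB1 arm choice within one stratum; untried arms are chosen first."""
    n, s = counts[stratum], sums[stratum]
    if (n == 0).any():
        return int(np.argmin(n))
    return int(np.argmax(s / n + np.sqrt(2.0 * np.log(t + 1) / n)))

# --- Online step: per-stratum exploration / exploitation -------------------
for t in range(n_rounds):
    x = rng.normal(size=(1, d))
    stratum = int(tree.apply(x)[0])
    arm = ucb_arm(stratum, t)
    # Simulated environment: variant B (arm 1) helps only when x[0] > 0
    reward = float(rng.random() < 0.5 + 0.2 * arm * np.sign(x[0, 0]))
    counts[stratum][arm] += 1
    sums[stratum][arm] += reward
```

The stratification here also hints at the interpretability claim: a depth-2 tree yields a handful of human-readable context rules, each attached to its own exploration/exploitation state.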