Machine Learning for Churn Prediction: A Comparative Study
Abstract
Attrition rate prediction is a major economic concern for many companies. Various learning
approaches have been proposed; however, choosing the most suitable model a priori
remains a non-trivial task, as it depends heavily on the intrinsic characteristics of the churn
data. Our study compares eight supervised learning methods combined with seven sampling
approaches on thirteen public churn data sets. Our evaluations, reported in terms of area under
the curve (AUC), explore the influence of rebalancing and data properties on the performance
of learning methods. We rely on the Nemenyi test and Correspondence Analysis to visualize
the associations between models, rebalancing methods and data sets. Our comparative study
identifies the best methods in an attrition context and proposes a powerful generic pipeline
based on an ensemble approach.
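
As an illustration of the evaluation protocol summarized above, the following minimal sketch computes an AUC from scores, averages per-data-set ranks across methods, and derives the critical difference used by the Nemenyi test. It is not the study's implementation: the function names (`auc`, `mean_ranks`, `nemenyi_cd`) are hypothetical, the toy numbers are invented for the example, and the value `q_alpha = 3.031` is assumed to be the commonly tabulated studentized-range constant for eight methods at the 0.05 level.

```python
from math import sqrt


def auc(labels, scores):
    """AUC as the probability that a random positive is scored above
    a random negative; ties count for one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ranks(auc_table):
    """auc_table holds one row per data set and one column per method.
    Within each row the highest AUC gets rank 1; tied values share the
    average of the ranks they occupy. Returns the mean rank per method."""
    k = len(auc_table[0])
    totals = [0.0] * k
    for row in auc_table:
        for j, value in enumerate(row):
            better = sum(1 for other in row if other > value)
            equal = sum(1 for other in row if other == value)  # includes itself
            totals[j] += better + (equal + 1) / 2
    return [t / len(auc_table) for t in totals]


def nemenyi_cd(k, n, q_alpha):
    """Critical difference between mean ranks for k methods compared
    on n data sets; q_alpha must be looked up for the chosen k and level."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))


# Toy usage with invented values (not results from the study).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_ranks = mean_ranks([[0.9, 0.8, 0.7], [0.6, 0.8, 0.7]])
# Assumed setting: 8 methods, 13 data sets, q_alpha = 3.031 (assumption).
cd = nemenyi_cd(k=8, n=13, q_alpha=3.031)
```

Under this sketch, two methods would be declared significantly different when their mean ranks differ by more than `cd`.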