When Subgroups Meet Gradual Patterns: Discovering Subgroups That Identify Exceptional Correlations
Abstract
Subgroup discovery (SD) is a mature field at the frontier of data mining and machine learn-
ing. It gathers methods designed to find coherent subgroups of a dataset where one or more
targets interact in an unusual way. Correlation model classes have already been defined to
discover interesting subgroups when dealing with two numerical targets. However, in this
supervised setting, the two numerical targets are fixed before the subgroup search. To make
unsupervised exploration possible, we propose to search for arbitrary subsets of numerical tar-
gets whose correlation is exceptional for an automatically found subgroup. We introduce the
problem of rank-correlated subgroup discovery with an arbitrary subset of numerical targets. A
rank-correlated subgroup is identified both by conditions on descriptive attributes, whether
numerical or nominal, and by a pattern on numerical attributes that captures (positive or negative)
rank correlations. We define a new branch-and-bound algorithm that exploits pruning properties.
An empirical study on several datasets demonstrates the efficiency and the effectiveness
of the algorithm.
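
To make the notion of a rank-correlated subgroup concrete, the following minimal sketch compares a rank correlation computed on a whole dataset with the same correlation restricted to a subgroup. It is an illustration only, not the quality measure or the branch-and-bound algorithm of the paper: the choice of Kendall's tau (tau-a variant), the toy records, the attribute names (`region`, `x`, `y`), and the helper `subgroup_tau` are all assumptions introduced here.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:
            score += 1   # concordant pair
        elif prod < 0:
            score -= 1   # discordant pair (ties contribute 0)
    return score / (n * (n - 1) / 2)


def subgroup_tau(records, condition, x, y):
    """Rank correlation of targets x and y on the records matching `condition`.

    Hypothetical helper: `condition` plays the role of the subgroup
    description on a descriptive attribute.
    """
    covered = [r for r in records if condition(r)]
    return kendall_tau([r[x] for r in covered], [r[y] for r in covered])


# Toy data (invented): one nominal descriptive attribute, two numerical targets.
records = (
    [{"region": "north", "x": v, "y": v} for v in range(1, 5)]
    + [{"region": "south", "x": v, "y": 5 - v} for v in range(1, 5)]
)

tau_all = subgroup_tau(records, lambda r: True, "x", "y")
tau_north = subgroup_tau(records, lambda r: r["region"] == "north", "x", "y")
tau_south = subgroup_tau(records, lambda r: r["region"] == "south", "x", "y")

print(tau_all, tau_north, tau_south)
```

On this toy data the two targets show no rank correlation overall, while each subgroup exhibits a perfect positive or negative one. A discovery method of the kind described above would have to find both the subgroup description and the subset of correlated targets automatically rather than take them as given.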