Fouille de motifs diversifiés : une approche basée sur la relaxation et l'échantillonnage
Abstract
In this paper, we use constraint programming to efficiently mine a diverse set of closed patterns.
Diversity is controlled through a threshold on the Jaccard similarity measure. We show
that the Jaccard measure has no monotonicity property, which prevents usual pruning techniques
and makes classical pattern mining unworkable. This is why we propose antimonotonic
lower and upper bound relaxations, which allow effective pruning, with an efficient branching
rule, boosting the whole search process. Finally, we show how to integrate our relaxation to
sample diverse set of patterns. We demonstrate experimentally that our approach CLOSEDDIVERSITY
significantly reduces the number of patterns and is very efficient in terms of running
times, particularly on dense datasets. For the pattern sampling task, we show that SDIVJAX-2
constitutes a better compromise between running time and redundancy reduction.