Discovering Duplicate Labels by Exploring the Lattice of Binary Classifiers
Abstract
Analyzing behavioral data is a major challenge today, as virtually everyone generates activity
and mobility traces. When each trace is labeled with the user who generated it, models can be
learned to accurately predict the user behind an unknown trace. In online systems, however, a
user may have several virtual identities, that is, duplicate labels. When these are ignored,
prediction accuracy drops drastically. In this article, we tackle this duplicate label
identification problem and present an original approach that explores the lattice of binary
classifiers. Each subset of labels is learned against the others, and constraints make it possible
to identify duplicate labels while pruning the search space. We experiment with data from the
video game STARCRAFT 2. The results are of good quality and encouraging.
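
To make the idea of learning each subset of labels against the others more concrete, the following is a minimal sketch in Python and not the method evaluated in the article. Everything in it is an assumption made for illustration: the toy traces (one scalar feature each), the nearest-centroid classifier, the F1 score computed on the training data, the `min_gain` constraint used to flag a subset, and the rule that only flagged subsets are extended (the pruning step). The function names `f1_one_vs_rest` and `find_duplicate_candidates` are hypothetical.

```python
from itertools import chain
from statistics import mean

# Toy traces (feature, label). Assumption: "a1" and "a2" are two identities
# of the same user, so their features follow the same distribution.
TRACES = [
    (1.0, "a1"), (1.2, "a1"), (0.9, "a1"),
    (1.1, "a2"), (0.95, "a2"), (1.15, "a2"),
    (5.0, "b"), (5.2, "b"), (4.9, "b"),
    (9.0, "c"), (9.1, "c"), (8.8, "c"),
]


def f1_one_vs_rest(traces, subset):
    """F1 (on the training data) of a nearest-centroid binary classifier
    that separates the labels in `subset` from all other labels."""
    pos = [x for x, y in traces if y in subset]
    neg = [x for x, y in traces if y not in subset]
    if not pos or not neg:
        return 0.0
    c_pos, c_neg = mean(pos), mean(neg)
    tp = fp = fn = 0
    for x, y in traces:
        predicted = abs(x - c_pos) < abs(x - c_neg)
        actual = y in subset
        tp += predicted and actual
        fp += predicted and not actual
        fn += actual and not predicted
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def find_duplicate_candidates(traces, min_gain=0.2, max_size=3):
    """Level-wise walk of the subset lattice of labels.

    Hypothetical constraint: a subset is flagged when its binary classifier
    beats every already-scored immediate sub-subset by at least `min_gain`.
    Hypothetical pruning: only flagged subsets are extended further.
    """
    labels = sorted({y for _, y in traces})
    scores = {frozenset([l]): f1_one_vs_rest(traces, {l}) for l in labels}
    frontier = list(scores)
    candidates = []
    for _ in range(2, max_size + 1):
        next_frontier = []
        extensions = chain.from_iterable(
            (base | {l} for l in labels if l not in base) for base in frontier
        )
        for s in extensions:
            if s in scores:  # already evaluated via another parent
                continue
            scores[s] = f1_one_vs_rest(traces, s)
            best_parent = max(
                scores[s - {m}] for m in s if (s - {m}) in scores
            )
            if scores[s] - best_parent >= min_gain:
                candidates.append((sorted(s), round(scores[s], 3)))
                next_frontier.append(s)
        frontier = next_frontier
    return candidates


if __name__ == "__main__":
    print(find_duplicate_candidates(TRACES))
```

On this toy data, only the pair `a1`, `a2` is flagged: neither label can be separated well from the other on its own, while their union is separated perfectly from the remaining labels. A real setting would replace the scalar feature and the nearest-centroid rule with an actual trace representation and classifier, and would score on held-out data.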