Clustering via Distance Learning Guided by Preferences on Attributes
Abstract
In recent years, many semi-supervised clustering methods have integrated constraints between
pairs of objects or on class labels, so that the final partition is consistent with the needs
of the user. However, in cases where the dimensions of the study are clearly defined, it
seems appropriate to express constraints directly on the attributes used to explore the data. Furthermore,
such a formulation would avoid the classic problems of the curse of dimensionality and
of cluster interpretation. This article proposes to take into account the preferences of
the user on the attributes to guide the learning of the distance for clustering. Specifically, we
show how to parameterize the Euclidean distance with a diagonal matrix whose coefficients
must stay as close as possible to the weights set by the user. This approach yields a clustering that is a compromise
between a data-driven and a user-driven solution. We observe experimentally that the addition
of preferences may be essential to achieve a better clustering.
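
As an illustration of the parameterization described above, the following minimal Python sketch computes a Euclidean distance weighted by a diagonal matrix, together with a term measuring how far the learned weights are from the user's weights. The abstract does not state the actual objective function, so the quadratic penalty, the trade-off parameter `lam`, and all function names here are assumptions made for illustration only, not the method of the article.

```python
import math


def weighted_euclidean(x, y, w):
    """Distance sqrt(sum_j w_j * (x_j - y_j)^2), i.e. Euclidean distance
    parameterized by the diagonal matrix diag(w)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wj < 0 for wj in w):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w)))


def preference_penalty(w, w_user):
    """Squared deviation between the learned weights and the user's weights.
    Assumption: a quadratic form; the abstract only says 'closest'."""
    if len(w) != len(w_user):
        raise ValueError("w and w_user must have the same length")
    return sum((wj - uj) ** 2 for wj, uj in zip(w, w_user))


def compromise_objective(data_cost, w, w_user, lam=1.0):
    """Hypothetical trade-off: a data-driven clustering cost plus
    lam times the preference penalty (user-driven term)."""
    return data_cost + lam * preference_penalty(w, w_user)


if __name__ == "__main__":
    x, y = [0.0, 0.0], [3.0, 4.0]
    # Equal weights recover the ordinary Euclidean distance.
    print(weighted_euclidean(x, y, [1.0, 1.0]))
    # A zero weight removes the second attribute from the distance.
    print(weighted_euclidean(x, y, [1.0, 0.0]))
    # Learned weights halfway between uniform and the user's preference.
    print(compromise_objective(2.0, [0.5, 0.5], [1.0, 0.0], lam=2.0))
```

In this sketch, a large `lam` pushes the weights toward the user-driven solution, while `lam = 0` leaves only the data-driven cost, which mirrors the compromise described in the abstract.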