A compromise-based hierarchical agglomerative clustering method: hclustcompro
Abstract
Semi-supervised learning methods make it possible to use a priori knowledge to guide the
classification algorithm in group discovery. In this work, we propose a new hierarchical
agglomerative clustering (HAC) algorithm that takes into account two sources
of information associated with the same objects. This method, called compromise
HAC (hclustcompro), allows a compromise between the hierarchies obtained from each
source taken separately. A convex combination of the dissimilarities associated with
each of the sources is used to modify the dissimilarity measure in the classical HAC
algorithm. The choice of the mixing parameter is the key point of the method. We
propose an objective function to be minimized, based on the absolute difference of the correlations
between the initial dissimilarities and the cophenetic distances, as well as a resampling
procedure to ensure the robustness of the choice of the mixing parameter. We illustrate
our method with archaeological data from the Angkor site in Cambodia.
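
As an illustration of the idea summarized above, the following minimal Python sketch mixes two dissimilarities with a parameter alpha, runs a classical HAC on the mixture, and scans a grid of alpha values for the one that minimizes the absolute difference between the two cophenetic correlations. It is a sketch under stated assumptions and not the hclustcompro implementation: the function names, the average linkage, the scaling of each dissimilarity by its maximum, the grid search, and the random data are all hypothetical choices made for the example, and the resampling step is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def mixed_dissimilarity(d1, d2, alpha):
    """Convex combination of two condensed dissimilarity vectors."""
    return alpha * d1 + (1.0 - alpha) * d2


def correlation_gap(d1, d2, alpha, method="average"):
    """Absolute difference between the correlations of each source
    dissimilarity with the cophenetic distances of the mixed hierarchy."""
    tree = linkage(mixed_dissimilarity(d1, d2, alpha), method=method)
    coph = cophenet(tree)  # condensed cophenetic distances
    r1 = np.corrcoef(d1, coph)[0, 1]
    r2 = np.corrcoef(d2, coph)[0, 1]
    return abs(r1 - r2)


def choose_alpha(d1, d2, grid=None, method="average"):
    """Grid search (an assumption of this sketch) for the mixing parameter."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    gaps = np.array([correlation_gap(d1, d2, a, method) for a in grid])
    return float(grid[int(np.argmin(gaps))]), gaps


# Hypothetical data: two unrelated descriptions of the same 30 objects.
rng = np.random.default_rng(0)
d1 = pdist(rng.normal(size=(30, 2)))
d2 = pdist(rng.normal(size=(30, 3)))
# Put both sources on a comparable scale (an assumption of this sketch).
d1 = d1 / d1.max()
d2 = d2 / d2.max()

alpha_hat, gaps = choose_alpha(d1, d2)
print("selected alpha:", alpha_hat)
```

With alpha equal to 1 the hierarchy depends on the first source only, and with alpha equal to 0 on the second only; intermediate values trade one off against the other.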