Clustering of Complex Data by Globalization of Similarity Measures via Quasi-Arithmetic Means
Abstract
Most clustering methods have been designed for a specific data type, e.g. numerical, textual, categorical, functional, probabilistic, or graph data. However, the datasets generated in daily life are made of mixed data. Consider the health domain, and cardiac disease prevention in particular. Applications developed in this domain combine sensor data of many types, such as the age of the patient, the effort level, the maximum heart rate, histograms of average heart rate, etc. To summarize all these data, it would be useful to build clusters over these different data types and to define a global similarity measure from the pairwise similarities between objects computed on each data type. In this paper, we propose a clustering method based on merging similarity matrices using quasi-arithmetic means. The method is suited to choosing among the dimensions of data of different types and assumes that a similarity measure exists for each data type.
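As a rough illustration of the merging step, the sketch below combines per-type similarity matrices entry by entry with a quasi-arithmetic mean, M_f(s_1, ..., s_k) = f^{-1}(sum_t w_t f(s_t)). It is a minimal sketch under stated assumptions and not the authors' implementation: the function names, the two toy matrices, the uniform weights, and the choice of generators (identity and square) are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]


def quasi_arithmetic_mean(
    values: Sequence[float],
    f: Callable[[float], float],
    f_inv: Callable[[float], float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Return f_inv(sum_t w_t * f(v_t)); weights default to uniform."""
    k = len(values)
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match values and sum to 1")
    return f_inv(sum(w * f(v) for w, v in zip(weights, values)))


def merge_similarity_matrices(
    matrices: Sequence[Matrix],
    f: Callable[[float], float],
    f_inv: Callable[[float], float],
    weights: Optional[Sequence[float]] = None,
) -> Matrix:
    """Merge one similarity matrix per data type into a global matrix."""
    n = len(matrices[0])
    return [
        [
            quasi_arithmetic_mean([m[i][j] for m in matrices], f, f_inv, weights)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical similarities between three patients, one matrix per data type.
age_sim = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]]
hist_sim = [[1.0, 0.6, 0.4], [0.6, 1.0, 0.2], [0.4, 0.2, 1.0]]

# Identity generator: the ordinary arithmetic mean.
arithmetic = merge_similarity_matrices(
    [age_sim, hist_sim], f=lambda x: x, f_inv=lambda y: y
)

# Square generator: the quadratic mean, which leans toward the larger similarity.
quadratic = merge_similarity_matrices(
    [age_sim, hist_sim], f=lambda x: x * x, f_inv=math.sqrt
)
```

The merged matrix can then be passed to any clustering algorithm that accepts a precomputed similarity matrix. Changing the generator `f` or the weights shifts how much each data type contributes to the global similarity.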