A Bayesian Model for Co-clustering Mixed-Type Data
Abstract
We propose a maximum a posteriori (MAP) Bayesian approach to perform and evaluate a co-clustering of mixed-type
data tables. The proposed model first infers an optimal segmentation of all variables, and then
performs a co-clustering by minimizing a Bayesian model selection cost function. One advantage
of this approach is that it requires no user-set parameters. Another main advantage is the proposed
criterion, which gives an exact measure of model quality, expressed as the posterior probability of the model
given the data. Continued optimization of this criterion yields progressively better models
while avoiding over-fitting of the data. Experiments conducted on real data show the value of
this co-clustering approach for exploratory analysis of large data sets.