RNTI

Multiple Linear Regression for Histogram Data using Least Squares of Quantile Functions: a Two-components model.
In HDSDA 2013, vol. RNTI-E-25, pp. 78-93
Abstract
Histograms are commonly used to summarize observed data and can be regarded as nonparametric estimates of probability distributions. Symbolic Data Analysis formalized the concept of a histogram symbolic variable, that is, a variable that describes statistical units by histograms instead of single values. In this paper we present a linear regression model for multivariate histogram variables. We use a Least Squares estimation method in which the sum of squared errors is defined according to the ℓ2 Wasserstein metric between the observed and the predicted histogram data. Consistently with the ℓ2 Wasserstein metric, we solve the Least Squares computational problem by introducing a suitable inner product between two vectors of histogram data. Finally, measures of goodness of fit are discussed, and an application to real data shows some interpretative advantages of the proposed method.
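The abstract does not state the error measure explicitly. As a minimal sketch, assume the usual quantile-function form of the squared ℓ2 Wasserstein distance, d²(F, G) = ∫₀¹ (F⁻¹(t) − G⁻¹(t))² dt, and assume each histogram has uniform density within its bins and strictly positive bin weights, so that its quantile function is piecewise linear. The function names `cumulative_weights` and `wasserstein2_squared` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cumulative_weights(weights):
    """Return cumulative probabilities 0 = c_0 < c_1 < ... < c_k = 1 (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(w / w.sum())))
    cum[-1] = 1.0  # guard against floating-point drift in the last cumulative value
    return cum


def wasserstein2_squared(edges_a, weights_a, edges_b, weights_b):
    """Squared l2 Wasserstein distance between two histograms.

    Assumption: mass is uniform inside each bin and all weights are positive,
    so each quantile function is piecewise linear between
    (cumulative weight, bin edge) points.
    """
    edges_a = np.asarray(edges_a, dtype=float)
    edges_b = np.asarray(edges_b, dtype=float)
    cum_a = cumulative_weights(weights_a)
    cum_b = cumulative_weights(weights_b)

    # On the merged breakpoints, the difference of the two quantile
    # functions is linear on every sub-interval.
    t = np.union1d(cum_a, cum_b)
    diff = np.interp(t, cum_a, edges_a) - np.interp(t, cum_b, edges_b)

    # Exact integral of a squared linear function with endpoint values
    # d0, d1 over an interval of width h: h * (d0^2 + d0*d1 + d1^2) / 3.
    d0, d1 = diff[:-1], diff[1:]
    return float(np.sum(np.diff(t) * (d0**2 + d0 * d1 + d1**2) / 3.0))


# Usage: a uniform histogram on [0, 1] against the same histogram shifted to [1, 2].
shifted = wasserstein2_squared([0.0, 1.0], [1.0], [1.0, 2.0], [1.0])
```

Under these assumptions, the sum of squared errors mentioned in the abstract would be the sum of such distances between each observed histogram and its predicted counterpart; the regression model itself is not reproduced here.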