Multiple Linear Regression for Histogram Data using Least Squares of Quantile Functions: a Two-components model.
Abstract
Histograms are commonly used for representing summaries of observed
data and can be regarded as nonparametric estimates of probability
distributions. Symbolic Data Analysis formalized the concept of a histogram
symbolic variable: a variable that describes each statistical unit by a histogram
instead of a single value. In this paper we present a linear regression
model for multivariate histogram variables. We use a Least Squares estimation
method where the sum of squared errors is defined according to the ℓ2 Wasserstein
metric between the observed and the predicted histogram data. Consistently
with the ℓ2 Wasserstein metric, we solve the Least Squares computational problem
by introducing a suitable inner product between two vectors of histogram
data. Finally, measures of goodness of fit are discussed, and an application to
real data illustrates some interpretative advantages of the proposed method.
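
As an illustration of the metric behind the sum of squared errors, the squared ℓ2 Wasserstein distance between two one-dimensional distributions is the integral over [0, 1] of the squared difference of their quantile functions. The following is a minimal sketch, not the estimation procedure of the paper. It assumes that mass is spread uniformly inside each bin and that every bin has strictly positive weight; the function names, the (edges, weights) representation of a histogram and the grid size are hypothetical choices made for this example.

```python
import numpy as np


def quantile_function(edges, weights, t):
    """Evaluate the quantile function of a histogram at levels t in [0, 1].

    Assumption: mass is uniform inside each bin and all weights are
    strictly positive, so the quantile function is piecewise linear.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Cumulative relative frequencies at the bin edges, starting from 0.
    cdf = np.concatenate(([0.0], np.cumsum(weights / weights.sum())))
    # Invert the piecewise-linear distribution function by interpolation.
    return np.interp(t, cdf, edges)


def wasserstein2_squared(edges_a, weights_a, edges_b, weights_b, n_grid=10_000):
    """Approximate the squared l2 Wasserstein distance between two histograms.

    The integral of the squared quantile difference over [0, 1] is
    approximated with a midpoint rule on n_grid equally spaced levels.
    """
    t = (np.arange(n_grid) + 0.5) / n_grid
    diff = (quantile_function(edges_a, weights_a, t)
            - quantile_function(edges_b, weights_b, t))
    return float(np.mean(diff ** 2))
```

For example, `wasserstein2_squared([0, 1], [1], [1, 2], [1])` compares a uniform histogram on [0, 1] with the same histogram shifted by one unit and returns a value close to 1, the squared size of the shift.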