A Collective System for Using a Large Ensemble of Classifiers on the Cloud for Big Data Classification
Abstract
Given the growing volumes of data (Big Data) and the associated issues (velocity,
variety and veracity), we propose in this paper the design of a new collective system that
makes massive use of an ensemble of classifiers for Big Data on the Cloud. We combine the
advantages of labeling by consensus among multiple decisions distributed across the Cloud
with the use of the Map/Reduce paradigm for training each classifier's model. To this end,
we consider a network of classifiers deployed on the Cloud. Mappers distribute the training
data across the nodes (classifiers), while Reducers launch the learning phase and return the
performance index and the model of each classifier. Then, for each input datum, whichever
network node it arrives at, that node labels the datum and asks its neighbors to do the same.
Together, they form an ensemble of classifiers. Finally, using a weighted majority vote, the
queried node returns the final decision. The larger the neighborhood, the better the quality
of the results. However, the neighborhood size must be limited, because otherwise the
processing time is no longer compatible with Big Data.
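
The labeling step described above can be illustrated with a minimal Python sketch. It is not the implementation of the proposed system: the `Node` class, its `predict`, `performance` and `neighbors` fields, and the `classify` method are hypothetical names introduced here for illustration. The sketch also assumes that each vote is weighted by the classifier's performance index, which the abstract does not state explicitly.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Hashable, List


@dataclass
class Node:
    """One classifier of the network (hypothetical structure)."""
    # Trained model, reduced here to a function mapping a datum to a label.
    predict: Callable[[object], Hashable]
    # Performance index obtained at the end of the learning phase;
    # assumed here to serve as the weight of this node's vote.
    performance: float
    # Neighboring nodes asked to label the same datum.
    neighbors: List["Node"] = field(default_factory=list)

    def classify(self, datum: object) -> Hashable:
        """Label a datum by weighted majority vote over this node and its neighbors."""
        totals = defaultdict(float)
        for node in [self] + self.neighbors:
            # Each voter adds its weight to the label it predicts.
            totals[node.predict(datum)] += node.performance
        # The label with the largest accumulated weight wins.
        return max(totals, key=totals.get)


# Usage: the queried node disagrees with its two neighbors,
# whose combined weight (0.6 + 0.4) exceeds its own (0.9).
left = Node(predict=lambda d: "ham", performance=0.6)
right = Node(predict=lambda d: "ham", performance=0.4)
queried = Node(predict=lambda d: "spam", performance=0.9, neighbors=[left, right])
decision = queried.classify("some message")
```

In this sketch the cost of `classify` grows linearly with the number of neighbors, which reflects the trade-off noted above between neighborhood size and processing time.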