A collective system for using a large ensemble of classifiers on the Cloud for Big Data classification

Rabah Mazouzi, Cyril de Runz, Herman Akdag

In FDC 2016, vol. RNTI-E-31, pp. 1-14

Abstract

Considering the growing volumes of data (Big Data) and the associated issues (velocity, variety and veracity), we propose in this paper the design of a new collective system that makes massive use of an ensemble of classifiers for Big Data on the Cloud. We combine the advantages of labeling by consensus among multiple decisions distributed on the Cloud with the use of the Map/Reduce paradigm for training each classifier's model. To this end, we consider a network of classifiers deployed on the Cloud. Mappers divide the training data among the nodes (classifiers), while reducers launch the learning phase and return each classifier's performance index and model. Then, for each input datum, whichever network node it arrives at, that node labels the datum and asks its neighbors to do the same. Together they form an ensemble of classifiers. Finally, using a weighted majority vote, the queried node returns the final decision. The larger the neighborhood, the better the quality of the results. However, the size of the neighborhood must be limited, because otherwise the processing time becomes incompatible with Big Data constraints.
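
The labeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, its `predict` callable, and the use of the performance index directly as the vote weight are assumptions made for illustration, and tie-breaking is left unspecified.

```python
from collections import defaultdict


def weighted_majority_vote(votes):
    """Return the label with the largest total weight.

    `votes` is an iterable of (label, weight) pairs. Using the classifier's
    performance index as the weight is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    if not totals:
        raise ValueError("at least one vote is required")
    return max(totals, key=totals.get)


class Node:
    """Hypothetical network node wrapping one trained classifier."""

    def __init__(self, predict, performance):
        self.predict = predict          # callable: datum -> label
        self.performance = performance  # performance index from training
        self.neighbors = []             # nodes asked to label the same datum

    def classify(self, datum):
        # The queried node labels the datum itself, then asks each neighbor.
        votes = [(self.predict(datum), self.performance)]
        votes += [(n.predict(datum), n.performance) for n in self.neighbors]
        return weighted_majority_vote(votes)


# Toy usage: three constant classifiers standing in for trained models.
a = Node(lambda x: "pos", 0.9)
b = Node(lambda x: "neg", 0.5)
c = Node(lambda x: "neg", 0.3)
a.neighbors = [b, c]
decision = a.classify(1.0)
```

In this toy setup the single strong classifier outweighs the two weaker ones (0.9 against 0.8), which is the behavior a weighted vote is meant to allow. Each additional neighbor adds one more prediction call per datum, which reflects the trade-off between neighborhood size and processing time mentioned in the abstract.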
