SGIA : Stratégie Intelligente de Groupement pour Améliorer le Traitement des Requêtes OLAP en MapReduce
Abstract
Enhancing OLAP query performance in a distributed system such as Hadoop or Spark
is a challenging task. An OLAP query is composed of several operations, such as projection,
filtering, join, and grouping operations. Each operation can be executed in the map or in the
reduce phase with one or several Spark stages. Some operations, such as star join and
filtering, can be enhanced with static partitioning and load balancing of the
data, since the load-balancing decision is known in advance. However, optimizing
Group By and aggregate functions generally requires a dynamic partitioning
and distribution technique to build a good partitioning scheme for the reducer inputs, since the
relevant information is only available at query runtime. In this paper, we propose a smart method,
called SGIA, to balance the reducer inputs on the fly. We use a multi-agent system that
smartly balances the reducer loads for the Group By task. Our experiments show that our proposal
outperforms existing approaches in terms of query execution time.