Conception physique d'un entrepôt de données distribuées basée sur K-means équilibré (Physical design of a distributed data warehouse based on balanced K-means)

Yassine Ramdane, Omar Boussaid, Nadia Kabachi, Fadila Bentayeb

In EGC 2019, vol. RNTI-E-35, pp.177-188

Abstract

Horizontal partitioning has been widely used to optimize query processing in distributed systems such as Hadoop and Spark. In distributed data warehouses, the most expensive operation for OLAP queries is the star join, which requires many MapReduce cycles. In this paper, we propose a new data placement scheme in Hadoop based on a balanced K-means algorithm. This scheme allows the star join to be performed in a single Spark stage. Our technique takes into account the physical characteristics of the cluster and the volume of data. We evaluated our approach experimentally on a 5-node cluster, where it improved the execution time of some OLAP queries by 60% over some existing approaches.
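The abstract does not spell out the balancing algorithm itself. As a rough illustration only, the sketch below shows a generic capacity-constrained ("balanced") K-means in Python; it is not the authors' method. It assumes that the items to be placed are numeric feature vectors (for example, hypothetical per-bucket descriptors of fact-table data) and that each node's share of items is proportional to an integer weight standing in for its physical capacity. The names `capacities_from_weights` and `balanced_kmeans` are hypothetical.

```python
import math
import random


def capacities_from_weights(n, weights):
    """Split n items across nodes in proportion to integer weights.

    Hypothetical helper: a node's weight stands in for its physical capacity.
    Leftover items go to the nodes with the largest remainders.
    """
    total = sum(weights)
    caps = [n * w // total for w in weights]
    leftover = n - sum(caps)
    order = sorted(range(len(weights)),
                   key=lambda j: (n * weights[j]) % total, reverse=True)
    for j in order[:leftover]:
        caps[j] += 1
    return caps


def balanced_kmeans(points, capacities, iters=20, seed=0):
    """Greedy capacity-constrained K-means (a generic balanced variant).

    points: list of equal-length numeric tuples.
    capacities: maximum number of points per cluster, summing to >= len(points).
    Returns (labels, centroids).
    """
    k = len(capacities)
    if sum(capacities) < len(points):
        raise ValueError("total capacity is smaller than the number of points")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment: visit (point, cluster) pairs from closest to farthest and
        # put each point in the nearest cluster that still has room.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        load = [0] * k
        assigned = [None] * len(points)
        for _, i, j in pairs:
            if assigned[i] is None and load[j] < capacities[j]:
                assigned[i] = j
                load[j] += 1
        # Update: move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(len(points)) if assigned[i] == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if assigned == labels:  # assignments unchanged: converged
            break
        labels = assigned
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    # Assumption: the second node is three times larger than the first.
    caps = capacities_from_weights(len(pts), [1, 3])
    labels, _ = balanced_kmeans(pts, caps)
    print(caps, labels)
```

Assigning points greedily in global distance order is one simple way to respect the size limits, since every point is guaranteed a slot whenever the total capacity covers all points; exact balanced variants instead solve a minimum-cost assignment at each iteration.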
