Supply-chain-based optimization of data lake architecture (Optimisation d'architecture de lacs de données basée sur les chaînes d'approvisionnement)

Marzieh Derakhshannia, Anne Laurent, Dickson Owuor

In EGC 2021, vol. RNTI-E-37, pp. 277-284

Abstract

Data lakes have recently emerged as a new generation of data repository. Data lake architecture design, which has a significant impact on data lake performance and data quality, is an active research topic. In this paper, we study a joint "location-allocation" problem, which is used in supply chain network design, as a means of improving data lake architecture and performance. We propose a mathematical model applied to a MapReduce environment, based on an analogy between data lakes and supply chains. We solve this model with a greedy algorithm and determine the optimal number of MapReduce jobs that should be run in such a data lake to optimize performance.
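
For readers unfamiliar with location-allocation problems, the following is a minimal sketch of a generic greedy heuristic for the uncapacitated variant. It is not the model from the paper, whose formulation is not given in the abstract. The function name `greedy_location_allocation`, the cost inputs, and the toy numbers are all hypothetical. A site could loosely stand for a candidate processing node or job and a demand point for a data source, but that mapping is an assumption made only for illustration.

```python
def greedy_location_allocation(open_cost, assign_cost):
    """Greedy "add" heuristic for uncapacitated location-allocation.

    open_cost[j]      -- hypothetical fixed cost of opening site j
    assign_cost[i][j] -- hypothetical cost of serving demand point i from site j
    Returns (opened sites, site chosen for each demand point, total cost).
    Assumes at least one candidate site.
    """
    n_sites = len(open_cost)
    n_demands = len(assign_cost)
    opened = []
    # Cheapest service cost found so far for each demand point.
    best = [float("inf")] * n_demands
    total = float("inf")

    while True:
        best_site, best_total = None, total
        for j in range(n_sites):
            if j in opened:
                continue
            # Total cost if site j were opened in addition to the current set.
            fixed = sum(open_cost[k] for k in opened) + open_cost[j]
            service = sum(min(best[i], assign_cost[i][j]) for i in range(n_demands))
            if fixed + service < best_total:
                best_site, best_total = j, fixed + service
        if best_site is None:
            break  # no remaining site lowers the total cost
        opened.append(best_site)
        best = [min(best[i], assign_cost[i][best_site]) for i in range(n_demands)]
        total = best_total

    # Allocate each demand point to its cheapest opened site.
    allocation = [min(opened, key=lambda j: assign_cost[i][j]) for i in range(n_demands)]
    return opened, allocation, total


# Toy instance (made-up numbers): 3 candidate sites, 4 demand points.
open_cost = [4, 3, 10]
assign_cost = [
    [1, 5, 2],
    [2, 4, 2],
    [6, 1, 2],
    [7, 2, 2],
]
opened, allocation, total = greedy_location_allocation(open_cost, assign_cost)
```

The heuristic opens one site at a time, always choosing the site that reduces the total of fixed and service costs the most, and stops when no further site helps. The number of opened sites is therefore an output of the procedure rather than an input. A greedy procedure of this kind does not guarantee a globally optimal solution in general.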
