RNTI

Supply-chain-based optimization of data lake architecture
In EGC 2021, vol. RNTI-E-37, pp. 277-284
Abstract
Data lakes have recently emerged as a new generation of data repository. Data lake architecture design, which significantly affects data lake performance and data quality, is an active research topic. In this paper, we study a joint "location-allocation" problem, used in supply chain network design, as a means of improving data lake architecture and performance. We propose a mathematical model applied to a MapReduce environment, based on an analogy between data lakes and supply chains. We solve this model with a greedy algorithm and determine the optimal numbers of MapReduce jobs that should be run in such a data lake to optimize performance.
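To make the idea of a greedy approach to a location-allocation problem concrete, the following is a minimal sketch and not the model of the paper, whose formulation is not given in this abstract. It assumes an uncapacitated setting in which each candidate site (by analogy, a MapReduce job slot) has a fixed opening cost and each client (by analogy, a data set) is served by its cheapest open site. The names `total_cost`, `greedy_location_allocation`, `fixed`, and `assign` are hypothetical and chosen for illustration only.

```python
from typing import List, Set, Tuple


def total_cost(open_sites: Set[int], fixed: List[float],
               assign: List[List[float]]) -> float:
    """Fixed cost of the open sites plus each client's cheapest assignment."""
    opening = sum(fixed[s] for s in open_sites)
    serving = sum(min(row[s] for s in open_sites) for row in assign)
    return opening + serving


def greedy_location_allocation(fixed: List[float],
                               assign: List[List[float]]) -> Tuple[Set[int], float]:
    """Open one site at a time for as long as doing so lowers the total cost.

    fixed[s]     : assumed cost of opening site s
    assign[c][s] : assumed cost of serving client c from site s
    """
    open_sites: Set[int] = set()
    best = float("inf")
    while len(open_sites) < len(fixed):
        candidates = [s for s in range(len(fixed)) if s not in open_sites]
        # Pick the candidate whose opening gives the lowest total cost.
        cost, site = min(
            (total_cost(open_sites | {s}, fixed, assign), s) for s in candidates
        )
        if cost >= best:  # no further improvement: stop
            break
        open_sites.add(site)
        best = cost
    return open_sites, best


if __name__ == "__main__":
    # Illustrative data only: 3 candidate sites, 4 clients.
    fixed = [4, 3, 5]
    assign = [[1, 6, 9], [2, 5, 8], [9, 1, 2], [8, 2, 1]]
    sites, cost = greedy_location_allocation(fixed, assign)
    print(sorted(sites), cost)
```

In this sketch the number of open sites is an output of the procedure rather than an input, which mirrors the abstract's goal of determining how many MapReduce jobs to run. A greedy heuristic of this kind does not guarantee a globally optimal solution in general.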