RNTI

MODULAD
Metadata Systems in Data Lakes: Modeling and Features
In EDA 2019, vol. RNTI-B-15, pp. 77-92
Abstract
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake stores data without any predefined schema. Consequently, querying and analyzing the data depend on a metadata system that must be both efficient and comprehensive. However, metadata management in data lakes remains an open issue, and criteria for evaluating its effectiveness are largely nonexistent. In this article, we propose MEDAL, a generic, graph-based model for metadata management in data lakes. We also propose evaluation criteria for data lake metadata systems in the form of a list of expected features. Finally, we show that our approach is more comprehensive than existing metadata systems.
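The abstract states only that MEDAL is graph-based. The sketch below is a rough, hypothetical illustration of why a graph suits lake metadata. It is not MEDAL's actual model, whose constructs are defined in the body of the paper. Here, datasets are nodes carrying schema-free properties, and relationships between datasets are labeled edges. This lets related data be found and navigated without a predefined schema. Every name in it (`MetadataGraph`, `DatasetNode`, the `derived_into` and `similar_to` relation labels, and the sample datasets) is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetNode:
    """Hypothetical node: one data object in the lake with free-form metadata."""
    name: str
    properties: dict = field(default_factory=dict)


class MetadataGraph:
    """Hypothetical graph-based metadata store (illustrative, not MEDAL).

    Nodes are datasets; labeled directed edges are relationships between them.
    """

    def __init__(self):
        self.nodes: dict[str, DatasetNode] = {}
        # Adjacency list: source name -> list of (relation label, target name).
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_dataset(self, name, **properties):
        # No schema is imposed: any keyword becomes a metadata property.
        self.nodes[name] = DatasetNode(name, dict(properties))

    def link(self, source, relation, target):
        # Both endpoints must already be registered as datasets.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both datasets must be registered first")
        self.edges[source].append((relation, target))

    def related(self, name, relation=None):
        """Names of datasets directly linked from `name`, optionally by label."""
        return [t for r, t in self.edges.get(name, [])
                if relation is None or r == relation]

    def find(self, **criteria):
        """Sorted names of datasets whose properties match every criterion."""
        return sorted(n for n, d in self.nodes.items()
                      if all(d.properties.get(k) == v for k, v in criteria.items()))


# Usage with made-up datasets:
g = MetadataGraph()
g.add_dataset("sales_raw", format="csv", owner="sales")
g.add_dataset("sales_clean", format="parquet", owner="sales")
g.add_dataset("customers", format="json", owner="crm")
g.link("sales_raw", "derived_into", "sales_clean")
g.link("sales_clean", "similar_to", "customers")
print(g.related("sales_raw"))   # ['sales_clean']
print(g.find(owner="sales"))    # ['sales_clean', 'sales_raw']
```

A plain adjacency list keeps the sketch dependency-free. A production metadata system would more likely rely on a graph database and richer node and edge types.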