RNTI

MODULAD
Complex Relations in the Hive System
In EDA 2019, vol. RNTI-B-15, pp.1-14
Abstract
In this paper, we raise the question of how data architects model their data for processing in Hive. This well-known SQL-on-Hadoop engine supports complex value relations, in which attribute types need not be atomic. In an empirical study, we analyze Hive schemas found in open-source repositories. We examine the extent to which practitioners use complex value relations and, accordingly, whether they write queries over complex types. Understanding which features are actively used will help us make the right decisions when setting up benchmarks for SQL-on-Hadoop engines and when choosing which query operators to optimize.
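To make "complex value relations" concrete: a Hive column may be declared with one of the nested type constructors `ARRAY`, `MAP`, `STRUCT` or `UNIONTYPE` instead of an atomic type such as `BIGINT` or `STRING`. The Python sketch below shows one possible way to flag complex-typed columns in a single `CREATE TABLE` statement. It is not the tooling used in the study; the helper names (`split_top_level`, `column_types`, `is_complex`) and the example table are hypothetical, and the parser assumes a plain column list without `COMMENT` clauses, quoted commas or constraints.

```python
import re

# Hive's nested type constructors; any other declared type is treated as atomic here.
_COMPLEX_TYPE = re.compile(r"\s*(ARRAY|MAP|STRUCT|UNIONTYPE)\s*<", re.IGNORECASE)


def split_top_level(s: str) -> list[str]:
    """Split s on commas that are not nested inside <...> or (...)."""
    parts, current, depth = [], [], 0
    for ch in s:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def column_types(ddl: str) -> dict[str, str]:
    """Map column name -> declared type for one CREATE TABLE statement.

    Hypothetical helper; assumes a plain column list with no COMMENT clauses,
    quoted commas or constraints.
    """
    start = ddl.index("(")
    depth = 0
    # Find the parenthesis that closes the column list.
    for end in range(start, len(ddl)):
        if ddl[end] == "(":
            depth += 1
        elif ddl[end] == ")":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced parentheses in column list")
    columns = {}
    for col in split_top_level(ddl[start + 1:end]):
        name, col_type = col.split(None, 1)
        columns[name.strip("`")] = col_type.strip()
    return columns


def is_complex(col_type: str) -> bool:
    """True if the declared type uses ARRAY, MAP, STRUCT or UNIONTYPE at top level."""
    return _COMPLEX_TYPE.match(col_type) is not None


# Hypothetical example table mixing atomic and complex attribute types.
ddl = """
CREATE TABLE users (
  id BIGINT,
  name STRING,
  emails ARRAY<STRING>,
  prefs MAP<STRING, INT>,
  address STRUCT<street:STRING, zip:DECIMAL(5,0)>
) STORED AS ORC
"""
cols = column_types(ddl)
print([c for c, t in cols.items() if is_complex(t)])  # ['emails', 'prefs', 'address']
```

Tracking nesting depth over both `<...>` and `(...)` keeps the commas inside `MAP<STRING, INT>` or `DECIMAL(5,0)` from being mistaken for column separators; a full analysis of real-world schemas would need a proper HiveQL parser.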