Relations complexes au sein du système Hive

Matthieu Pilven, Stefanie Scherzinger, Laurent d’Orazio

In EDA 2019, vol. RNTI-B-15, pp. 1-14

Abstract

In this paper, we ask how data architects model their data for processing in Hive. This well-known SQL-on-Hadoop engine allows for complex value relations, in which attribute types need not be atomic. In an empirical study, we analyze Hive schemas in open source repositories. We examine to what extent practitioners make use of complex value relations and, accordingly, whether they write queries over complex types. Understanding which features are actively used will help us make the right decisions in setting up benchmarks for SQL-on-Hadoop engines, as well as in choosing which query operators to optimize.
