Complex Relations in the Hive System
Abstract
In this paper, we ask how data architects model their data for processing
in Hive. This well-known SQL-on-Hadoop engine allows for complex value relations, where
attribute types need not be atomic. In an empirical study, we analyze Hive schemas in
open-source repositories. We examine to what extent practitioners make use of complex value relations and, accordingly, whether they write queries over complex types. Understanding which
features are actively used will help us make the right decisions in setting up benchmarks for
SQL-on-Hadoop engines, as well as in choosing which query operators to optimize for.