Discriminatory Queries for Data Exploration
Abstract
In the Big Data era, exploring data is essential to uncovering new knowledge. As user
profiles grow more diverse and data more complex, exploring data has become increasingly
difficult. Analysts can access very large scientific datasets through SQL. In this paper, we
propose a rewriting technique that helps them formulate queries and explore
big data rapidly and intuitively. We introduce discriminatory queries, a syntactic restriction of SQL whose selection
condition separates positive examples from negative ones. We construct a learning dataset whose
positive examples correspond to the results desired by analysts, and negative examples to those
they do not want. We reformulate the initial query using machine learning techniques and
obtain a new query that is more efficient and more diverse. We propose measures to evaluate the rewriting
quality. To support our approach, we developed the iSQL prototype on top of a commercial
DBMS and conducted experiments with astrophysicists.
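
To make the rewriting idea concrete, the following is a minimal sketch and not the iSQL implementation. It assumes numeric attributes and uses a one-level decision tree (a decision stump) as a stand-in for the unspecified machine learning technique. The table name `stars`, the attributes `mag` and `period`, the sample values, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def best_threshold(pos: List[Row], neg: List[Row], attr: str) -> Tuple[int, str, float]:
    """Return (errors, operator, threshold) for the best split on one attribute."""
    values = sorted({r[attr] for r in pos + neg})
    # Candidate thresholds: midpoints between consecutive distinct values.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = (len(pos) + len(neg) + 1, "<=", 0.0)
    for cut in cuts:
        for op in ("<=", ">"):
            keep = (lambda v: v <= cut) if op == "<=" else (lambda v: v > cut)
            # Errors: positives rejected plus negatives accepted.
            errors = sum(not keep(r[attr]) for r in pos) + sum(keep(r[attr]) for r in neg)
            if errors < best[0]:
                best = (errors, op, cut)
    return best


def rewrite_condition(pos: List[Row], neg: List[Row]) -> str:
    """Learn a one-attribute selection condition separating pos from neg."""
    attrs = sorted(pos[0].keys())
    scored = [(best_threshold(pos, neg, a), a) for a in attrs]
    # Pick the attribute whose best split misclassifies the fewest examples.
    (errors, op, cut), attr = min(scored)
    return f"{attr} {op} {cut}"


# Hypothetical examples: tuples the analyst wants (positives) and does not want (negatives).
positives = [{"mag": 11.5, "period": 3.0}, {"mag": 12.0, "period": 40.0}]
negatives = [{"mag": 15.0, "period": 2.5}, {"mag": 16.0, "period": 35.0}]

condition = rewrite_condition(positives, negatives)
print(f"SELECT * FROM stars WHERE {condition}")
# prints: SELECT * FROM stars WHERE mag <= 13.5
```

In this toy setting the learned condition keeps every positive example and rejects every negative one. A full decision tree would yield a disjunction of conjunctions instead of a single comparison.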