RNTI

Discriminatory Queries for Data Exploration
In EGC 2016, vol. RNTI-E-30, pp. 195-206
Abstract
In the Big Data era, exploring data is essential to uncovering new knowledge. As user profiles grow more diverse and data more complex, exploration becomes increasingly difficult. Analysts can access very large scientific datasets through SQL. In this paper, we propose a query rewriting technique that helps them formulate queries, so that they can explore big data quickly and intuitively. We introduce discriminatory queries, a syntactic restriction of SQL whose selection condition separates positive examples from negative ones. We construct a learning dataset in which the positive examples correspond to the results analysts want and the negative examples to those they do not want. We then reformulate the initial query using machine learning techniques and obtain a new query that is more efficient and more diverse. We propose measures to evaluate the quality of the rewriting. To support our approach, we developed the iSQL prototype on top of a commercial DBMS and conducted experiments with astrophysicists.
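The idea of learning a new selection condition from positive and negative examples can be illustrated with a minimal sketch. It is not the iSQL implementation: the abstract does not say which learning technique is used, so the sketch assumes the simplest possible learner, a single-attribute threshold (a decision stump). The table name `stars`, the columns `magnitude` and `amplitude`, the example tuples, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]

def best_threshold(positives: List[Row], negatives: List[Row],
                   attributes: List[str]) -> Tuple[str, str, float, float]:
    """Find the single-attribute test that best separates the two example sets.

    Returns (attribute, operator, threshold, accuracy). This stump learner is
    an assumption made for illustration, not the method used in the paper.
    """
    rows = [(r, True) for r in positives] + [(r, False) for r in negatives]
    best = None
    for attr in attributes:
        values = sorted({r[attr] for r, _ in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for op in ("<=", ">"):
                correct = sum(
                    ((r[attr] <= t) if op == "<=" else (r[attr] > t)) == label
                    for r, label in rows
                )
                acc = correct / len(rows)
                # Strict comparison: ties keep the first candidate found.
                if best is None or acc > best[3]:
                    best = (attr, op, t, acc)
    if best is None:
        raise ValueError("need at least two distinct values on some attribute")
    return best

def rewrite_query(table: str, positives: List[Row], negatives: List[Row],
                  attributes: List[str]) -> str:
    """Turn the learned test into the WHERE clause of a new SQL query."""
    attr, op, t, _ = best_threshold(positives, negatives, attributes)
    return f"SELECT * FROM {table} WHERE {attr} {op} {t}"

# Hypothetical learning dataset: tuples the analyst wants (positives)
# and tuples the analyst does not want (negatives).
positives = [{"magnitude": 10.0, "amplitude": 1.0},
             {"magnitude": 12.0, "amplitude": 1.5}]
negatives = [{"magnitude": 11.0, "amplitude": 0.25},
             {"magnitude": 13.0, "amplitude": 0.5}]

new_query = rewrite_query("stars", positives, negatives,
                          ["magnitude", "amplitude"])
print(new_query)  # SELECT * FROM stars WHERE amplitude > 0.75
```

In this toy dataset no threshold on `magnitude` separates the two sets, while `amplitude > 0.75` separates them perfectly, so the rewritten query selects on `amplitude`. A richer learner such as a decision tree would produce conjunctions and disjunctions of such tests in the same way.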