Towards the Design of Autonomous Parallel Data Warehouses
In EDA 2019, vol. RNTI-B-15, pp. 109-124
Parallel database systems have become one of the essential solutions for processing massive data. Their effectiveness depends heavily on how data is placed across the nodes. Today's business intelligence applications deployed on parallel systems involve two types of query workloads: queries known in advance and ad-hoc queries. A review of the literature shows a significant number of parallel design solutions that optimize the first type of query. Some recent work has been proposed to optimize ad-hoc queries using machine learning. In this article, we propose a proactive approach to designing an offline parallel database. The basic idea is to use static queries (those known in advance) as a learning base for handling ad-hoc queries. To carry out our approach, we first propose a query categorization method, followed by an incremental partitioning technique. Second, we propose an allocation algorithm that maximizes the throughput of static queries and the reuse of intermediate results for ad-hoc and batch queries. Finally, we present the validation of our proposals; the results obtained are promising.