Équilibrage de Distribution de Données d'une Base en Mémoire Parallèle Partitionnées par Intervalle
Abstract
Due to the availability of larger main memory capacities, we are witnessing the presence
of parallel memory-based database systems offering increased performance unlike traditional databases. Data partitioning is a precondition for these databases, because it significantly improves the performance of certain queries such as range queries. Partitioning usually causes the Data Skew problem that may contribute to the degradation of query performance. Ganesan et al. (2004) group has proposed an algorithm that offers a low ratio between the maximum and minimum loads among the nodes that offers a low ratio between the maximum and minimum loads of nodes, while exploiting information of the global load. This information is stored in a data structure, called skip graphs, which requires the exchange of (log p) messages between the p nodes of the parallel machine during the balancing process. The goal of our work is to reduce the number of these messages. To do so, we propose a vector of approximate partition statistics (VSP), in which the nodes and clients have a rough view of the data distribution. The maintenance cost of this vector is almost zero. Our approach is validated on a cluster of 8 nodes and 2 clients.