Performance Optimization in Column-Oriented NoSQL Data Warehouses
Abstract
The column-oriented NoSQL model offers a flexible and highly denormalized database
schema. In this paper, we propose a method that transforms a relational data warehouse
into a NoSQL one with columns distributed across a multi-node cluster. Our method
is based on a strategy of grouping attributes from fact and dimension tables into column
families. For this purpose, we use two algorithms: a metaheuristic, Particle Swarm
Optimization (PSO), and the k-means clustering algorithm. To evaluate our method, we
use the TPC-DS benchmark. We
conducted several tests to evaluate how well these algorithms generate column families
and data partitions in HBase, a column-oriented NoSQL DBMS, using the MapReduce
paradigm on a Hadoop distributed system.
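
To make the grouping idea concrete, the following is a minimal sketch of how k-means could assign attributes to column families. It is not the procedure used in this paper: it assumes, purely for illustration, that each attribute is described by a binary vector recording which workload queries access it, and that attributes with similar usage vectors belong in the same family. The function name `kmeans_column_families`, the four-query workload, and the usage values are hypothetical; only the attribute names follow TPC-DS naming.

```python
from typing import Dict, List


def kmeans_column_families(usage: Dict[str, List[int]], k: int,
                           iterations: int = 20) -> List[List[str]]:
    """Group attributes into k column families with plain k-means.

    `usage` maps an attribute name to its query-usage vector
    (assumed representation: 1 if the query reads the attribute, else 0).
    """
    names = sorted(usage)
    points = [[float(v) for v in usage[n]] for n in names]

    # Deterministic initialisation: the first k distinct vectors become centroids.
    centroids: List[List[float]] = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct usage vectors than requested families")

    def dist2(a: List[float], b: List[float]) -> float:
        # Squared Euclidean distance between two usage vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: each attribute joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no attribute changed family
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return [[n for n, a in zip(names, assignment) if a == c] for c in range(k)]


# Hypothetical usage matrix: one row per attribute, one column per query.
usage = {
    "d_moy":         [1, 1, 0, 0],
    "d_year":        [1, 1, 0, 0],
    "ss_quantity":   [1, 1, 0, 1],
    "i_brand":       [0, 0, 1, 1],
    "ss_net_profit": [0, 1, 1, 1],
}
families = kmeans_column_families(usage, k=2)
print(families)
```

Under these assumptions, attributes that are read together by the same queries end up in the same family, so a query touches fewer families. A PSO-based variant would search over candidate groupings against a cost function instead of iterating on centroids.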