Enumération Randomisée des Triangles dans des Graphes à Grande Echelle à base de SQL
Abstract
Finding all the triangles in a given graph is a classic theoretical problem with a large number of practical applications. Several algorithms for enumerating triangles in large graphs have
been proposed and implemented in centralized and distributed environments with languages
like C or Python. In terms of databases, there is a large volume of data that can be modeled
and analyzed in the form of graphs. Offering solutions for enumerating triangles with SQL
queries for these graphs represents an important challenge for the database community. In
this article, we propose a distributed randomized algorithm implemented using SQL queries
to enumerate triangles in large graphs. This algorithm ensures load balancing between processors thanks to its partitioning strategy aiming to eliminate costly data exchanges between hosts
when enumerating triangles. We have experimentally shown the scalability of this approach
deployed on 8 machines cluster hosting Vertica DBMS.