Enumération Randomisée des Triangles dans des Graphes à Grande Echelle à base de SQL
In EDA 2020, vol. RNTI-B-16, pp.33-46
Finding all the triangles in a given graph is a classic theoretical problem with a large number of practical applications. Several algorithms for enumerating triangles in large graphs have been proposed and implemented in centralized and distributed environments with languages like C or Python. In terms of databases, there is a large volume of data that can be modeled and analyzed in the form of graphs. Offering solutions for enumerating triangles with SQL queries for these graphs represents an important challenge for the database community. In this article, we propose a distributed randomized algorithm implemented using SQL queries to enumerate triangles in large graphs. This algorithm ensures load balancing between processors thanks to its partitioning strategy aiming to eliminate costly data exchanges between hosts when enumerating triangles. We have experimentally shown the scalability of this approach deployed on 8 machines cluster hosting Vertica DBMS.