Extension de C-SPARQL pour l'échantillonnage de flux de graphes RDF (Extending C-SPARQL for Sampling RDF Graph Streams)

Amadou Fall Dia, Zakia Kazi Aoul, Aliou Boly

In EGC 2016, vol. RNTI-E-30, pp. 159-170

Abstract

Semantic Web technologies are increasingly adopted for data stream management. Several RDF stream processing systems have been proposed: C-SPARQL, CQELS, SPARQLstream, EP-SPARQL, Sparkwave, etc. All of them extend the semantic query language SPARQL. Because input data are large and continuously generated, storing and processing the entire stream becomes expensive, and reasoning over it is almost impossible. A technique that reduces the load while preserving the semantics of the data is therefore required to optimize processing and reasoning. However, none of the SPARQL extensions includes this feature. In this paper, we propose to extend the C-SPARQL system to generate samples on the fly from graph streams while preserving their semantics. We add three sampling operators (UNIFORM, RESERVOIR and CHAIN) to C-SPARQL's syntax. These operators have been integrated into Esper, C-SPARQL's data stream management module. Experiments show the performance of our extension in terms of execution time and preservation of data semantics.