Extending C-SPARQL for Sampling RDF Graph Streams
Abstract
Semantic Web technologies are increasingly adopted for data stream management. Several
RDF stream processing systems have been proposed: C-SPARQL, CQELS, SPARQLstream, EP-SPARQL,
SPARKWAVE, etc. All of them extend the semantic query language SPARQL. Input
data are large and continuously generated, so storing and processing the entire stream
becomes expensive and reasoning over it is almost impossible. A technique that reduces
the load while preserving the semantics of the data is therefore required to optimize processing
and reasoning. However, none of the SPARQL extensions includes this feature. Thus, in this paper, we
propose to extend the C-SPARQL system to generate samples on the fly over graph streams while
preserving their semantics. We add three sampling operators (UNIFORM, RESERVOIR and CHAIN)
to the C-SPARQL syntax. These operators have been integrated into Esper, C-SPARQL's data
flow management module. Experiments show the performance of our extension in terms of
execution time and preservation of data semantics.
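
As an illustration of the idea behind the RESERVOIR operator, the following is a minimal sketch of classical reservoir sampling applied to a stream of RDF graphs. It is not the Esper integration described in this paper. It assumes that each stream item is a whole graph (a set of triples) and that graphs are sampled as indivisible units, which is one simple way to keep the triples of a graph together. The names `Graph`, `Triple` and `reservoir_sample` are hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Assumption: a triple is (subject, predicate, object) and a graph is a
# set of triples. Both aliases are illustrative, not taken from C-SPARQL.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def reservoir_sample(
    stream: Iterable[Graph], k: int, rng: Optional[random.Random] = None
) -> List[Graph]:
    """Keep a uniform random sample of at most k graphs from a stream.

    Each graph is kept or dropped as a whole, so the triples that belong
    to one graph are never separated.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    rng = rng or random.Random()
    reservoir: List[Graph] = []
    for i, graph in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k graphs.
            reservoir.append(graph)
        else:
            # Pick a slot in [0, i]; the new graph replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = graph
    return reservoir


if __name__ == "__main__":
    # Toy stream of 100 small graphs with two triples each.
    toy_stream = [
        frozenset({(f"ex:s{i}", "ex:p", f"ex:o{i}"), (f"ex:s{i}", "ex:q", "ex:c")})
        for i in range(100)
    ]
    sample = reservoir_sample(toy_stream, k=10, rng=random.Random(0))
    print(len(sample))
```

The sketch needs only one pass and memory proportional to k, which is why this family of techniques suits streams whose length is not known in advance.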