Generating RDF from Data Sources with Heterogeneous Formats (original title: Génération de RDF à partir de sources de données aux formats hétérogènes)
In EGC 2017, vol. RNTI-E-33, pp. 93-104
Contrary to what the Web of Data initiative promotes, most organizations publish data in non-RDF formats such as CSV, JSON, or XML. Furthermore, in the Web of Things, constrained objects prefer binary formats such as EXI or CBOR over textual RDF formats. In this context, RDF can still serve as a lingua franca that enables semantic interoperability, integration of data in heterogeneous formats, reasoning, and querying. Several tools and formalisms have been designed to transform non-RDF documents into RDF. The most flexible ones are based on transformation or mapping languages (GRDDL, XSPARQL, R2RML, RML, CSVW, etc.). This paper defines a new such language, SPARQL-Generate, designed as an extension of SPARQL 1.1 that generates RDF from an RDF dataset and a set of documents in arbitrary formats. We show that it can be implemented on top of any existing SPARQL 1.1 engine, mention a first implementation on top of Apache Jena, and show that it leverages the expressivity and extensibility of SPARQL 1.1 to keep the set of supported formats open.
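To make the problem described in the abstract concrete, the following minimal sketch lifts a CSV document and a JSON document to the same RDF triples, serialized as N-Triples. It is not SPARQL-Generate and does not reflect the paper's syntax or implementation: the mappings are hand-written Python functions using only the standard library. The `http://example.org/` namespace, the `id` and `name` fields, and the sample documents are assumptions made for illustration. Only the FOAF `name` property is an existing vocabulary term.

```python
import csv
import io
import json

# Hypothetical base namespace, chosen only for this illustration.
BASE = "http://example.org/"
# Existing FOAF property, used here as the target vocabulary.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def literal(value: str) -> str:
    """Serialize a string as an N-Triples plain literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement with IRI subject and predicate."""
    return f"<{subject}> <{predicate}> {obj} ."


def csv_to_ntriples(text: str) -> list[str]:
    """Map each CSV row (assumed columns: id, name) to one triple."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        triple(f"{BASE}person/{row['id']}", FOAF_NAME, literal(row["name"]))
        for row in rows
    ]


def json_to_ntriples(text: str) -> list[str]:
    """Map each JSON object (assumed keys: id, name) to one triple."""
    return [
        triple(f"{BASE}person/{obj['id']}", FOAF_NAME, literal(obj["name"]))
        for obj in json.loads(text)
    ]


# Two sample documents in different formats describing the same kind of entity.
csv_doc = "id,name\n1,Alice\n2,Bob\n"
json_doc = '[{"id": 3, "name": "Carol"}]'

triples = csv_to_ntriples(csv_doc) + json_to_ntriples(json_doc)
print("\n".join(triples))
```

Each new source format needs its own hand-written function in this sketch. A declarative transformation or mapping language of the kind the abstract lists aims to replace such per-format code with a single query-like description.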