Generating RDF from Data Sources with Heterogeneous Formats
Abstract
Contrary to what the Web of Data initiative promotes, most organizations publish their data
in non-RDF formats such as CSV, JSON, or XML. Furthermore, in the Web of
Things, constrained objects prefer binary formats such as EXI or CBOR over textual RDF
formats. In this context, RDF can still be used as a lingua franca to enable semantic interoperability,
integration of data with heterogeneous formats, reasoning, and querying. Several tools
and formalisms have been designed to transform non-RDF documents to RDF. The most flexible
ones are based on transformation or mapping languages (GRDDL, XSPARQL, R2RML,
RML, CSVW, etc.). This paper defines a new language of this kind, SPARQL-Generate, designed
as an extension of SPARQL 1.1 to generate RDF from an RDF dataset and a set of documents
with arbitrary formats. We show that it can be implemented on top of any existing SPARQL 1.1
engine, report a first implementation on top of Apache Jena, and show that it leverages the
expressivity and the extensibility of SPARQL 1.1 to keep the set of supported formats open-ended.
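
To make the underlying task concrete, the following is a minimal Python sketch of turning two documents with different formats (CSV and JSON) into RDF triples serialized as N-Triples. It is not SPARQL-Generate and does not reflect its syntax or its implementation; the namespace `http://example.org/`, the helper names `literal` and `rows_to_ntriples`, and the sample documents are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"


def literal(value):
    """Serialize a value as a plain N-Triples string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(rows, id_key):
    """Map each record (a dict) to one triple per non-identifier field.

    Assumption: identifiers and field names are already safe to embed
    in an IRI; a real mapping would need to percent-encode them.
    """
    lines = []
    for row in rows:
        subject = f"<{EX}item/{row[id_key]}>"
        for key, value in row.items():
            if key == id_key:
                continue
            lines.append(f"{subject} <{EX}{key}> {literal(value)} .")
    return lines


# Two sample documents with heterogeneous formats (illustrative data).
csv_doc = "id,name\n1,Alice\n"
json_doc = '[{"id": "2", "name": "Bob"}]'

triples = rows_to_ntriples(csv.DictReader(io.StringIO(csv_doc)), "id")
triples += rows_to_ntriples(json.loads(json_doc), "id")
```

In this sketch each format needs its own parsing step before a shared mapping can be applied; a mapping language moves that per-format logic into declarative rules instead of ad hoc scripts.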