Broad Data: What happens when the Web of Data becomes real?
Abstract
“Big Data” is used to refer to the very large datasets generated by scientists, to the many
petabytes of data held by companies like Facebook and Google, and to the analysis of real-time data
assets like the stream of Twitter messages emerging from events around the world. Key areas
of interest include technologies to manage much larger datasets (cf. NoSQL), technologies for
the visualization and analysis of databases, cloud-based data management, and data mining
algorithms.
Recently, however, we have begun to see the emergence of another, equally compelling
data challenge: that of the “Broad data” that emerges from the millions and millions of raw
datasets available on the World Wide Web. For broad data, the new challenges include
Web-scale data search and discovery, rapid and potentially ad hoc integration of datasets,
visualization and analysis of only partially modeled datasets, and issues relating to the policies
for data use, reuse and combination. In this talk, we present the broad data challenge and
discuss potential starting points for solutions. We illustrate these approaches using data from
a “meta-catalog” of over 1,000,000 open datasets that have been collected from about two
hundred governments from around the world.