Automatic Construction of a Geo-Historical Knowledge Graph from Historical Encyclopedic Texts
Abstract
Historical encyclopedias, such as Diderot and d'Alembert's Encyclopédie (1751–1772), are
invaluable resources for studying the evolution of geographical knowledge, but their scale
hinders manual analysis. This paper presents an automated method for building a geo-historical
knowledge graph from these texts. We design spatial and provenance ontologies tailored to the
corpus and introduce a gold-standard dataset of 2,750 geographical articles. The proposed
pipeline combines supervised learning and large language models for article classification,
entity typing, and spatial relation extraction. Our approach achieves strong results (F1 = 92%
for relation classification, F1 > 97% for article classification), producing an RDF graph of over
35,000 entities and 46,000 spatial relations. All datasets, models, and code are available on
Hugging Face and GitLab.