RNTI

Automatic construction of a geo-historical knowledge graph from ancient encyclopedic texts (Construction automatique d'un graphe de connaissances géo-historiques à partir de textes encyclopédiques anciens)
In EGC 2026, vol. RNTI-E-42, pp. 25-36
Abstract
Ancient encyclopedias, such as Diderot and d'Alembert's Encyclopédie (1751–1772), are invaluable resources for studying the evolution of geographical knowledge, but their scale hinders manual analysis. This paper presents an automated method for building a geo-historical knowledge graph from these texts. We design spatial and provenance ontologies tailored to the corpus and introduce a gold-standard dataset of 2,750 geographical articles. The proposed pipeline combines supervised learning and large language models for article classification, entity typing, and spatial relation extraction. Our approach achieves strong results (F1 = 92% for relation classification, F1 > 97% for article classification), producing an RDF graph of over 35,000 entities and 46,000 spatial relations. All datasets, models, and code are available on Hugging Face and GitLab.
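To make the output format concrete, the following is a minimal sketch of how one typed entity, one spatial relation, and a provenance link to its source article could be serialized as RDF in N-Triples syntax. It is not the authors' implementation: the `http://example.org/geo/` namespace, the `locatedIn` relation, the `City` type, the article identifier, and the function names are all hypothetical, and the Lyon example is purely illustrative rather than taken from the dataset. Only `rdf:type` and `prov:wasDerivedFrom` are existing W3C terms.

```python
# Hypothetical namespace: the paper's actual ontology IRIs are not given in the abstract.
EX = "http://example.org/geo/"
# Existing W3C vocabularies (PROV-O and RDF).
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def spatial_relation_triples(subject, relation, obj, subject_type, article_id):
    """Return N-Triples lines for one typed entity, one spatial
    relation, and a provenance link to the source article."""
    s = iri(EX + subject)
    return [
        # Entity typing: the subject is an instance of an (assumed) class.
        f"{s} {iri(RDF_TYPE)} {iri(EX + subject_type)} .",
        # Spatial relation between two entities (assumed predicate name).
        f"{s} {iri(EX + relation)} {iri(EX + obj)} .",
        # Provenance: which encyclopedia article the entity came from.
        f"{s} {iri(PROV + 'wasDerivedFrom')} {iri(EX + 'article/' + article_id)} .",
    ]


if __name__ == "__main__":
    # Illustrative values only, not taken from the published graph.
    for line in spatial_relation_triples("Lyon", "locatedIn", "France", "City", "A1"):
        print(line)
```

Keeping the provenance triple alongside each extracted statement is one simple way to trace a relation back to the article it was drawn from; the published ontologies may model this differently.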