Generalizing the adaptation of frugal language models for extracting RDF patterns from text to Datatype and Object property relations

Célian Ringwald, Fabien Gandon, Catherine Faron, Franck Michel, Hanna Abi Akl

In EGC 2026, vol. RNTI-E-42, pp. 399-406

Abstract

Small Language Models (SLMs) demonstrate strong effectiveness in RDF relation extraction guided by SHACL shapes. This paper, based on our work accepted at K-CAP 2025, investigates their ability to jointly handle both Datatype and Object Property relations. The main challenge lies in extracting rare properties. To address this issue, we explore several strategies, including stratified sampling, loss weighting, data scaling, and pattern-based synthetic data generation. The best results are achieved when each property reaches a minimal occurrence threshold within the training data. To ensure reproducibility, we publicly release all datasets, experimental results, and source code. This work thus provides practical methods for training specialized SLMs and highlights promising directions for semantic relation extraction.
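
As an illustration of two of the strategies named above (the minimal occurrence threshold and loss weighting), the following is a minimal sketch and not the authors' implementation. The function names, the inverse-frequency weighting scheme, and the DBpedia-style property names are all assumptions made for this example; the paper's released code should be consulted for the actual method.

```python
from collections import Counter

def property_counts(examples):
    """Count how many training examples mention each property.

    `examples` is a list of property lists, one list per training example
    (hypothetical data layout, assumed for this sketch).
    """
    counts = Counter()
    for props in examples:
        counts.update(set(props))  # count each property once per example
    return counts

def under_threshold(counts, min_occurrences):
    """Return the properties seen fewer than `min_occurrences` times."""
    return sorted(p for p, c in counts.items() if c < min_occurrences)

def inverse_frequency_weights(counts):
    """Weights proportional to 1/count, normalised so the mean weight is 1.

    Inverse frequency is one common weighting choice, assumed here.
    """
    raw = {p: 1.0 / c for p, c in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {p: w / mean for p, w in raw.items()}

# Toy training set; the property names are illustrative only.
examples = [
    ["dbo:birthDate", "dbo:birthPlace"],
    ["dbo:birthDate"],
    ["dbo:birthDate", "dbo:almaMater"],
    ["dbo:birthPlace"],
]

counts = property_counts(examples)
rare = under_threshold(counts, min_occurrences=2)
weights = inverse_frequency_weights(counts)

print("below threshold:", rare)
for prop, weight in sorted(weights.items()):
    print(f"{prop}: {weight:.3f}")
```

In this sketch, properties reported by `under_threshold` would be candidates for additional sampling or synthetic examples, while `weights` could be passed to a weighted loss so that rare properties contribute more per occurrence.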
