Generalizing the Adaptation of Frugal Language Models for Extracting RDF Patterns from Text to Datatype and Object Property Relations
Abstract
Small Language Models (SLMs) are effective at extracting RDF relations
guided by SHACL shapes. This paper, based on our work accepted at K-CAP 2025, investigates
whether they can handle Datatype and Object Property relations jointly. The main
challenge lies in extracting rare properties. To address this issue, we explore several strategies,
including stratified sampling, loss weighting, data scaling, and pattern-based synthetic data
generation. The best results are achieved when every property occurs at least a minimum
number of times in the training data. To ensure reproducibility, we publicly release all datasets,
experimental results, and source code. This work thus provides practical methods for training
specialized SLMs and highlights promising directions for semantic relation extraction.
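
As a rough illustration of two ideas named above (a minimal per-property occurrence threshold and loss weighting), the following sketch counts how often each property appears in a toy training set, flags properties below a threshold, and derives inverse-frequency weights. It is not the paper's implementation: the function names, the `ex:` identifiers, the toy triples, the threshold value of 2, and the inverse-frequency formula are all assumptions chosen for illustration.

```python
from collections import Counter


def property_counts(examples):
    """Count occurrences of each property over all training examples.

    Each example is a list of (subject, property, object) triples.
    """
    counts = Counter()
    for triples in examples:
        for _subject, prop, _obj in triples:
            counts[prop] += 1
    return counts


def below_threshold(counts, min_count):
    """Return, sorted by name, the properties seen fewer than min_count times."""
    return sorted(p for p, n in counts.items() if n < min_count)


def inverse_frequency_weights(counts):
    """Weight each property by total / (num_properties * count).

    Rare properties get weights above 1, frequent ones below 1.
    This is one common weighting scheme, assumed here for illustration.
    """
    total = sum(counts.values())
    num_properties = len(counts)
    return {p: total / (num_properties * n) for p, n in counts.items()}


# Toy data (hypothetical): one Object property and two Datatype properties.
examples = [
    [("ex:CityA", "ex:capitalOf", "ex:CountryA"), ("ex:CityA", "ex:population", "100")],
    [("ex:CityB", "ex:population", "200")],
    [("ex:CityC", "ex:capitalOf", "ex:CountryC"), ("ex:CityC", "ex:population", "300")],
    [("ex:CityC", "ex:foundedIn", "1900")],
]

counts = property_counts(examples)
rare = below_threshold(counts, min_count=2)  # hypothetical threshold
weights = inverse_frequency_weights(counts)

print(rare)
print(weights)
```

In a setup like this, the properties returned by `below_threshold` would be the candidates for additional real or synthetic examples, while the weights could be passed to a weighted loss during fine-tuning.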