RNTI

Corpus enrichment through a generative approach and its impact on named entity recognition models
In EGC 2024, vol. RNTI-E-40, pp. 223-230
Abstract
Industrial applications of Named Entity Recognition (NER) are usually confronted with imbalanced corpora, which can harm the performance of trained models on unseen data. In this paper, we develop two generation-based data enrichment approaches to improve the entity distribution. We compare the impact of the enriched corpora on NER models, using both non-contextual and contextual embeddings and a biLSTM-CRF as the entity classifier. The approach is evaluated on a contract renewal detection task. The results show that the proposed enrichment significantly improves the model's effectiveness on unseen data without degrading performance on the original test set.