RNTI

Extension and Adaptation of Language Models for Classifying Animal Health Corpora (Extension et adaptation des modèles de langues pour la classification de corpus en santé animale)
In EGC 2023, vol. RNTI-E-39, pp.531-538
Abstract
We present EpidBioBERT, a document classifier for epidemiological biomonitoring. The model is trained on a corpus of news articles about animal disease outbreaks and aims to distinguish relevant from irrelevant documents for an information extraction task. We fine-tune a pre-trained biomedical language model, focusing on the epidemiological thematic features, namely disease, host, location and date. We measure the impact of each feature on the classifier through ablation studies. We also compare our pre-trained biomedical approach with a general language model.