BERTEPro: A New Approach to Semantic Representation in the Domain of Education and Professional Training
Abstract
FlauBERT and CamemBERT have established a new state of the art for French natural language understanding. More recently, SBERT adapted the pre-trained BERT network to derive sentence embeddings at a much lower computational cost while maintaining BERT's accuracy. However, these models were trained on general-domain French texts and therefore do not provide a fine-grained representation of texts from specialized domains, such as education and professional training. In this paper, we present BERTEPro, a language model based on FlauBERT whose pre-training was extended on texts from the education and professional training domain before being fine-tuned on NLI and STS tasks. The evaluation of BERTEPro on both generic and domain-specific STS tasks, as well as on classification tasks over textual data from the education and professional training domain, confirms that the proposed methodology offers significant advantages over other state-of-the-art methods.
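To make the two-stage recipe described above concrete, the sketch below illustrates one plausible implementation: continued masked-language-model pre-training of FlauBERT on domain text, followed by SBERT-style sentence-embedding fine-tuning. It is a minimal sketch, not the authors' code: the corpus file `domain_corpus.txt`, the example NLI pair, the hyperparameters, and the choice of `MultipleNegativesRankingLoss` are all assumptions; only the FlauBERT checkpoint name and the library APIs are real.

```python
# Hypothetical reconstruction of the BERTEPro training pipeline, assuming
# the public checkpoint "flaubert/flaubert_base_cased", a local file
# "domain_corpus.txt" with education/training texts, and NLI-style pairs.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# --- Stage 1: continue masked-LM pre-training on domain-specific text ---
tokenizer = AutoTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = AutoModelForMaskedLM.from_pretrained("flaubert/flaubert_base_cased")

corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bertepro-mlm",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=collator)
trainer.train()
trainer.save_model("bertepro-mlm")
tokenizer.save_pretrained("bertepro-mlm")

# --- Stage 2: SBERT-style fine-tuning on NLI sentence pairs ---
from torch.utils.data import DataLoader
from sentence_transformers import (SentenceTransformer, models,
                                   losses, InputExample)

word_embedding = models.Transformer("bertepro-mlm", max_seq_length=256)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
sbert = SentenceTransformer(modules=[word_embedding, pooling])

# Hypothetical (premise, entailed hypothesis) pair; the loss treats each
# pair as a positive and the other in-batch sentences as negatives.
train_examples = [InputExample(texts=["Un élève suit une formation.",
                                      "Une personne apprend."])]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
sbert.fit(train_objectives=[(train_loader,
                             losses.MultipleNegativesRankingLoss(sbert))],
          epochs=1)
sbert.save("bertepro")
```

Under these assumptions, stage 1 specializes the token-level representations to the domain vocabulary, while stage 2 turns the specialized encoder into a sentence-embedding model suitable for STS and classification.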