RNTI

MTEB-FR: a large-scale experiment for representation learning in French
In EGC 2025, vol. RNTI-E-41, pp. 219-230
Abstract
Thousands of textual embedding models are available today. We introduce the first Massive Textual Embedding Benchmark for French so that these models can be compared easily. We gather existing datasets and create new ones for a global evaluation on 27 datasets covering 8 NLP tasks. We compare 51 carefully selected models to find which ones dominate and to identify the characteristics that make them competitive. Our work comes with an easy-to-use open-source library and a public leaderboard that accepts external contributions.