MTEB-FR: A Large-Scale Experiment in Representation Learning for French
Abstract
Thousands of textual embedding models are available today. We introduce the first Massive Textual Embedding Benchmark for French so that these models can be compared easily. We gather existing datasets and create new ones to enable a comprehensive evaluation on 27 datasets covering 8 NLP tasks. We compare 51 carefully selected models to identify those that perform best and draw conclusions about the characteristics that make them competitive. Our work is accompanied by an easy-to-use open-source library and a public leaderboard that accepts external contributions.
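As a minimal sketch of how such an evaluation might be run, assuming the benchmark is exposed through the open-source mteb Python package and that the model under test follows the sentence-transformers interface (the specific model name and the task_langs filter are illustrative assumptions, not a prescription of the authors' exact setup):

```python
# Hypothetical usage sketch: evaluate a sentence-embedding model
# on the French subset of the benchmark via the mteb package.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers model can be plugged in here (illustrative choice).
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Restrict the benchmark to French-language tasks.
evaluation = MTEB(task_langs=["fr"])

# Run all selected tasks and write per-task scores to disk.
evaluation.run(model, output_folder="results/fr")
```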