MTEB-FR: A Large-Scale Experiment in Representation Learning for French
Abstract
Thousands of textual embedding models are available today. We introduce the first Massive Textual Embedding Benchmark for French so that these models can be compared easily. We gather existing datasets and create new ones to enable a comprehensive evaluation on 27 datasets covering 8 NLP tasks. We compare 51 carefully selected models to identify those that perform best and draw conclusions about the characteristics that make them competitive. Our work is accompanied by an easy-to-use open-source library and a public leaderboard that accepts external contributions.
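As a minimal sketch of how such an evaluation might be run, assuming the benchmark is exposed through the open-source mteb Python package and that the model under test follows the sentence-transformers interface (the specific model name and the task_langs filter are illustrative assumptions, not a prescription of the authors' exact setup):

```python
# Hypothetical usage sketch: evaluate a sentence-embedding model
# on the French subset of the benchmark via the mteb package.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers model can be plugged in here (illustrative choice).
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Restrict the benchmark to French-language tasks.
evaluation = MTEB(task_langs=["fr"])

# Run all selected tasks and write per-task scores to disk.
evaluation.run(model, output_folder="results/fr")
```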