Evaluating the multilingual properties of a contextualized embedding
Abstract
Deep learning models like BERT, a stack of attention layers with unsupervised pretraining
on large corpora, have become the norm in NLP. mBERT, the multilingual version of BERT,
is capable of learning a task in one language and generalizing it to another. This generalization
ability opens the prospect of efficient models for languages with little annotated
data, but it remains largely unexplained. We propose a new method based on in-context
translated words rather than translated sentences to analyze the similarity between
contextualized representations across languages. We show that the representations learned by
mBERT are closer across languages in the deeper layers, outperforming other representations that were specifically
trained to be aligned.