Evaluation of the multilingual properties of a contextualized embedding
In EGC 2022, vol. RNTI-E-38, pp. 249-256
Abstract
Deep learning models like BERT, a stack of attention layers pretrained without supervision on large corpora, have become the norm in NLP. mBERT, a multilingual version of BERT, is capable of learning a task in one language and generalizing it to another. This generalization ability opens up the prospect of effective models for languages with little annotated data, but it remains largely unexplained. We propose a new method based on in-context translated words, rather than translated sentences, to analyze the similarity between contextualized representations across languages. We show that the representations learned by mBERT are closer across languages in the deeper layers, outperforming other representations that were specifically trained to be aligned.
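To make the comparison concrete, the sketch below shows one plausible way to measure cross-lingual similarity of in-context word representations per layer, assuming the HuggingFace transformers library, the public bert-base-multilingual-cased checkpoint, and a hypothetical English/French sentence pair; it is an illustration of the general idea, not the authors' exact protocol.

```python
# Minimal sketch: per-layer cosine similarity between the mBERT representations
# of a word in context and its in-context translation (hypothetical example data).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased",
                                  output_hidden_states=True)
model.eval()

def word_vectors(sentence: str, word: str):
    """Return one vector per layer for `word` inside `sentence`
    (subword vectors are averaged)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # tuple of (layers + 1) tensors [1, seq, dim]
    # Locate the subword positions of `word` via a naive token-ID match
    # (sufficient for this toy example, not robust in general).
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    start = next(i for i in range(len(ids)) if ids[i:i + len(word_ids)] == word_ids)
    span = slice(start, start + len(word_ids))
    return [layer[0, span].mean(dim=0) for layer in hidden]

# Hypothetical aligned pair: an English word and its French translation in context.
vecs_en = word_vectors("The cat sleeps on the mat.", "cat")
vecs_fr = word_vectors("Le chat dort sur le tapis.", "chat")

for layer, (u, v) in enumerate(zip(vecs_en, vecs_fr)):
    sim = torch.nn.functional.cosine_similarity(u, v, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```

Running such a probe over many translated word pairs and averaging per layer gives a layer-wise similarity profile, which is the kind of measurement the abstract refers to when comparing deep layers across languages.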