RNTI

An in-depth study of textual data representations in unsupervised learning
In EGC 2023, vol. RNTI-E-39, pp. 361-368
Abstract
Dense text representations are attracting considerable interest in several supervised tasks, but much less is known about how suitable they are when the dataset is unlabeled. In this paper, we investigate the use of such representations in two unsupervised tasks: document clustering and visualization. To that end, we propose a tandem approach based on UMAP and show that it can do better than the fine-tuning approaches usually proposed in the literature.
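
As a rough illustration of what a tandem approach can look like, the sketch below first reduces precomputed document embeddings and then clusters the reduced points. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the embedding model, the clustering algorithm, or any hyperparameters, so the function names `tandem_cluster` and `umap_kmeans_labels`, the choice of k-means, and the values `n_components=5` and `n_init=10` are all hypothetical. The second function assumes the `umap-learn` and `scikit-learn` packages are installed.

```python
from typing import Any, Sequence


def tandem_cluster(embeddings: Any, reducer: Any, clusterer: Any) -> Sequence[int]:
    """Tandem approach: reduce dimensionality first, then cluster the result.

    `reducer` must expose a scikit-learn style `fit_transform` method and
    `clusterer` a `fit_predict` method. Both are fitted without labels.
    """
    reduced = reducer.fit_transform(embeddings)  # e.g. a UMAP projection
    return clusterer.fit_predict(reduced)        # e.g. k-means cluster labels


def umap_kmeans_labels(embeddings: Any, n_clusters: int, n_components: int = 5):
    """Illustrative pairing (an assumption): UMAP followed by k-means."""
    # Imported lazily so tandem_cluster itself has no hard dependencies.
    import umap  # provided by the umap-learn package
    from sklearn.cluster import KMeans

    reducer = umap.UMAP(n_components=n_components, random_state=0)
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return tandem_cluster(embeddings, reducer, clusterer)
```

For visualization, the same reducer could be configured with `n_components=2` and its output plotted directly. Keeping the two stages as interchangeable objects makes it easy to compare reducers or clustering algorithms on the same embeddings.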