Deep Dirichlet Processes for Topic Modeling (Processus de Dirichlet profonds pour le topic modeling)
In EGC 2022, vol. RNTI-E-38, pp. 355-362
Abstract
This paper presents two novel models: the neural Embedded Dirichlet Process and the neural Embedded Hierarchical Dirichlet Process. Both methods extend the Embedded Topic Model (ETM) to nonparametric settings, thus simultaneously learning the number of topics, latent representations of documents, and topic and word embeddings from data. To achieve this, we replace ETM's logistic normal prior with Dirichlet Processes in a variational autoencoding inference setting. Our tests on the 20 Newsgroups and the Humanitarian Assistance and Disaster Relief datasets show that our models maintain low perplexity while providing meaningful representations that outperform those of state-of-the-art methods. We obtained these results without performing costly reruns to find the number of topics and without sacrificing a Dirichlet-like prior.
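
To illustrate how a Dirichlet Process prior can let the number of topics be inferred rather than fixed, here is a minimal sketch of the truncated stick-breaking construction, a standard way to represent Dirichlet Process weights. This is an assumption made for illustration: the abstract does not say which construction the models use, and the names `stick_breaking_weights`, `effective_topics`, `alpha`, `truncation`, and `threshold` are hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw topic proportions from a truncated stick-breaking construction.

    alpha      -- concentration parameter (assumed value, larger spreads mass
                  over more topics)
    truncation -- maximum number of topics kept (hypothetical upper bound)
    """
    rng = random.Random(seed)
    remaining = 1.0  # length of the stick not yet assigned to a topic
    weights = []
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last topic takes the leftover so weights sum to 1
    return weights


def effective_topics(weights, threshold=0.01):
    """Count topics whose weight exceeds a threshold (illustrative criterion)."""
    return sum(1 for w in weights if w > threshold)


weights = stick_breaking_weights(alpha=2.0, truncation=50)
print(effective_topics(weights), "topics carry more than 1% of the mass")
```

Because most of the mass concentrates on a few sticks, the count of topics above the threshold acts as a data-driven topic number, which is the intuition behind avoiding reruns over candidate topic counts.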