TOM: A library for topic modeling and browsing
Abstract
In this paper, we present TOM (TOpic Modeling), a Python library
for topic modeling and browsing. Its objective is to enable efficient, end-to-end
analysis of a text corpus through the discovery of latent topics. To this
end, TOM provides advanced functions for preparing and vectorizing a text corpus.
It also offers a unified interface for two topic models (namely, LDA using
either variational inference or Gibbs sampling, and NMF using alternating least
squares with a projected gradient method), and implements three state-of-the-art
methods for estimating the optimal number of topics with which to model a corpus.
Moreover, TOM builds an interactive Web-based browser that makes it easy to explore
a topic model and the related corpus.