Topic modeling and hypergraph mining to analyze the EGC conference history
Abstract
Each year the EGC conference gathers researchers and practitioners
from the knowledge discovery and management domain to present their latest
advances. This year's edition features an open challenge that encourages participants
to leverage the rich EGC anthology, which spans from 2004 to 2015. The
ultimate goal is to highlight the dynamics of the conference history and to
get a glimpse of the coming years. In this context, we first describe our methodology
for inferring latent topics that pervade this corpus using non-negative matrix
factorization. Based on the discovered topics and other properties of the
articles (e.g., authors, affiliations), we shed light on both the
topical and collaborative structures of the EGC society. Second, we employ
a hypergraph itemset extraction process to discover existing but latent relations
between authors or between topics. We also propose topic-author and author-author
recommendations with a content-based approach. Lastly, we describe a
Web interface for browsing this collection of articles complemented with the
discovered knowledge.
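
To make the topic inference step more concrete, the following is a minimal sketch of non-negative matrix factorization on a document-term matrix. The abstract does not state which NMF variant, objective, or parameters are used, so everything here is an assumption for illustration: the sketch uses the classic multiplicative update rules for the squared Frobenius error, and the matrix `V`, the vocabulary `terms`, and the helper names `nmf` and `frobenius_error` are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (documents x terms) into
    W (documents x k topics) and H (k topics x terms).
    Assumption: multiplicative updates, squared Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


# Hypothetical toy corpus: 4 documents over 6 terms (raw term counts).
terms = ["cluster", "graph", "mining", "ontology", "semantic", "web"]
V = [
    [2, 1, 1, 0, 0, 0],
    [4, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 2, 6, 2],
]

W, H = nmf(V, k=2)

# Describe each latent topic by its three highest-weighted terms.
for topic_id, weights in enumerate(H):
    ranked = sorted(range(len(terms)), key=lambda j: weights[j], reverse=True)
    print(topic_id, [terms[j] for j in ranked[:3]])
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` gives the topic weights of one document, which is what allows topics to be related to article properties such as authors. In practice a library implementation with a weighted term representation (for example TF-IDF) would likely be preferable to this hand-written loop.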