Automated thematic tagging of corpora via semantic representation
Abstract
In scientific text corpora, some articles from different research communities are not tagged
with the same keywords even though they share the same topic. This causes problems for information
retrieval systems that rely on a limited number of tag variations, and thus lowers the chances of
interdisciplinary exploration. Our approach automatically assigns a topic tag to articles by learning
a classifier for each topic, based on the semantic representation of the titles and abstracts
of already tagged articles. The approach requires much less computational power than applying
topic modeling to millions of documents. In our proposed model, we use topic synonyms to
retrieve more semantically similar articles and merge them with the articles obtained by the topic
classifier. The experiments show higher recall than two variations of the model: one that uses only
the synonym set, and another that uses only the semantic representation of the text.
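
As a rough illustration of how the two retrieval routes might be combined, the following minimal Python sketch trains one classifier for a single topic and merges its predictions with a synonym-based match. It is not the implementation evaluated here: the bag-of-words vector stands in for the semantic representation, the nearest-centroid rule stands in for the topic classifier, substring matching stands in for synonym retrieval, and all function names and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic representation: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def train_topic_classifier(tagged_texts, other_texts):
    """Assumed classifier: nearest centroid, one binary model per topic."""
    pos = centroid(embed(t) for t in tagged_texts)
    neg = centroid(embed(t) for t in other_texts)
    return lambda text: cosine(embed(text), pos) > cosine(embed(text), neg)


def synonym_match(text, synonyms):
    """True if any topic synonym occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(s.lower() in lowered for s in synonyms)


def tag_articles(articles, classifier, synonyms):
    """Merge (set union) classifier hits with synonym hits."""
    by_classifier = {i for i, t in articles.items() if classifier(t)}
    by_synonyms = {i for i, t in articles.items() if synonym_match(t, synonyms)}
    return by_classifier | by_synonyms


# Hypothetical title snippets for one topic and for unrelated articles.
tagged = [
    "Deep neural networks for image recognition",
    "Training neural networks with gradient descent",
]
others = [
    "Protein folding in yeast cells",
    "Soil bacteria and nitrogen cycles",
]
classifier = train_topic_classifier(tagged, others)

articles = {
    "a1": "Convolutional neural networks for speech recognition",
    "a2": "A connectionist model of memory",
    "a3": "Nitrogen uptake in soil bacteria",
    "a4": "Gradient descent training of deep image models",
}
synonyms = ["neural network", "connectionist model"]

result = tag_articles(articles, classifier, synonyms)
print(sorted(result))  # ['a1', 'a2', 'a4']
```

In this toy run, article a2 is reached only through the synonym set and article a4 only through the classifier, which is the intuition behind merging the two result sets to improve recall.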