RNTI

Automated Topic Tagging of Corpora Using Semantic Representation (Étiquetage thématique automatisé de corpus par représentation sémantique)
In EGC 2018, vol. RNTI-E-34, pp. 323-328
Abstract
In scientific text corpora, some articles from different research communities are not tagged with the same keywords even though they share the same topic. This is a problem for information retrieval systems that rely on a limited number of tag variants, and it lowers the chances of interdisciplinary exploration. Our approach automatically assigns a topic tag to articles by learning, for each topic, a classifier based on the semantic representation of the title and abstract of already-tagged articles. The approach requires much less computing power than applying topic modeling to millions of documents. In our proposed model, we also use topic synonyms to retrieve additional semantically similar articles and merge them with the articles obtained by the topic classifier. Experiments show higher recall than two variants of the model: one that uses only the synonym set, and one that uses only the semantic representation of the text.
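The abstract combines three steps: a per-topic classifier over semantic representations of titles and abstracts, synonym-based retrieval, and a merge of both result sets. The following is a minimal Python sketch of that pipeline shape under stated assumptions, not the authors' implementation. A toy bag-of-words vector stands in for the semantic representation, a nearest-centroid rule with a hypothetical cosine threshold of 0.3 stands in for the learned classifier, synonyms are matched as whole phrases, and "merge" is assumed to mean set union. All function names and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in (assumption) for a semantic representation: lowercase token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_topic_classifier(tagged_texts, threshold=0.3):
    """Hypothetical per-topic classifier: nearest centroid with a cosine cut-off."""
    centroid = Counter()
    for text in tagged_texts:
        centroid.update(embed(text))  # summed vectors; cosine ignores the scale
    return lambda text: cosine(embed(text), centroid) >= threshold


def classifier_matches(corpus, tagged_texts, threshold=0.3):
    """Ids of articles the topic classifier accepts."""
    classify = train_topic_classifier(tagged_texts, threshold)
    return {doc_id for doc_id, text in corpus.items() if classify(text)}


def synonym_matches(corpus, synonyms):
    """Ids of articles whose title+abstract contain a topic synonym as a whole phrase."""
    patterns = [re.compile(r"\b" + re.escape(s.lower()) + r"\b") for s in synonyms]
    return {doc_id for doc_id, text in corpus.items()
            if any(p.search(text.lower()) for p in patterns)}


def tag_topic(corpus, tagged_texts, synonyms, threshold=0.3):
    """Merge (assumed: set union) classifier hits with synonym hits."""
    return (classifier_matches(corpus, tagged_texts, threshold)
            | synonym_matches(corpus, synonyms))


def recall(predicted, relevant):
    """Fraction of relevant articles that were retrieved."""
    return len(predicted & relevant) / len(relevant) if relevant else 0.0


# Toy usage with hypothetical data (not from the paper).
corpus = {
    "a1": "Topic modeling of scientific articles with latent topics",
    "a2": "Latent Dirichlet allocation applied to news archives",
    "a3": "Protein folding dynamics in cells",
}
tagged = ["topic modeling of document collections"]
synonyms = ["latent Dirichlet allocation", "LDA"]
relevant = {"a1", "a2"}

merged = tag_topic(corpus, tagged, synonyms)
print(sorted(merged))                                        # ['a1', 'a2']
print(recall(classifier_matches(corpus, tagged), relevant))  # 0.5
print(recall(synonym_matches(corpus, synonyms), relevant))   # 0.5
print(recall(merged, relevant))                              # 1.0
```

The toy run only mirrors the shape of the comparison described in the abstract (merged model against a synonym-only and a representation-only variant): each source finds one article the other misses, so their union raises recall. The numbers are illustrative and are not the paper's results. A real system would replace `embed` with learned document embeddings and the centroid rule with a trained classifier.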