RNTI

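Textual data analysis: a semantic content extraction approach and a Top_KRankedTopics aggregation operator (original title: Analyse des données textuelles : Une approche d'extraction de contenu sémantique et un opérateur d'agrégation Top_KRankedTopics)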
In EDA 2016, vol. RNTI-B-12, pp. 51-64
Abstract
Taking the semantics of textual data into account in OLAP analysis is a complex task that traditional business intelligence systems do not support. To address this problem, we propose a new approach for extracting semantic descriptors from textual data for analysis purposes. The approach uses the Latent Dirichlet Allocation (LDA) method together with the Open Directory Project (ODP) taxonomy, as an external source of knowledge, to identify relevant topics in text documents. For each text document, it builds a semantic hierarchy based on ODP concepts. To make this semantic hierarchy useful in OLAP analysis, we propose a weighting function and an aggregation operator that selects the top k topics and returns for each