Textual Data Analysis: A Semantic Content Extraction Approach and a Top_KRankedTopics Aggregation Operator
Abstract
Taking the semantics of textual data into account in OLAP analysis is a complex task that is
not supported by traditional business intelligence systems. To address this problem, we propose
a new approach for extracting semantic descriptors from textual data for analysis purposes. The
proposed approach uses the Latent Dirichlet Allocation (LDA) method and the Open
Directory Project (ODP) taxonomy as an external source of knowledge to identify relevant
topics in text documents. For each text document, our approach builds a semantic hierarchy
based on ODP concepts. To make this semantic hierarchy useful in OLAP analysis, we
propose a weighting function and an aggregation operator that selects the top k topics and
returns for each