A combined approach to ontology enrichment from texts and LOD data
Abstract
This paper proposes an approach for automatically labeling documents that describe products
with highly specific concepts reflecting particular user needs. The distinctive feature of the approach is
that it addresses three challenges at once: 1) the concepts used for labeling have no direct terminological
counterpart in the documents, 2) their formal definitions are not known in advance, and 3) not all the necessary
information is mentioned in the documents. To address this problem, we propose
a two-step, ontology-guided annotation process. The first step populates
the ontology with information extracted from the documents, supplemented by information from external
resources. The second step is a reasoning step over the extracted data, covering either a phase that
learns concept definitions or a phase that applies the learned definitions. The
SAUPODOC approach is thus a novel ontology-enrichment approach that builds on the foundations
of the Semantic Web by combining the contributions of the LOD with text analysis, machine
learning and reasoning tools. The evaluation, on two application domains, yields good-quality
results and demonstrates the value of the approach.
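To make the two-step process concrete, here is a toy sketch under stated assumptions. The abstract does not specify how SAUPODOC represents facts or learns definitions. This sketch is therefore not the paper's method: it replaces the OWL ontology and reasoner with plain Python sets. Each product is modeled as a set of (property, value) facts, and a concept definition is a conjunction of such facts. It uses a Find-S-style learner, which keeps only the facts shared by all positive examples. The names `populate`, `learn_definition`, `label` and the example products are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a fact is a (property, value) pair.
Fact = Tuple[str, str]
KB = Dict[str, Set[Fact]]


def populate(text_facts: KB, lod_facts: KB) -> KB:
    """Step 1 (sketch): merge facts extracted from documents with facts
    retrieved from an external source such as the LOD."""
    products = set(text_facts) | set(lod_facts)
    return {p: text_facts.get(p, set()) | lod_facts.get(p, set()) for p in products}


def learn_definition(kb: KB, positives: List[str]) -> Set[Fact]:
    """Step 2a (sketch): learn a concept definition as the conjunction of
    facts shared by every positive example (a Find-S-style assumption)."""
    if not positives:
        raise ValueError("at least one positive example is required")
    definition = set(kb[positives[0]])
    for p in positives[1:]:
        definition &= kb[p]
    return definition


def label(kb: KB, definition: Set[Fact]) -> Set[str]:
    """Step 2b (sketch): a product receives the concept label if its facts
    include every fact required by the learned definition."""
    return {p for p, facts in kb.items() if definition <= facts}


# Hypothetical example: "waterproof" is never stated in the documents and
# only becomes available once the LOD facts are merged in.
text_facts = {
    "cam1": {("type", "camera"), ("weight", "light")},
    "cam2": {("type", "camera"), ("weight", "light")},
    "cam3": {("type", "camera")},
}
lod_facts = {
    "cam1": {("waterproof", "yes")},
    "cam2": {("waterproof", "yes")},
    "cam3": {("waterproof", "no"), ("weight", "light")},
}
kb = populate(text_facts, lod_facts)
definition = learn_definition(kb, ["cam1", "cam2"])
labeled = label(kb, definition)
```

In this toy run, the definition is learned from `cam1` and `cam2`. When the definition is applied, `cam3` is excluded because its LOD facts contradict it. A real system would instead learn class expressions over an ontology and rely on a reasoner to classify individuals.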