Detecting Near-Duplicate Entities in a Knowledge Base with PIKA
Abstract
This paper explores the use of Graph Neural Network models that produce node embeddings
to tackle the problem, not yet fully addressed, of detecting similar items stored in a knowledge
base. Leveraging pre-trained models for textual semantic similarity, our proposed method,
PIKA, aggregates heterogeneous (structured and unstructured) characteristics of an entity and
its neighborhood into an embedding vector that can be used in downstream tasks such as
information retrieval or classification. Our method learns a specific weight for each piece of information
carried by an entity, which allows entities to be processed in an inductive fashion.
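
To make the idea of weighted aggregation concrete, the following is a minimal illustrative sketch and not the PIKA architecture itself. It assumes that each characteristic of an entity (and a summary of its neighborhood) has already been encoded as a fixed-size vector, and that one scalar score per feature type has been learned. The feature names, the scores in `FEATURE_SCORES`, the softmax normalization, and the cosine comparison are all assumptions made for illustration.

```python
import math

# Hypothetical per-feature-type scores. In a trained model these would be
# learned parameters; the values here are made up for illustration.
FEATURE_SCORES = {"title": 2.0, "description": 1.0, "neighbors": 0.5}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors (neighborhood summary)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_entity(features):
    """Weighted sum of an entity's feature vectors.

    `features` maps a feature type to its vector. The weights depend only on
    the feature type, not on the entity identity, so the same function can be
    applied to entities that were never seen before (inductive use).
    """
    names = sorted(features)
    weights = softmax([FEATURE_SCORES[n] for n in names])
    dim = len(features[names[0]])
    return [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors standing in for real encoder outputs.
entity_a = embed_entity({
    "title": [1.0, 0.0],
    "description": [0.0, 1.0],
    "neighbors": mean_vector([[1.0, 1.0], [1.0, 1.0]]),
})
entity_b = embed_entity({
    "title": [0.9, 0.1],
    "description": [0.1, 0.9],
    "neighbors": mean_vector([[1.0, 1.0]]),
})
entity_c = embed_entity({
    "title": [0.0, 1.0],
    "description": [1.0, 0.0],
    "neighbors": mean_vector([[0.0, 1.0]]),
})

# Near-duplicate candidates are pairs whose embeddings are highly similar.
print(cosine(entity_a, entity_b) > cosine(entity_a, entity_c))
```

In this toy setup, `entity_a` and `entity_b` differ only slightly in their feature vectors and end up closer to each other than to `entity_c`. A real system would replace the hand-written vectors with outputs of a pre-trained text encoder and learn the scores from data.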