Secondary Variable Selection in Multi-Table Data for Classification
Abstract
Multi-table data analysis matters to organizations for applications such as fraud
detection, service improvement, and customer relations. To be used, this data must
first be flattened into a single table by creating new variables from the original ones.
While propositionalization tools can automate this process, the complexity of the data can hinder
their efficiency. This paper proposes a secondary variable selection method and
demonstrates that it can rank and filter out uninformative variables using a univariate
approach. Finally, we show on a set of academic databases that restricting the secondary
variables to the informative ones can improve the quality of the classification.
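
As an illustration of the pipeline summarized above, the following minimal sketch flattens a secondary table by aggregation and then ranks and filters its variables one at a time. Everything in it is an assumption made for illustration: the toy tables, the names `flatten_mean`, `univariate_score`, and `select_secondary_variables`, the use of a single mean aggregate, the class-mean-gap score, and the threshold. It is not the selection criterion proposed in this paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical main table: customer id -> class label.
main = {1: "fraud", 2: "ok", 3: "fraud", 4: "ok"}

# Hypothetical secondary table: several order rows per customer.
orders = [
    {"customer": 1, "amount": 900.0, "weekday": 1},
    {"customer": 1, "amount": 850.0, "weekday": 4},
    {"customer": 2, "amount": 40.0, "weekday": 1},
    {"customer": 2, "amount": 55.0, "weekday": 4},
    {"customer": 3, "amount": 700.0, "weekday": 2},
    {"customer": 3, "amount": 950.0, "weekday": 3},
    {"customer": 4, "amount": 60.0, "weekday": 2},
    {"customer": 4, "amount": 35.0, "weekday": 3},
]


def flatten_mean(rows, key, variable):
    """Aggregate one secondary variable into one value per main-table record."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[variable])
    return {k: mean(values) for k, values in groups.items()}


def univariate_score(feature, labels):
    """Toy score (an assumption): gap between class means, scaled by the range."""
    by_class = defaultdict(list)
    for k, value in feature.items():
        by_class[labels[k]].append(value)
    class_means = [mean(values) for values in by_class.values()]
    spread = max(feature.values()) - min(feature.values())
    if spread == 0:
        return 0.0  # a constant feature cannot separate the classes
    return (max(class_means) - min(class_means)) / spread


def select_secondary_variables(rows, key, variables, labels, threshold=0.5):
    """Score each secondary variable on its own, then rank and filter."""
    scores = {
        v: univariate_score(flatten_mean(rows, key, v), labels) for v in variables
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = [v for v in ranked if scores[v] >= threshold]
    return kept, scores


kept, scores = select_secondary_variables(
    orders, "customer", ["amount", "weekday"], main
)
print(kept, scores)
```

In this toy data the per-customer mean of `weekday` is identical for every customer, so it is filtered out, while `amount` is kept. Because each variable is scored independently, the filter can run before any large propositionalization step, which is the motivation stated in the abstract.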