Sélection de variables secondaires de données multi-tables pour la classification (Selection of Secondary Variables in Multi-Table Data for Classification)

Nicolas Voisine, Lou-Anne Quellet, Marc Boullé, Fabrice Clérot, Anais Collin

In EGC 2025, vol. RNTI-E-41, pp. 195-206

Abstract

This article discusses the importance of multi-table data analysis for organizations, in applications such as fraud detection, service improvement, and customer relations. To be exploited, such data must first be flattened into a single table by constructing new variables from the original ones. Propositionalization tools can automate this process, but the complexity of the data can make it inefficient. We propose a method for selecting secondary variables and show that it can rank and filter out uninformative variables using a univariate approach. Finally, we show on a set of academic databases that restricting the secondary variables to the informative ones can improve classification quality.
