Sélection topologique de variables dans un contexte de discrimination (Topological variable selection in a discrimination context)

In EGC 2016, vol. RNTI-E-30, pp. 123-134

Abstract

In machine learning, a large number of explanatory variables leads to high algorithmic complexity and a strong degradation of the performance of prediction models. In such cases, an optimal discriminant subset of these variables must be selected. This article proposes a topological approach to selecting that subset. It uses the concept of a neighborhood graph to rank the variables by relevance; a forward selection method is then applied to build a series of models, from which the best subset is chosen according to its degree of topological equivalence in discrimination. For each subset, this degree of equivalence is measured by comparing the adjacency matrix induced by the chosen proximity measure with the one induced by the "best" discriminant proximity measure, called the reference measure. The performance of the approach is evaluated on simulated and real data. Compared with a metric approach, the proposed topological approach yields a much better selection of variables in discrimination.
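
The pipeline described above (neighborhood graph, adjacency-matrix comparison, forward selection over ranked variables) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the type of neighborhood graph, the proximity measure, the ranking rule, or how the reference measure is built. The sketch therefore assumes a symmetric k-nearest-neighbor graph with Euclidean distance, a ranking supplied by the caller, agreement measured as the share of matching off-diagonal adjacency entries, and a hypothetical reference adjacency matrix in which two observations are adjacent when they share a class label. All function names are illustrative.

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Symmetric k-nearest-neighbour adjacency matrix (assumed graph type)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed proximity measure).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1
    return np.maximum(A, A.T)  # symmetrise the directed k-NN relation

def class_adjacency(labels):
    """Hypothetical reference matrix: adjacent iff same class label."""
    labels = np.asarray(labels)
    A = (labels[:, None] == labels[None, :]).astype(int)
    np.fill_diagonal(A, 0)
    return A

def equivalence_degree(A, B):
    """Share of off-diagonal entries on which two adjacency matrices agree."""
    off = ~np.eye(A.shape[0], dtype=bool)
    return float((A[off] == B[off]).mean())

def forward_select(X, ranked_vars, A_ref, k=3):
    """Grow nested subsets along a given ranking; keep the best-scoring one."""
    X = np.asarray(X, dtype=float)
    best_subset, best_score = [], -1.0
    for m in range(1, len(ranked_vars) + 1):
        subset = list(ranked_vars[:m])
        score = equivalence_degree(knn_adjacency(X[:, subset], k), A_ref)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Calling `forward_select(X, ranked_vars, class_adjacency(labels), k)` returns the nested subset whose neighborhood graph agrees most closely with the reference matrix, together with its agreement score. Swapping in another graph or proximity measure only requires replacing `knn_adjacency`.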
