Topological selection of variables in a discrimination context
Abstract
In machine learning, a large number of explanatory variables increases the complexity of
algorithms and can strongly degrade the performance of prediction models. In
this case, selecting an optimal discriminant subset of these variables is necessary. In this
article, a topological approach is proposed for the selection of this optimal subset. It uses the
concept of a neighborhood graph to rank the variables in order of relevance; a forward selection
method is then applied to construct a series of models, among which the best subset is selected
based on the degree of topological equivalence in discrimination. For each subset, the degree of
equivalence is measured by comparing the adjacency matrix induced by the selected proximity
measure with that induced by the "best" discriminant proximity measure, called the reference measure. The
performance of this approach is evaluated using simulated and real data. Comparisons of the
variable selection results in discrimination with those of a metric approach show that the
proposed topological approach yields a much better selection.
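
The procedure summarized above can be illustrated with a minimal sketch. It is not the article's implementation, and it rests on several assumptions that the abstract does not state: the neighborhood graph is taken to be a relative neighborhood graph built on Euclidean distances, the degree of equivalence is taken to be the share of cells on which two adjacency matrices agree, the ranking of the variables is supplied as an input rather than computed, and the reference adjacency matrix is a hypothetical stand-in that links objects sharing a class label. All function names and the toy data are illustrative.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def rng_adjacency(points):
    """Adjacency matrix of a relative neighborhood graph (assumed graph type).

    Objects i and j are neighbors when no third object k is closer to both
    of them than they are to each other:
    d(i, j) <= max(d(i, k), d(j, k)) for every k other than i and j.
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1  # convention: each object is its own neighbor
            elif all(d[i][j] <= max(d[i][k], d[j][k])
                     for k in range(n) if k != i and k != j):
                adj[i][j] = 1
    return adj


def equivalence_degree(adj_a, adj_b):
    """Share of cells on which two adjacency matrices agree (assumed measure)."""
    n = len(adj_a)
    same = sum(adj_a[i][j] == adj_b[i][j] for i in range(n) for j in range(n))
    return same / (n * n)


def forward_scores(data, ranked_vars, adj_ref):
    """Score the nested subsets obtained by adding ranked variables one by one."""
    scores = []
    for size in range(1, len(ranked_vars) + 1):
        subset = ranked_vars[:size]
        projected = [[row[v] for v in subset] for row in data]
        adj = rng_adjacency(projected)
        scores.append((subset, equivalence_degree(adj, adj_ref)))
    return scores


# Toy data (hypothetical): variable 0 separates the two classes, variable 1 is noise.
data = [[0.0, 0.0], [1.0, 20.0], [10.0, 1.0], [11.0, 21.0]]
labels = [0, 0, 1, 1]

# Hypothetical reference: objects are neighbors when they share a class label.
adj_ref = [[1 if a == b else 0 for b in labels] for a in labels]

scores = forward_scores(data, ranked_vars=[0, 1], adj_ref=adj_ref)
best_subset, best_score = max(scores, key=lambda s: s[1])
print(scores)       # [([0], 0.875), ([0, 1], 0.75)]
print(best_subset)  # [0]
```

On this toy data, adding the noise variable lowers the agreement with the reference matrix from 0.875 to 0.75, so the sketch retains the subset containing variable 0 alone. The triple loop in rng_adjacency is cubic in the number of objects, which is acceptable for an illustration but not for large samples.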