Predicting defects in the trees of Grenoble's urban tree stock and recommendations for future plantings
Abstract
In this paper, we describe our response to the EGC Challenge 2017. Exploratory data analysis first led us to
understand the distribution of the variables and to detect strong correlations between them. We then defined
two new variables by combining existing dataset variables. Several classification algorithms were tested on
the first task of the challenge, and their performance was evaluated by 10-fold cross-validation. This led to
the selection of the best single-label and multi-label classifiers. At both the single-label and multi-label
levels, the best classifier outperforms the reference scores by approximately 2%. We also explored the second
task of the challenge. On the one hand, we searched for association rules. On the other hand, we enriched
the initial dataset with domain knowledge, such as climate data (rainfall, temperature, wind) and botanical
taxonomic data. Finally, geographical and cartographic data were used in a visualisation tool to map the
trees.