Khiops: Hyperparameter-Free Machine Learning
Marc Boullé,
Nicolas Voisine,
Bruno Guerraz,
Carine Hue,
Felipe Olmos,
Vladimir Popescu,
Stéphane Gouache,
Stéphane Bouget,
Alexis Bondu,
Luc Aurelien Gauthier,
Yassine Nair Benrekia,
Fabrice Clérot,
Vincent Lemaire
Abstract
Khiops is an open-source machine learning tool designed for mining large multi-table
databases. It is based on an original Bayesian approach that has attracted academic interest,
with more than 20 publications on topics such as variable selection, classification, decision
trees and co-clustering. It provides a predictive measure of variable importance, using discretisation
models for numerical variables and value grouping for categorical variables. The proposed
classification/regression model is a naive Bayes classifier that incorporates variable selection
and weight learning. For multi-table databases, Khiops performs propositionalisation by
automatically constructing aggregates. It is suited to the analysis of large databases
with millions of individuals, tens of thousands of variables and hundreds of millions of records
in secondary tables. It is available in many environments, both as a Python library and through
a user interface.
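To make the classifier described above more concrete, here is a minimal illustrative sketch in Python. It assumes the common weighted naive Bayes form, in which the score of class y is log P(y) plus, for each variable k, a weight w_k in [0, 1] times log P(part of x_k | y), normalised over the classes. A numerical variable is first mapped to an interval and a categorical variable to a value group, and the conditional probabilities are looked up per interval or group. All variable names, cut points, groups, probabilities and weights are hypothetical toy values: the code does not learn them and does not reflect the actual Khiops implementation or API.

```python
import bisect
import math

# Hypothetical toy model (illustration only, not learned from data).
CLASSES = ["no", "yes"]
PRIORS = {"no": 0.7, "yes": 0.3}

# Numerical variable "age": cut points 30 and 50 give the intervals
# ]-inf, 30], ]30, 50] and ]50, +inf[ (indices 0, 1, 2).
AGE_CUTS = [30.0, 50.0]
AGE_PROBS = {"no": [0.5, 0.3, 0.2], "yes": [0.2, 0.3, 0.5]}

# Categorical variable "color": values are merged into two groups
# that share the same conditional probabilities.
COLOR_GROUPS = {"red": 0, "orange": 0, "blue": 1, "green": 1}
COLOR_PROBS = {"no": [0.4, 0.6], "yes": [0.7, 0.3]}

# Hypothetical variable weights in [0, 1]; a weight of 0 ignores the variable.
WEIGHTS = {"age": 0.8, "color": 0.5}


def age_part(value):
    """Index of the interval containing value (upper bounds are closed)."""
    return bisect.bisect_left(AGE_CUTS, value)


def color_part(value):
    """Index of the value group containing value."""
    return COLOR_GROUPS[value]


def predict_proba(age, color):
    """Weighted naive Bayes posterior P(class | age, color)."""
    scores = {}
    for c in CLASSES:
        score = math.log(PRIORS[c])
        score += WEIGHTS["age"] * math.log(AGE_PROBS[c][age_part(age)])
        score += WEIGHTS["color"] * math.log(COLOR_PROBS[c][color_part(color)])
        scores[c] = score
    # Normalise the log-scores over classes (log-sum-exp for stability).
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


print(predict_proba(60.0, "red"))
print(predict_proba(20.0, "blue"))
```

In this sketch a weight of 0 removes a variable from the score entirely, which is one way to read the combination of variable selection and weight learning mentioned in the abstract.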