Fairness-Aware Data Mining
Abstract
In data mining we often have to learn from biased data, because, for instance, data comes
from different batches or there was a gender or racial bias in the collection of social data. In
some applications it may be necessary to explicitly control this bias in the models we learn from
the data. Recently this topic has received considerable interest both in the research community
and among the general public, as witnessed by several articles in popular news media such as
the New York Times. In this talk I will introduce and motivate research in fairness-aware data
mining. Techniques in unsupervised and supervised data mining will be discussed, divided
into three categories: algorithms in the first category adapt the input data so as to remove
harmful biases, the second category adapts the learning algorithm itself, and the third
modifies the output models so that their predictions become unbiased. Furthermore, different
ways to quantify unfairness, as well as indirect and conditional discrimination, will be
discussed, each with its own pros and cons. With this talk I hope to convincingly argue for
the validity and necessity of this often contested research area.
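To make the notion of quantifying unfairness concrete, here is a minimal sketch of one common measure, the statistical parity (demographic parity) difference: the gap in positive-outcome rates between the unprotected and protected groups. The function name, the toy data, and the group labels are illustrative assumptions, not taken from the talk itself.

```python
# Sketch: statistical parity difference, one common measure of unfairness.
# All names and data below are hypothetical, for illustration only.

def statistical_parity_difference(labels, groups, protected_group):
    """P(positive | unprotected group) - P(positive | protected group)."""
    protected = [y for y, g in zip(labels, groups) if g == protected_group]
    unprotected = [y for y, g in zip(labels, groups) if g != protected_group]
    rate = lambda ys: sum(ys) / len(ys)  # fraction of positive decisions
    return rate(unprotected) - rate(protected)

# Toy example: 1 = positive decision (e.g. "hire"); group "a" is protected.
labels = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(statistical_parity_difference(labels, groups, "a"))  # 0.75 - 0.5 = 0.25
```

A score of 0 indicates equal positive rates across groups; larger absolute values indicate a larger disparity. Other measures discussed in this line of work, such as conditional discrimination scores, refine this by conditioning on explanatory attributes.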