CRAFTML, an efficient random forest for extreme multi-label learning
Abstract
eXtreme Multi-label Learning (XML) considers large numbers of instances, each annotated with a
few labels among hundreds of thousands of possibilities. Tree-based methods, which hierarchically
divide learning into small-scale subproblems, are particularly promising in this context
because they reduce the complexity of learning and prediction and open the way to parallelization.
However, the current best approaches do not exploit tree randomization, even though it has proven
effective in random forests, and they resort to complex partitioning strategies. To overcome
these limitations, we introduce CRAFTML, a new forest algorithm with diversified trees and a
partitioning strategy adapted to XML. Experimental comparisons on eight
XML datasets show that our approach is faster and more accurate than other state-of-the-art
tree-based methods.