Attention-based multimodal learning for document classification in an imbalanced context
Abstract
Corporate document classification may rely on an image analysis approach, with visual
features considered separately from textual ones.
Recent state-of-the-art deep learning methods propose combining the two within a multimodal
approach. In addition, corporate document classification poses a particular challenge
for deep learning systems because the corpus is imbalanced. This paper presents
an evaluation of several state-of-the-art document classification methods
based on textual content, on visual content, and on multimodal approaches. We complement
this evaluation with our own method, a multimodal network with an attention model. This
combination yields a performance gain of 1% on our private database and 3% on the public
RVL-CDIP database.
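The abstract does not say how the attention model combines the textual and visual branches. The following is a minimal sketch of one common option, modality-level attention fusion, and not the authors' network. It assumes each modality has already been encoded into a single embedding of the same dimension, and it uses a hypothetical `context` vector that would normally be learned during training. Each embedding h_m is scored as context · tanh(h_m), the scores are normalized with a softmax, and the fused representation is the weighted sum of the embeddings.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    modalities: Sequence[Sequence[float]], context: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse per-modality embeddings with a simple attention weighting.

    Hypothetical sketch: `context` stands in for a learned parameter.
    Returns (fused_vector, attention_weights), one weight per modality.
    """
    dim = len(context)
    if any(len(h) != dim for h in modalities):
        raise ValueError("all embeddings must share the context dimension")
    # Score each modality: s_m = context . tanh(h_m)
    scores = [sum(c * math.tanh(x) for c, x in zip(context, h)) for h in modalities]
    # Normalize scores into attention weights that sum to 1
    weights = softmax(scores)
    # Weighted sum of the modality embeddings, dimension by dimension
    fused = [sum(w * h[i] for w, h in zip(weights, modalities)) for i in range(dim)]
    return fused, weights


# Usage with toy (made-up) text and image embeddings
text_emb = [0.5, -1.0, 2.0]
image_emb = [1.0, 0.0, -0.5]
fused, weights = attention_fuse([text_emb, image_emb], context=[1.0, 1.0, 1.0])
print(weights, fused)
```

In a full model, the fused vector would feed a classification layer, and the attention weights indicate how much each modality contributes for a given document.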