Multimodal learning based on attention models for document classification in an imbalanced context
In EGC 2021, vol. RNTI-E-37, pp. 357-364
Abstract
Corporate document classification may rely on image analysis considered separately from textual features. Recent state-of-the-art deep learning methods propose combining the two in a multimodal approach. In addition, corporate document classification poses a particular challenge for deep-learning-based systems because the corpus is imbalanced. This paper presents an evaluation of several state-of-the-art methods designed for the document classification task, using the textual content, the visual content, and multimodal approaches. We complete this evaluation with our own method, a multimodal network with an attention model. This combination offers a performance gain of 1% on our private database and 3% on the public RVL-CDIP database.
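
As an illustration of what attention-based fusion of a textual and a visual representation can look like, the following is a minimal sketch. It is not the network described in the paper: the function names, the `query` vector (standing in for a learned parameter), and the dot-product scoring are all assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(text_vec, image_vec, query):
    """Fuse two same-length modality embeddings with attention weights.

    Each modality is scored by its dot product with `query`, a hypothetical
    stand-in for a learned vector. Softmax turns the two scores into weights
    that sum to 1, and the fused vector is the weighted sum of the inputs.
    """
    modalities = [text_vec, image_vec]
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in modalities]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modalities))
        for i in range(len(query))
    ]
    return fused, weights


# Toy embeddings; in practice these would come from a text encoder and an
# image encoder projected to the same dimension.
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
print(weights)  # the text modality receives the larger weight here
print(fused)
```

In a trained model the weights would vary per document, so the classifier could lean on whichever modality is more informative for that input.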