Comparative analysis of machine learning methods for categorizing a text according to the language in which it was written
Abstract
This work has two objectives. The first is to categorize French novels so that a user can determine whether a given novel is original or translated, that is, whether or not it was written in the author's original language. The second is to compare and optimize the methods developed to achieve this goal. The textual data considered are voluminous and vary in theme and style. The four implemented approaches, each taking into account frequency, lexical, syntactic, or semantic characteristics, rely on machine learning. The comparison of the approaches covers the representation space, the parametrization of the methods, the recognition rates (per class or overall), and explainability.