Building a gender-annotated corpus with zero-shot learning.
Abstract
To adapt to new technologies, an association has developed a webchat application that allows anyone to express and share their anxieties. Several thousand anonymous conversations have since been collected, forming an unprecedented corpus of accounts of human distress and social violence. In this paper, we present a methodology for building a model that automatically labels a corpus of French texts by gender. The method combines a zero-shot classification algorithm, human validation, and supervised learning. We present experimental results suggesting that this method makes it possible to pre-annotate a large corpus effectively, so that an expert can then validate the resulting annotation more easily.
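The pre-annotation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `pre_annotate`, `toy_scorer`, `TOY_CUES`, the label names, and the 0.7 threshold are all assumptions introduced here, and `toy_scorer` is a deliberately naive placeholder standing in for a real zero-shot classifier. The sketch only shows the workflow of scoring each text against candidate labels, keeping the best label, and flagging low-confidence cases for human validation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A scorer maps (text, candidate labels) to one confidence score per label.
# In practice this role would be played by a zero-shot classification model.
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


@dataclass
class PreAnnotation:
    text: str
    label: str
    confidence: float
    needs_review: bool  # True when the confidence is below the threshold


def pre_annotate(
    texts: Sequence[str],
    labels: Sequence[str],
    scorer: Scorer,
    threshold: float = 0.7,  # assumed value, not taken from the paper
) -> List[PreAnnotation]:
    """Assign each text its best-scoring label and flag uncertain cases."""
    results = []
    for text in texts:
        scores = scorer(text, labels)
        best = max(labels, key=lambda lab: scores[lab])
        results.append(
            PreAnnotation(text, best, scores[best], scores[best] < threshold)
        )
    return results


# Hypothetical cue words, for illustration only (not from the paper).
TOY_CUES = {
    "female": {"fatiguée", "inquiète", "seule"},
    "male": {"fatigué", "inquiet", "seul"},
}


def toy_scorer(text: str, labels: Sequence[str]) -> Dict[str, float]:
    """Placeholder for a real zero-shot model: smoothed cue-word counts."""
    tokens = [tok.strip(".,!?;:") for tok in text.lower().split()]
    raw = {
        lab: 1 + sum(tok in TOY_CUES.get(lab, set()) for tok in tokens)
        for lab in labels
    }
    total = sum(raw.values())
    return {lab: raw[lab] / total for lab in labels}


texts = ["Je suis fatiguée et inquiète.", "Je me sens seul.", "Bonjour."]
results = pre_annotate(texts, ["female", "male"], toy_scorer)

# Texts the scorer is unsure about go to a human annotator.
review_queue = [r for r in results if r.needs_review]

for r in results:
    print(r.label, round(r.confidence, 2), "review" if r.needs_review else "auto")
```

In a setup like this, an expert would check the items in `review_queue` first, and the validated labels could then serve as training data for a supervised classifier. Passing the scorer as a parameter keeps the routing logic independent of whichever zero-shot model is actually used.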