RNTI

Extracting named entities of the Linnaean binomial type from historical texts: a case study
In EGC 2023, vol. RNTI-E-39, pp.417-424
Abstract
Linnaean binomials (a.k.a. taxa) are rarely studied as a type of named entity, and their extraction from archival texts has received just as little attention. We introduce the competent reader hypothesis, i.e., the assumption that a competent reader is able to recognize a taxon even if it is deprecated or ill-formed. This hypothesis is the key to our evaluation process. We compare several approaches to recognizing taxa: dictionary-based, rule-based, and a form of generalization learning. We show that the criterion of looking Latin, used alone, lacks precision. Finally, we show that a rarity criterion, when combined with the Latin criterion, yields a high-quality recognizer with an F-measure of about 70%.
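To make the combination of the two criteria concrete, here is a minimal sketch in Python. It is not the authors' recognizer: the suffix list, the regular expression, and the use of a common-word set as a stand-in for "rarity" are all assumptions chosen for illustration, since the abstract does not give the actual rules.

```python
import re

# Hypothetical list of Latin-looking endings (an assumption for illustration;
# the abstract does not specify the rules actually used).
LATIN_SUFFIXES = ("us", "a", "um", "is", "ii", "ae", "ens", "es")

# A capitalized word (candidate genus) followed by a lowercase word (candidate epithet).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def looks_latin(word):
    """Crude 'looks Latin' test based only on the word ending."""
    return word.lower().endswith(LATIN_SUFFIXES)


def candidate_binomials(text, common_words):
    """Return (genus, epithet) pairs that look Latin AND are rare.

    Rarity is approximated here by absence from `common_words`, a
    hypothetical set of frequent words of the text's language.
    """
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        latin = looks_latin(genus) and looks_latin(epithet)
        rare = (genus.lower() not in common_words
                and epithet.lower() not in common_words)
        if latin and rare:
            found.append((genus, epithet))
    return found


if __name__ == "__main__":
    sentence = "This is Canis lupus."
    # Latin criterion alone (empty common-word set): "This is" slips through.
    print(candidate_binomials(sentence, set()))
    # Latin criterion combined with rarity: only the binomial remains.
    print(candidate_binomials(sentence, {"this", "is"}))
```

In this toy sentence, "This is" passes the ending-based Latin test, which illustrates why that criterion alone lacks precision; filtering out frequent words removes it while keeping "Canis lupus".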