An experimental protocol on graphemic properties with the SOM algorithm
Abstract
We present an experimental pipeline that covers the successive stages of research on grapheme
distribution and unsupervised classification. We aim to help close the gap between recent
research results showing that unsupervised learning and clustering algorithms can
detect underlying properties of phonemes and the current possibilities of Unicode textual
representation. Our procedures must ensure repeatability and guarantee that no information
is implicitly introduced during data preprocessing. Our approach categorizes potential
graphemes correctly, showing that phonemic properties are not only present in textual
data but can also be automatically retrieved from raw Unicode text and translated
into phonemic representations.
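
To make the idea of the pipeline concrete, the following is a minimal sketch of how raw Unicode text could be turned into per-character distribution vectors and placed on a small self-organizing map (SOM). It is not the procedure used in this work: the feature choice (right-neighbour frequencies), the grid size, the learning schedule, and all function names (`context_vectors`, `train_som`, `best_matching_unit`) are assumptions made for illustration only. A fixed random seed is used so that runs are repeatable.

```python
import math
import random
import unicodedata
from collections import Counter, defaultdict


def context_vectors(text):
    """Map each letter to the normalized frequencies of the letters that follow it."""
    # NFC normalization is the only preprocessing step besides lowercasing (assumption).
    text = unicodedata.normalize("NFC", text.lower())
    letters = sorted({c for c in text if c.isalpha()})
    index = set(letters)
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        if a in index and b in index:
            counts[a][b] += 1
    vectors = {}
    for c in letters:
        total = sum(counts[c].values())
        if total:  # letters never followed by another letter are skipped
            vectors[c] = [counts[c][d] / total for d in letters]
    return vectors


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)),
    )


def train_som(data, rows=3, cols=3, epochs=100, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)                            # decaying learning rate
        radius = max(rows, cols) / 2 * (1 - frac) + 0.5  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):            # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy input, far too small to reveal real graphemic structure.
sample = "the quick brown fox jumps over the lazy dog " * 3
vecs = context_vectors(sample)
som = train_som(list(vecs.values()))
placement = {ch: best_matching_unit(som, v) for ch, v in vecs.items()}
```

Here `placement` maps each character to a cell of the 3 by 3 grid; on a realistic corpus one would inspect which characters share neighbouring cells. The toy sentence only demonstrates the mechanics and supports no linguistic conclusion.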