Extraction of an articulatory model from a three-way analysis of cineradiographs of a speaker
Abstract
Analyzing sequences of radiographs of a talking person is difficult for several reasons.
The first is technical: these data are images annotated at several places and times, in a
semi-automatic or manual way. The second is representational: the movements of the articulators
during speech (tongue, jaw, etc.) are complex to describe because of their many mechanical and
dynamic interdependencies. When speaking, a speaker sets in motion a complex set of articulators:
the jaw, which opens more or less; the tongue, which takes many shapes and positions;
the lips, which let the air escape more or less abruptly; etc. The best-known articulatory
model is that of Maeda (1990), derived from a Principal Component Analysis performed
on arrays of coordinates of points on the articulators of a talking speaker. We propose a three-way
analysis of the same type of data, after converting the coordinate tables into distances. We validate
our model by predicting the spoken sounds; this prediction proved almost as good as that of the
acoustic model, and even better when coarticulation is taken into account.
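As an illustration of the two data representations mentioned above (not the authors' implementation), the following minimal sketch, built on synthetic stand-in coordinates, shows a Maeda-style Principal Component Analysis on flattened coordinate tables, and the conversion of each frame's point coordinates into a pairwise-distance table, the kind of frames × points × points array on which a three-way analysis can operate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for annotated radiograph data:
# 50 frames, 10 tracked articulator points per frame, (x, y) coordinates.
frames = rng.normal(size=(50, 10, 2))

# --- PCA step (Maeda-style): components of the coordinate tables ---
X = frames.reshape(50, -1)            # one row of 20 coordinates per frame
Xc = X - X.mean(axis=0)               # center each coordinate
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                        # frame scores on each component
explained = S**2 / np.sum(S**2)       # variance ratio per component

# --- Distance-conversion step: coordinates -> pairwise distances ---
# Each frame's (10, 2) point set becomes a 10x10 Euclidean distance
# matrix; stacking them yields a (frames, points, points) array.
diff = frames[:, :, None, :] - frames[:, None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
```

One appeal of the distance representation is that pairwise distances are invariant to translation and rotation of the annotated point set, so the subsequent multiway analysis is not sensitive to how each radiograph was framed.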