Extraction of an articulatory model from a three-way analysis of cineradiographs of a speaker
Abstract
Analyzing sequences of radiographs of a talking person is difficult for several reasons.
The first is technical: these data are images annotated at several places and times, in a
semi-automatic or manual way. The second is representational: the movements of the articulators
during speech (tongue, jaw, etc.) are complex to describe because of their many mechanical and
dynamic interdependencies. When speaking, a speaker sets in motion a complex set of articulators:
the jaw, which opens more or less; the tongue, which takes many shapes and positions;
the lips, which let the air escape more or less abruptly; etc. The best-known articulatory
model is that of Maeda (1990), derived from a Principal Component Analysis performed
on arrays of coordinates of points on the articulators of a talking speaker. We propose a three-way
analysis of the same type of data, after converting the coordinate tables into distances. We validate
our model by predicting the spoken sounds; this prediction proved almost as good as that of the
acoustic model, and even better when coarticulation is taken into account.
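As an illustration of the two data representations mentioned above (not the authors' implementation), the following minimal sketch, built on synthetic stand-in coordinates, shows a Maeda-style Principal Component Analysis on flattened coordinate tables, and the conversion of each frame's point coordinates into a pairwise-distance table, the kind of frames × points × points array on which a three-way analysis can operate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for annotated radiograph data:
# 50 frames, 10 tracked articulator points per frame, (x, y) coordinates.
frames = rng.normal(size=(50, 10, 2))

# --- PCA step (Maeda-style): components of the coordinate tables ---
X = frames.reshape(50, -1)            # one row of 20 coordinates per frame
Xc = X - X.mean(axis=0)               # center each coordinate
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                        # frame scores on each component
explained = S**2 / np.sum(S**2)       # variance ratio per component

# --- Distance-conversion step: coordinates -> pairwise distances ---
# Each frame's (10, 2) point set becomes a 10x10 Euclidean distance
# matrix; stacking them yields a (frames, points, points) array.
diff = frames[:, :, None, :] - frames[:, None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
```

One appeal of the distance representation is that pairwise distances are invariant to translation and rotation of the annotated point set, so the subsequent multiway analysis is not sensitive to how each radiograph was framed.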