AUTHOR
H. Rulot
Application of the Error Correcting Grammatical Inference Method (ECGI) to Multi-Speaker Isolated Word Recognition
It is well known that speech signals are highly structured objects composed of different kinds of subobjects such as words, phonemes, etc. This fact has motivated several researchers to propose models which more or less explicitly assume the structural nature of speech. Notable examples of such models are Markov models /Bak 75/, /Jel 76/; the famous Harpy /Low 76/; Scriber and Lafs /Kla 80/; and many other works in which the suitability of some structural model for the speech objects considered is explicitly advocated /Gup 82/, /Lev 83/, /Cra 84/, /Sca 85/, /Kam 85/, /Sau 85/, /Rab 85/, /Kop 85/, /Sch 85/, /Der 86/, /Tan 86/.
On the use of a metric-space search algorithm (AESA) for fast DTW-based recognition of isolated words
The approximating and eliminating search algorithm (AESA) was recently introduced for finding nearest neighbors in metric spaces. Although the AESA was originally developed to reduce the time complexity of dynamic time-warping isolated word recognition (DTW-IWR), only rather limited experiments had previously been carried out to check its performance in this task. A set of experiments aimed at filling this gap is reported. The main results show that the important features reflected in previous simulation experiments also hold for real speech samples. With single-speaker dictionaries of up to 200 words, and for most of the different speech parameterizations, local metrics, a…
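A minimal sketch of the approximating-and-eliminating idea follows, assuming a dictionary of prototypes stored with a precomputed matrix of pairwise dissimilarities and an arbitrary metric dist() (for instance a DTW dissimilarity satisfying the triangle inequality); the names aesa_nearest, prototypes, dist and proto_dists are illustrative only and not taken from the paper.

import numpy as np

def aesa_nearest(query, prototypes, dist, proto_dists):
    """Return (index, distance) of the prototype nearest to `query`.

    proto_dists[i, j] must hold dist(prototypes[i], prototypes[j]),
    precomputed once for the whole dictionary.
    """
    n = len(prototypes)
    alive = set(range(n))
    lower = np.zeros(n)              # lower bounds on dist(query, prototype i)
    best_idx, best_d = -1, np.inf
    while alive:
        # Approximating step: pick the live prototype with the smallest bound.
        s = min(alive, key=lambda i: lower[i])
        alive.discard(s)
        d = dist(query, prototypes[s])       # the only expensive computation
        if d < best_d:
            best_idx, best_d = s, d
        # Eliminating step: tighten the bounds via the triangle inequality
        # and drop every prototype that can no longer beat the current best.
        for i in list(alive):
            lower[i] = max(lower[i], abs(d - proto_dists[s, i]))
            if lower[i] >= best_d:
                alive.discard(i)
    return best_idx, best_d

At query time only dist(query, prototypes[s]) is costly; every other operation works on the precomputed matrix, which is what makes this kind of search attractive for DTW-based recognition.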
On the metric properties of dynamic time warping
Recently, some new and promising methods have been proposed to reduce the number of Dynamic Time Warping (DTW) computations in Isolated Word Recognition. For these methods to be properly applicable, an important prerequisite seems to be that the DTW-based dissimilarity measure used satisfies the Triangle Inequality (TI).
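The prerequisite can be illustrated with a small self-contained numerical check (not taken from the paper): compute a plain symmetric DTW dissimilarity between three short sequences and test whether d(A, C) <= d(A, B) + d(B, C) holds for that triple. A single triple can of course only refute the triangle inequality, never establish it.

import numpy as np

def dtw(a, b):
    """Plain symmetric DTW with an absolute-difference local metric."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

A = [0.0, 1.0, 2.0]
B = [0.0, 2.0]
C = [2.0, 2.0, 2.0, 2.0]
print(dtw(A, C) <= dtw(A, B) + dtw(B, C))   # True for this particular triple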
Learning the structure of HMM's through grammatical inference techniques
A technique is described in which all the components of a hidden Markov model are learnt from training speech data. The structure or topology of the model (i.e. the number of states and the actual transitions) is obtained by means of an error-correcting grammatical inference algorithm (ECGI). This structure is then reduced by using an appropriate state pruning criterion. The statistical parameters that are associated with the obtained topology are estimated from the same training data by means of the standard Baum-Welch algorithm. Experimental results showing the applicability of this technique to speech recognition are presented.
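The two-stage procedure can be sketched as follows (a simplified illustration, not the paper's implementation): a hand-written boolean transition mask stands in for the topology that an ECGI-like inference step would produce, and the probabilities attached to that topology are then estimated with the Baum-Welch algorithm for a discrete-observation HMM; every name below is illustrative.

import numpy as np

def baum_welch(seqs, mask, n_symbols, n_iter=20, seed=0):
    """Estimate (pi, A, B) for a discrete HMM whose allowed transitions are
    given by the boolean matrix `mask`; every state is assumed to have at
    least one outgoing transition (e.g. a self-loop on the final state)."""
    rng = np.random.default_rng(seed)
    n = mask.shape[0]
    A = rng.random((n, n)) * mask
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        A_num = np.zeros_like(A)
        B_num = np.zeros_like(B)
        gamma0 = np.zeros(n)
        for obs in seqs:
            T = len(obs)
            # Forward and backward passes.
            alpha = np.zeros((T, n))
            beta = np.zeros((T, n))
            alpha[0] = pi * B[:, obs[0]]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            beta[T - 1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
            evidence = alpha[T - 1].sum()
            gamma = alpha * beta / evidence
            # Accumulate expected transition and emission counts.
            for t in range(T - 1):
                A_num += alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / evidence
            for t in range(T):
                B_num[:, obs[t]] += gamma[t]
            gamma0 += gamma[0]
        # Re-estimation; the mask keeps forbidden transitions at zero.
        A = (A_num * mask) / np.maximum(A_num.sum(axis=1, keepdims=True), 1e-12)
        B = B_num / np.maximum(B_num.sum(axis=1, keepdims=True), 1e-12)
        pi = gamma0 / len(seqs)
    return pi, A, B

# Left-to-right topology over three states with self-loops, standing in for
# an ECGI-derived structure, trained on a few toy symbol sequences.
mask = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 1]], dtype=bool)
seqs = [[0, 0, 1, 2, 2], [0, 1, 1, 2], [0, 0, 1, 2]]
pi, A, B = baum_welch(seqs, mask, n_symbols=3)

Because forbidden transitions start at zero and the re-estimation formulas only redistribute existing probability mass, the mask is preserved across iterations; in the paper's setting the mask would come from the inferred structure rather than being written by hand.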