Search results for "speech recognition"
Showing 10 of 357 documents
Non-speech voice for sonic interaction: a catalogue
2016
This paper surveys the uses of non-speech voice as an interaction modality within sonic applications. Three main contexts of use are identified: sound retrieval, sound synthesis and control, and sound design. An overview of the different choices and techniques regarding the style of interaction, the selection of vocal features, and their mapping to sound features or controls is presented. A comprehensive collection of examples illustrates the use of non-speech voice in actual tools for sonic interaction. It is pointed out that while voice-based techniques are already used proficiently in sound retrieval and sound synthesis, their use in sound design is still at an exploratory p…
ERP qualification exploiting waveform, spectral and time-frequency infomax
2008
This contribution briefly introduces an event-related potential (ERP) detector. The detector uses three kinds of ERP features: the waveform feature, the spectral feature, and the time-frequency feature. From these characteristics, two parameters are defined to reflect the timing of the ERP. The mismatch negativity (MMN) is taken as an example to design an exact qualification detector. The experiment validates that the computer can automatically detect the raw trace to reflect the quality of the dataset, qualify the filtered trace to test whether the artifacts have been filtered out, and select the ERP-like component to reject art…
Analysis of the Visual Classification System by Means of Detection Experiments
1977
Summary: Experiments on recognizing statistically distorted patterns show that the human visual system operates as a linear classifier. The spatial frequency range within which features are extracted is determined by the coupling in the area of sharpest vision (2°). The relevant features for classifying patterns are not produced by isotropic filtering.
A Sub-Symbolic Approach to Word Modelling for Domain Specific Speech Recognition
2006
In this work, a sub-symbolic technique for automatic, data-driven construction of language models is presented. The technique can be used to build a language-modelling module that is easily integrated into existing speech recognition architectures, such as the well-known HTK architecture. The proposed technique takes advantage of both the traditional LSA approach and a novel application of a probability-space metric known as "Hellinger's distance". Experimental trials are also presented to validate the proposed approach.
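As a rough illustration of the Hellinger distance named in this abstract (the distributions below are invented toy values, not data from the paper):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical distributions) to 1 (disjoint support).
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    ) / math.sqrt(2)

# Toy word-probability vectors over the same vocabulary.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d = hellinger(p, q)  # small positive value for these similar distributions
```

In an LSA-style setup, `p` and `q` would be probability vectors derived from word–context counts; the metric's bounded range is one practical reason to prefer it over, say, KL divergence, which is unbounded.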
Detection of TV commercials
2004
This paper presents a system that labels TV shots as either commercial or program shots. The system uses two observations: logo presence and shot duration. These observations are modeled using HMMs, and a Viterbi decoder is finally used for shot labeling. The system has been tested on several hours of real video, achieving more than 99% correct labeling.
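A minimal sketch of the HMM/Viterbi pipeline this abstract describes, reduced to a single observation (logo presence) with invented transition and emission probabilities; the paper's actual model also uses shot duration:

```python
import math

states = ["commercial", "program"]
start = {"commercial": 0.5, "program": 0.5}
# Shots tend to stay in the same block (commercial break vs. program).
trans = {"commercial": {"commercial": 0.8, "program": 0.2},
         "program":    {"commercial": 0.2, "program": 0.8}}
# Emission: probability that the channel logo is visible in a shot.
emit = {"commercial": {True: 0.1, False: 0.9},
        "program":    {True: 0.9, False: 0.1}}

def viterbi(obs):
    """Most likely state sequence for a list of logo-presence observations."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            row[s] = V[-1][best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):       # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

A run of logo-free shots decodes as a commercial block, logo-bearing shots as program, with the transition matrix smoothing over isolated misdetections.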
Spectral tools for Dynamic Tonality and audio morphing
2009
Computer Music Journal. William Sethares (sethares@ece.wisc.edu), Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA; Andrew Milne (andymilne@tonalcentre.org), Department of Music, P.O. Box 35, 40014, University of Jyvaskyla, Finland; Stefan Tiedje (Stefan-Tiedje@addcom.de), CCMIX, Paris, France; Anthony Prechtl (aprechtl@gmail.com), Department of Music, P.O. Box 35, 40014, University of Jyvaskyla, Finland; James Plamondon (jim@thumtronics.com), CEO, Thumtronics Inc., 6911 Thistle Hill Way, Austin, TX 78754 USA.
The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.
2014
Abstract: We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant–vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG–MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest in the left hemisphere for changes that affect word meaning (consonant, vowel, and vowel duration), and also in the right hemisphere for the vowel-identity change. Furthermore, in the right hemisphere, spe…
Does letter position coding depend on consonant/vowel status? Evidence with the masked priming technique
2008
Recently, a number of input coding schemes (e.g., the SOLAR, SERIOL, open-bigram, and overlap models) have been proposed that capture the transposed-letter priming effect (i.e., faster response times for jugde-JUDGE than for jupte-JUDGE). In their current versions, these coding schemes do not assume any processing differences between vowels and consonants. However, in a lexical decision task, Perea and Lupker (2004, JML) and Lupker, Perea, and Davis (2008, L&CP) reported that transposed-letter priming effects occurred for consonant transpositions but not for vowel transpositions. This finding poses a challenge for these recently proposed coding schemes. Here, we report four masked priming…
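A toy version of the open-bigram coding idea mentioned in this abstract, enough to see why jugde primes JUDGE more strongly than jupte does; the cited models may weight or window bigrams differently:

```python
from itertools import combinations

def open_bigrams(word):
    """All ordered letter pairs of a word, preserving relative order."""
    return {a + b for a, b in combinations(word.lower(), 2)}

def overlap(prime, target):
    """Proportion of the target's open bigrams also present in the prime."""
    t = open_bigrams(target)
    return len(open_bigrams(prime) & t) / len(t)

# Transposing two adjacent letters preserves most bigrams;
# substituting them destroys far more.
tl = overlap("jugde", "judge")   # high: only the "dg" bigram is lost
sub = overlap("jupte", "judge")  # low: most bigrams involve the replaced letters
```

Under this scheme the transposed prime shares 9 of JUDGE's 10 open bigrams, the substitution prime only 3, which is the similarity asymmetry the priming effect tracks.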
Harmony perception and regularity of spike trains in a simple auditory model
2013
A probabilistic approach to investigating the phenomena of dissonance and consonance in a simple auditory sensory model, composed of two sensory neurons and one interneuron, is presented. We calculated the interneuron's firing statistics, that is, the interspike-interval statistics of the spike train at the output of the interneuron, for consonant and dissonant inputs in the presence of additional "noise" representing random signals from other, nearby neurons and from the environment. We find that blurry interspike-interval distributions (ISIDs) characterize dissonant chords, while quite regular ISIDs characterize consonant chords. The informational entropy of the non-Markov spike train …
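A minimal sketch of the ISI-distribution entropy measure this abstract refers to, assuming a simple fixed-width binning of intervals (the paper's estimator may differ):

```python
import math
from collections import Counter

def isi_entropy(spike_times, bin_width=0.005):
    """Shannon entropy (bits) of the binned interspike-interval distribution."""
    isis = [t2 - t1 for t1, t2 in zip(spike_times, spike_times[1:])]
    counts = Counter(round(isi / bin_width) for isi in isis)
    n = len(isis)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A perfectly regular train ("consonant-like") has a single ISI bin,
# hence zero entropy; mixed intervals spread mass over bins.
regular = [0.01 * i for i in range(100)]
irregular = [0, 0.01, 0.03, 0.04, 0.06]  # alternating 10 ms / 20 ms gaps
```

Low entropy corresponds to the "quite regular ISIDs" of consonant inputs; blurry, multi-peaked ISIDs of dissonant inputs yield higher entropy.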
Beyond alphabetic orthographies: The role of form and phonology in transposition effects in Katakana
2008
In recent years, there has been growing interest in how the order of letters is attained in visual word recognition. Two critical issues are: (1) whether the front end of the recently proposed models of letter-position encoding can be generalised to non-alphabetic scripts, and (2) whether phonology plays an important role in the process of letter-position encoding. In the present masked priming lexical decision experiments, we employed a syllabic/moraic script (Katakana), which allows form and phonology to be disentangled. In Experiment 1, we found a robust masked transposed-mora priming effect: the prime a.ri.me.ka facilitates the processing of the word a.me.ri.ka relative to a double-substit…