Search results for "speech recognition"
Showing 10 of 357 documents
Spectrogram analysis of multipath fading channels
2015
The analysis of the Doppler power spectral density (PSD) of measured and simulated data is an important topic in the area of mobile radio channel modelling. In this paper, we estimate the Doppler PSD of multipath fading channels by using the concept of the spectrogram. The spectrogram is a spectral representation that gives insight into how the distribution of the spectral density of a signal changes over time. The multipath fading channel is modelled by a sum-of-cisoids (SOC) process. A closed-form solution is presented for the spectrogram and the corresponding time-dependent autocorrelation function (ACF). The closed-form solutions disclose several unwanted effects that come with the limi…
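As a rough illustration of the spectrogram idea described above, the sketch below builds a synthetic sum-of-cisoids (SOC) process and computes its time-resolved PSD with `scipy.signal.spectrogram`. All numerical values (sampling rate, maximum Doppler frequency, number of cisoids) are assumptions for the example, not parameters from the paper, and the uniform sampling of Doppler frequencies is a simplification of the parameter-computation methods discussed there.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1000          # sampling rate in Hz (assumed)
f_max = 91.0       # maximum Doppler frequency in Hz (assumed)
N = 20             # number of cisoids (assumed)
t = np.arange(0, 2.0, 1.0 / fs)

rng = np.random.default_rng(0)
# Doppler frequencies and phases of the cisoids; sampled uniformly here
# purely for illustration
f_n = f_max * np.cos(2 * np.pi * rng.random(N))
theta_n = 2 * np.pi * rng.random(N)
mu = np.sum(np.exp(1j * (2 * np.pi * np.outer(f_n, t) + theta_n[:, None])),
            axis=0) / np.sqrt(N)

# Spectrogram: how the spectral density of the process evolves over time
f, tau, Sxx = spectrogram(mu, fs=fs, nperseg=256, noverlap=192,
                          return_onesided=False)
print(Sxx.shape)  # (frequency bins, time windows)
```

Because the fading process is complex-valued, the two-sided spectrogram is used; each column of `Sxx` is a windowed PSD estimate at one time instant.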
Atrial activity extraction for atrial fibrillation analysis using blind source separation.
2004
This contribution addresses the extraction of atrial activity (AA) from real electrocardiogram (ECG) recordings of atrial fibrillation (AF). We show the appropriateness of independent component analysis (ICA) to tackle this biomedical challenge when regarded as a blind source separation (BSS) problem. ICA is a statistical tool able to reconstruct the unobservable independent sources of bioelectric activity which generate, through instantaneous linear mixing, a measurable set of signals. The three key hypotheses that make ICA applicable in the present scenario are discussed and validated: 1) AA and ventricular activity (VA) are generated by sources of independent bioelectric activity; 2) AA …
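A minimal sketch of the BSS setting the abstract describes, using `FastICA` from scikit-learn: two synthetic independent sources are mixed by an instantaneous linear model x = A s and then recovered from the mixtures alone. The source waveforms and mixing matrix here are invented stand-ins, not real ECG leads or the paper's atrial/ventricular signals.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)

# Two independent surrogate sources (purely illustrative):
# a sawtooth-like waveform and a periodic spike train
s1 = np.mod(6 * t, 1.0) - 0.5
s2 = np.sign(np.sin(2 * np.pi * 1.2 * t)) * np.exp(-np.mod(t, 1 / 1.2) * 5)
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])   # "unknown" mixing matrix
X = S @ A.T                   # observed mixtures (the measurable signals)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # estimated independent sources
print(S_hat.shape)
```

ICA recovers the sources only up to permutation and scaling, which is why applications like AA extraction need a subsequent step to identify which estimated component corresponds to the activity of interest.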
A Musical Pattern Discovery System Founded on a Modeling of Listening Strategies
2004
Music is a domain of expression that conveys a paramount degree of complexity. The musical surface, composed of a multitude of notes, results from the elaboration of numerous structures of different types and sizes. The composer constructs this structural complexity in a more or less explicit way. The listener, faced by such a complex phenomenon, is able to reconstruct only a limited part of it, mostly in a non-explicit way. One particular aim of music analysis is to objectify such complexity, thus offering to the listener a tool for enriching the appreciation of music (Lartillot and SaintJames, 2004). The trouble is, traditional musical analysis, although offering a valuable understanding …
Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text
2016
Depression is a major cause of disability world-wide. The present paper reports on the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed including dynamic characteristics of facial elements…
Algorithmic Aspects of Speech Recognition: A Synopsis
2000
Speech recognition is an area with a sizable literature, but there is little discussion of the topic within the computer science algorithms community. Since many of the problems arising in speech recognition are well suited for algorithmic studies, we present them in terms familiar to algorithm designers. Such cross fertilization can breed fresh insights from new perspectives. This material is abstracted from A. L. Buchsbaum and R. Giancarlo, Algorithmic Aspects of Speech Recognition: An Introduction, ACM Journal of Experimental Algorithmics, Vol. 2, 1997, http://www.jea.acm.org.
What's the difference? Comparing humans and machines on the Aurora 2 speech recognition task
2013
Processing Continuous Speech in Infancy
2016
The present chapter focuses on fluent speech segmentation abilities in early language development. We first review studies exploring the early use of major prosodic boundary cues which allow infants to cut full utterances into smaller-sized sequences like clauses or phrases. We then summarize studies showing that word segmentation abilities emerge around 8 months, and rely on infants’ processing of various bottom-up word boundary cues and top-down known word recognition cues. Given that most of these cues are specific to the language infants are acquiring, we emphasize how the development of these abilities varies cross-linguistically, and explore their developmental origin. In particular, …
Does training in syllable recognition improve reading speed? A computer-based trial with poor readers from second and third grade.
2013
Repeated reading of infrequent syllables has been shown to increase reading speed at the word level in a transparent orthography. This study confirms these results with a computer-based training method and extends them by comparing the training effects of short syllables and long frequent and infrequent syllables, controlling for rapid automatized naming. Our results, based on a sample of 150 poor readers of Finnish, showed clear gains in reading speed regarding all trained syllables, but a transfer effect to the word level was evident only in the case of long infrequent syllables. Rapid automatized naming was associated with initial reading speed, but not with the training effect.
Tempo Induction from Music Recordings Using Ensemble Empirical Mode Decomposition Analysis
2011
Tempo and beat are among the most important features of Western music. Owing to the perceptual nature of tempo, its automatic analysis and extraction remains a difficult task for a large variety of music genres. Western music notation represents musical events using a hierarchical metrical structure distinguishing different time scales. This hierarchy is often modeled using three levels: the tatum, the tactus, and the measure. The tatum represents the shortest durational value in music that is not just an accidental phenomenon (Bilmes 1993). The tactus period is the most perceptually prominent period, and is the period at which most humans would tap their feet in time with the music (Lerdah…
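To make the tactus idea concrete (and setting aside the paper's EEMD method), the sketch below estimates the tempo of a synthetic click track by autocorrelating its onset envelope: the autocorrelation peaks at multiples of the beat period, and the strongest lag within a plausible tempo range gives the tactus. The frame rate, tempo, and search range are assumed values for the example.

```python
import numpy as np

fs = 100                                   # envelope frames per second (assumed)
bpm_true = 120                             # tempo of the synthetic track
period = int(round(fs * 60 / bpm_true))    # frames between beats

env = np.zeros(10 * fs)
env[::period] = 1.0                        # impulse at each beat (the tactus)

# Autocorrelation of the onset envelope; lag 0 sits at index len(env) - 1
ac = np.correlate(env, env, mode="full")[len(env) - 1:]

# Search only lags corresponding to a 40-240 BPM range (assumed bounds)
lag_min, lag_max = int(fs * 60 / 240), int(fs * 60 / 40)
best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
bpm_est = 60 * fs / best_lag
print(round(bpm_est))
```

Real recordings need an onset-detection front end and handling of metrical ambiguity (the tatum or the measure can produce competing autocorrelation peaks), which is precisely where methods such as the EEMD analysis above come in.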
Automatic fitting of cochlear implants with evolutionary algorithms
2004
This paper presents an optimisation algorithm designed to perform in-situ automatic fitting of cochlear implants. All patients are different, which means that cochlear parametrisation is a difficult and long task, with results ranging from perfect blind speech recognition to patients who cannot make anything out of their implant and just turn it off. The proposed method combines evolutionary algorithms and medical expertise to achieve autonomous interactive fitting through a Personal Digital Assistant (PDA).
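The evolutionary loop underlying such a fitting procedure can be sketched as below. Everything here is a hypothetical stand-in: the parameter count, ranges, and especially the fitness function (in the paper, fitness comes from interactive patient feedback on the PDA, not from a known target as simulated here).

```python
import random

N_PARAMS = 8               # e.g. per-electrode stimulation levels (assumed)
POP_SIZE = 20
GENERATIONS = 50
TARGET = [0.3] * N_PARAMS  # pretend "ideal" parametrisation, for illustration only

def fitness(ind):
    # Lower is better: squared distance to the hypothetical target.
    # In the real setting this would be replaced by patient feedback.
    return sum((a - b) ** 2 for a, b in zip(ind, TARGET))

def mutate(ind, sigma=0.05):
    # Gaussian perturbation, clipped to the [0, 1] parameter range
    return [min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in ind]

random.seed(0)
pop = [[random.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness)
    parents = pop[: POP_SIZE // 2]   # truncation selection keeps the best half
    children = [mutate(random.choice(parents))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children

best = min(pop, key=fitness)
print(round(fitness(best), 4))
```

Because patient responses are slow and noisy, the real algorithm must keep the population small and the number of evaluations low; this is the main design constraint that distinguishes interactive fitting from ordinary evolutionary optimisation.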