Search results for "Speech recognition"
Showing 10 of 357 documents
Speech Emotion Recognition method using time-stretching in the Preprocessing Phase and Artificial Neural Network Classifiers
2020
Human emotions play a significant role in the understanding of human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While assembling a product, the end user's emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for the classification of five typical emotions. As input for the ANN classifier, features like Mel Frequency Cepstral Coefficients (MFCCs), short-te…
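The preprocessing described in this abstract lends itself to a short illustration. A minimal sketch, assuming librosa's phase-vocoder time-stretch and 13 MFCCs; the stretch rates and feature count below are assumptions, not the paper's settings:

```python
# Illustrative sketch of time-stretch augmentation + MFCC extraction
# (not the paper's implementation; rates and n_mfcc are assumptions).
import librosa

def stretch_and_mfcc(path, rates=(0.9, 1.0, 1.1), n_mfcc=13):
    """Return one MFCC matrix per time-stretched copy of the recording."""
    y, sr = librosa.load(path, sr=None)  # keep the native sample rate
    features = []
    for rate in rates:
        # librosa's time_stretch is phase-vocoder based, matching the abstract
        y_stretched = librosa.effects.time_stretch(y, rate=rate)
        mfcc = librosa.feature.mfcc(y=y_stretched, sr=sr, n_mfcc=n_mfcc)
        features.append(mfcc)
    return features
```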
Fully automatic face recognition system using a combined audio-visual approach
2005
This paper presents a novel audio and video information fusion approach that greatly improves automatic recognition of people in video sequences. To that end, audio and video information is first used independently to obtain confidence values that indicate the likelihood that a specific person appears in a video shot. A post-classifier is then applied to fuse the audio and visual confidence values. The system has been tested on several news sequences, and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.
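The late-fusion step can be sketched generically. Here a logistic regression stands in for the unspecified post-classifier, and the per-shot confidence values are made-up placeholders:

```python
# Minimal late-fusion sketch over per-shot confidence values.
# The choice of logistic regression is an assumption; the paper's
# post-classifier is not specified in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per (shot, candidate identity) pair,
# columns are [audio confidence, visual confidence].
conf = np.array([[0.9, 0.8], [0.2, 0.7], [0.6, 0.1], [0.1, 0.2]])
label = np.array([1, 1, 0, 0])  # 1 = person actually appears in the shot

fusion = LogisticRegression().fit(conf, label)
print(fusion.predict_proba([[0.7, 0.6]])[:, 1])  # fused confidence
```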
2015
Visuo-auditory sensory substitution systems are augmented reality devices that translate a video stream into an audio stream in order to help the blind in daily tasks requiring visuo-spatial information. In this work, we present both a new mobile device and a transcoding method specifically designed to sonify moving objects. Frame differencing is used to extract spatial features from the video stream and two-dimensional spatial information is converted into audio cues using pitch, interaural time difference and interaural level difference. Using numerical methods, we attempt to reconstruct visuo-spatial information based on audio signals generated from various video stimuli. We show that de…
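A minimal sketch of such a pitch/ITD/ILD mapping, with every constant chosen for illustration rather than taken from the device:

```python
# Sketch of a 2-D position -> stereo audio cue mapping (pitch for elevation,
# ITD/ILD for azimuth). All ranges and constants are illustrative assumptions.
import numpy as np

def sonify(x, y, sr=44100, dur=0.1):
    """Map a normalized object position (x: -1..1 left/right,
    y: 0..1 bottom/top) to a short stereo sine cue."""
    freq = 200.0 * 2 ** (2 * y)       # elevation -> pitch (200-800 Hz)
    itd = 0.0006 * x                  # azimuth -> up to +-0.6 ms time difference
    ild = 10 ** (6 * x / 20)          # azimuth -> up to +-6 dB level difference
    t = np.arange(int(sr * dur)) / sr
    # Apply the ITD as an inter-channel phase offset and split the ILD
    # symmetrically between the two channels.
    left = np.sin(2 * np.pi * freq * (t + itd / 2)) / np.sqrt(ild)
    right = np.sin(2 * np.pi * freq * (t - itd / 2)) * np.sqrt(ild)
    return np.stack([left, right], axis=1)
```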
Steered Response Power Localization of Acoustic Passband Signals
2017
The vast majority of localization approaches using phase transform (PHAT) consider that the sources of interest are wideband low-pass sources. While this may be the usual case for common audio signals such as speech, PHAT methods are affected negatively by modulation artifacts when the sources to be localized are passband signals. In these cases, steered response power PHAT localization becomes less robust. This letter analyzes the form of generalized cross-correlation functions with PHAT when passband acoustic signals are considered, proposing approaches for increasing the localization performance through the mitigation of these negative effects.
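For reference, the standard GCC-PHAT computation that this letter builds on can be sketched in a few lines; the proposed passband mitigations are not reproduced here:

```python
# Standard GCC-PHAT sketch for time-difference-of-arrival estimation
# between two microphone channels (textbook formulation).
import numpy as np

def gcc_phat(x1, x2, sr, max_tau=None):
    """Return the estimated time-difference-of-arrival in seconds."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X1 * np.conj(X2)
    R /= np.abs(R) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(sr * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / sr
```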
Auditory distance perception in an acoustic pipe
2008
In a study of auditory distance perception, we investigated the effects of exaggerating the acoustic cue of reverberation while the intensity of sound did not vary noticeably. The set of stimuli was obtained by moving a sound source inside a 10.2-m-long pipe with a 0.3-m diameter. Twelve subjects were asked to listen to a speech sound while keeping their head inside the pipe and then to estimate the egocentric distance to the sound source using a magnitude production procedure. The procedure was repeated eighteen times using six different positions of the sound source. Results show that the point at which perceived distance equals physical distance is located approximately 3.5 m away fr…
Event-related brain responses while listening to entire pieces of music
2017
Brain responses to discrete short sounds have been studied intensively using the event-related potential (ERP) method, in which the electroencephalogram (EEG) signal is divided into epochs time-locked to stimuli of interest. Here we introduce and apply a novel technique which enables one to isolate ERPs in humans elicited by continuous music. The ERPs were recorded during listening to a Tango Nuevo piece, a deep techno track and an acoustic lullaby. Acoustic features related to timbre, harmony, and dynamics of the audio signal were computationally extracted from the musical pieces. A negative deflection occurring around 100 milliseconds after the stimulus onset (N100) and a positive deflection occ…
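The epoching step this technique builds on is straightforward to sketch. The function below is a generic illustration in plain NumPy, not the study's analysis pipeline:

```python
# Generic ERP epoching sketch: cut continuous EEG into stimulus-locked
# epochs and baseline-correct them (window limits are illustrative).
import numpy as np

def epoch(eeg, onsets_s, sr, tmin=-0.1, tmax=0.5):
    """Cut (channels x samples) EEG into epochs around stimulus onsets."""
    pre, post = int(-tmin * sr), int(tmax * sr)
    epochs = []
    for onset in onsets_s:
        i = int(onset * sr)
        if i - pre >= 0 and i + post <= eeg.shape[1]:
            seg = eeg[:, i - pre:i + post]
            # Subtract the mean of the pre-stimulus baseline per channel.
            seg = seg - seg[:, :pre].mean(axis=1, keepdims=True)
            epochs.append(seg)
    return np.stack(epochs)  # average over axis 0 to obtain the ERP
```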
Repetition suppression comprises both attention-independent and attention-dependent processes.
2014
Repetition suppression, a robust phenomenon of reduction in neural responses to stimulus repetition, is suggested to consist of a combination of bottom-up adaptation and top-down prediction effects. However, there is little consensus on how repetition suppression is related to attention in functional magnetic resonance imaging (fMRI) studies. It is probably because fMRI integrates neural activity related to adaptation and prediction effects, which are respectively attention-independent and attention-dependent. Here we orthogonally manipulated stimulus repetition and attention in a target detection task while participants' electroencephalography (EEG) was recorded. In…
2013
To identify factors limiting performance in multitone intensity discrimination, we presented sequences of five pure tones alternating in level between loud (85 dB SPL) and soft (30, 55, or 80 dB SPL). In the “overall-intensity task”, listeners detected a level increment on all of the five tones. In the “masking task”, the level increment was imposed only on the soft tones, rendering the soft tones targets and loud tones task-irrelevant maskers. Decision weights quantifying the importance of the five tone levels for the decision were estimated using methods of molecular psychophysics. Compatible with previous studies, listeners placed higher weights on the loud tones than on the soft tones i…
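A minimal sketch of how such decision weights can be estimated, using simulated trials and a logistic regression in place of the study's exact molecular-psychophysics fit:

```python
# Decision-weight estimation sketch in the molecular-psychophysics style:
# regress trial-by-trial responses on per-tone level perturbations.
# All data here are simulated and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 2000
levels = rng.normal(0.0, 1.0, size=(n_trials, 5))   # per-tone level jitter (dB)
true_w = np.array([0.8, 0.1, 0.8, 0.1, 0.8])        # e.g., loud tones weighted more
resp = (levels @ true_w + rng.normal(0, 1, n_trials)) > 0  # simulated responses

model = LogisticRegression().fit(levels, resp)
weights = model.coef_[0] / np.abs(model.coef_[0]).sum()  # normalized weights
print(weights)
```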
Categorization of Extremely Brief Auditory Stimuli: Domain-Specific or Domain-General Processes?
2011
The present study investigated the minimum amount of auditory stimulation that allows differentiation of spoken voices, instrumental music, and environmental sounds. Three new findings were reported. 1) All stimuli were categorized above chance level with 50-ms segments. 2) When a peak-level normalization was applied, music and voices started to be accurately categorized with 20-ms segments. When the root-mean-square (RMS) energy of the stimuli was equalized, voice stimuli were better recognized than music and environmental sounds. 3) Further psychoacoustical analyses suggest that the categorization of extremely brief auditory stimuli depends on the variability of their spectral envelope in…
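The two level-equalization schemes mentioned above can be sketched directly; the target levels below are illustrative assumptions:

```python
# Peak-level vs RMS-energy equalization of an audio segment (plain NumPy;
# target values are illustrative, not the study's settings).
import numpy as np

def peak_normalize(y, peak=0.9):
    """Scale so the maximum absolute sample equals `peak`."""
    return y * (peak / np.max(np.abs(y)))

def rms_equalize(y, target_rms=0.1):
    """Scale so the root-mean-square energy equals `target_rms`."""
    return y * (target_rms / np.sqrt(np.mean(y ** 2)))
```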
Prior Precision Modulates the Minimization of Auditory Prediction Error
2019
The predictive coding model of perception proposes that successful representation of the perceptual world depends upon canceling out the discrepancy between prediction and sensory input (i.e., prediction error). Recent studies further suggest that a distinction should be made between prediction errors triggered by non-predicted stimuli of different prior precision (i.e., inverse variance). However, it is not fully understood how prediction errors with different precision levels are minimized in the predictive process. Here, we conducted a magnetoencephalography (MEG) experiment which orthogonally manipulated prime-probe relation (for contextual precision) and stimulus repetition…