Search results for "speech recognition"

Showing 10 of 357 documents

Speech Emotion Recognition method using time-stretching in the Preprocessing Phase and Artificial Neural Network Classifiers

2020

Human emotions play a significant role in understanding human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While the end user assembles a product, their emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for the classification of five typical emotions. As input for the ANN classifier, features like Mel Frequency Cepstral Coefficients (MFCCs), short-te…
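
As a rough, hedged illustration of the kind of pipeline the abstract describes (not the authors' implementation), the sketch below time-stretches an utterance with a phase-vocoder-based routine, extracts MFCCs, and trains a small neural-network classifier; librosa, scikit-learn, and the file names are assumptions.

```python
# Minimal sketch (not the authors' exact pipeline): phase-vocoder-based
# time-stretching for augmentation, MFCC features, and a small neural-network
# classifier. librosa/scikit-learn and the file names are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def extract_features(path, rate=1.0, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    if rate != 1.0:
        y = librosa.effects.time_stretch(y, rate=rate)  # phase-vocoder time-stretch
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                            # one fixed-length vector per utterance

# Hypothetical training data: (wav path, emotion label) pairs.
dataset = [("utt_001.wav", "happy"), ("utt_002.wav", "angry")]
rates = (0.9, 1.0, 1.1)                                 # stretched variants of each utterance
X = np.array([extract_features(p, rate=r) for p, _ in dataset for r in rates])
labels = np.array([lab for _, lab in dataset for _ in rates])

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
print(clf.predict([extract_features("utt_001.wav")]))
```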

Keywords: Artificial neural network; Computer science; Speech recognition; Phase vocoder; Audio time-scale/pitch modification; Pattern recognition; Preprocessor; Mel-frequency cepstrum; Emotion recognition; Classifier (UML); Speech rate
Venue: 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP)

Fully automatic face recognition system using a combined audio-visual approach

2005

This paper presents a novel audio and video information fusion approach that greatly improves automatic recognition of people in video sequences. To that end, audio and video information is first used independently to obtain confidence values that indicate the likelihood that a specific person appears in a video shot. A post-classifier is then applied to fuse the audio and visual confidence values. The system has been tested on several news sequences, and the results indicate that a significant improvement in the recognition rate can be achieved when both modalities are used together.
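
A minimal sketch of the late-fusion idea, under the assumption that the post-classifier can be approximated by a logistic regression over per-shot audio and video confidence values (the abstract does not specify the fusion model); the numbers are placeholders.

```python
# Illustrative late-fusion sketch: per-shot audio and video confidence values
# for "person X appears in this shot" are fused by a post-classifier.
# Logistic regression and the toy numbers are assumptions, not the paper's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

conf = np.array([[0.9, 0.8],   # columns: audio confidence, video confidence
                 [0.2, 0.3],
                 [0.7, 0.1],
                 [0.4, 0.9],
                 [0.1, 0.2]])
appears = np.array([1, 0, 1, 1, 0])   # 1 = the person really appears in the shot

fuser = LogisticRegression().fit(conf, appears)
new_shot = np.array([[0.6, 0.7]])
print(fuser.predict_proba(new_shot)[0, 1])   # fused probability of a match
```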

Keywords: Audio mining; Dynamic time warping; Modalities; Computer science; Shot (filmmaking); Speech recognition; Image processing and computer vision; Video sequence; Facial recognition system; Video tracking; Signal processing; Fuse (electrical); Computer vision; Artificial intelligence; Electrical and electronic engineering
Venue: IEE Proceedings - Vision, Image, and Signal Processing

2015

Visuo-auditory sensory substitution systems are augmented reality devices that translate a video stream into an audio stream in order to help the blind in daily tasks requiring visuo-spatial information. In this work, we present both a new mobile device and a transcoding method specifically designed to sonify moving objects. Frame differencing is used to extract spatial features from the video stream and two-dimensional spatial information is converted into audio cues using pitch, interaural time difference and interaural level difference. Using numerical methods, we attempt to reconstruct visuo-spatial information based on audio signals generated from various video stimuli. We show that de…
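
The sketch below illustrates, under assumed mapping constants (not the paper's calibrated values), how frame differencing can localize a moving object and how its 2-D position can be converted into pitch, interaural time difference, and interaural level difference cues.

```python
# Rough sketch of the transcoding idea: frame differencing locates a moving
# object, and its 2-D position is mapped to pitch (vertical) and to interaural
# time/level differences (horizontal). Mapping constants are assumptions.
import numpy as np

def moving_centroid(prev_frame, frame, thresh=25):
    """Return the normalized (x, y) centroid of changed pixels, or None."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    ys, xs = np.nonzero(diff)
    if xs.size == 0:
        return None
    return xs.mean() / frame.shape[1], ys.mean() / frame.shape[0]

def sonify(pos, sr=44100, dur=0.2):
    x, y = pos                                   # both in [0, 1], y = 0 is the top row
    freq = 200.0 * 2.0 ** (2.0 * (1.0 - y))      # higher in the image -> higher pitch
    delay = (x - 0.5) * 0.6e-3                   # up to +/-0.6 ms interaural time difference
    left_gain = 1.0 - 0.8 * (x - 0.5)            # simple interaural level difference
    right_gain = 1.0 + 0.8 * (x - 0.5)
    t = np.arange(int(sr * dur)) / sr
    left = left_gain * np.sin(2 * np.pi * freq * (t - max(delay, 0.0)))
    right = right_gain * np.sin(2 * np.pi * freq * (t + min(delay, 0.0)))
    return np.stack([left, right], axis=1)       # stereo buffer to send to the audio device

prev, cur = np.zeros((120, 160)), np.zeros((120, 160))
cur[40:60, 100:120] = 255                        # a bright patch that "moved" into view
pos = moving_centroid(prev, cur)
if pos is not None:
    stereo = sonify(pos)
```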

Keywords: Audio signal; Computer networks and communications; Computer science; Speech recognition; Motion detection; Transcoding; Audio signal flow; Video processing; Sensory substitution; Artificial intelligence; Hardware and architecture; Sonification; Computer vision; Audio signal processing; Software; Information systems
Venue: Frontiers in ICT

Steered Response Power Localization of Acoustic Passband Signals

2017

The vast majority of localization approaches using the phase transform (PHAT) assume that the sources of interest are wideband low-pass sources. While this may be the usual case for common audio signals such as speech, PHAT methods are affected negatively by modulation artifacts when the sources to be localized are passband signals. In these cases, steered response power PHAT localization becomes less robust. This letter analyzes the form of generalized cross-correlation functions with PHAT when passband acoustic signals are considered, and proposes approaches for improving localization performance by mitigating these negative effects.
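
For context, a minimal GCC-PHAT sketch in NumPy: it estimates the time-difference of arrival between two microphone signals, the building block on which steered response power PHAT localization rests. The toy signals and parameters are assumptions, not the letter's setup.

```python
# Minimal GCC-PHAT sketch (NumPy only): estimate the time-difference of arrival
# between two microphone signals, the building block of SRP-PHAT localization.
# Signals and parameters below are toy assumptions.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                 # phase transform: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                      # TDOA in seconds

fs = 16000
t = np.arange(fs) / fs
src = np.sin(2 * np.pi * 440 * t)          # toy source; in practice, mic recordings
mic1, mic2 = src, np.roll(src, 8)          # mic2 receives the source 8 samples later
print(gcc_phat(mic2, mic1, fs, max_tau=0.001))   # about 8 / fs = 0.5 ms
```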

Keywords: Audio signal; Computer science; Applied mathematics; Speech recognition; Acoustics; Bandwidth (signal processing); Modulation; Signal processing; Electrical and electronic engineering; Wideband; Passband
Venue: IEEE Signal Processing Letters

Auditory distance perception in an acoustic pipe

2008

In a study of auditory distance perception, we investigated the effects of exaggerating the acoustic cue of reverberation while the intensity of sound did not vary noticeably. The set of stimuli was obtained by moving a sound source inside a 10.2-m-long pipe with a 0.3-m diameter. Twelve subjects were asked to listen to a speech sound while keeping their head inside the pipe and then to estimate the egocentric distance from the sound source using a magnitude production procedure. The procedure was repeated eighteen times using six different positions of the sound source. Results show that the point at which perceived distance equals physical distance is located approximately 3.5 m away fr…

Keywords: Auditory display; Reverberation; Range (music); Critical distance; Sound and music computing; General computer science; Performance; Speech recognition; Experimental and cognitive psychology; Theoretical computer science; Loudness; Perception; Experimentation; Sound (geography); Mathematics; Measurement; Sound intensity; Acoustic pipe; Distance perception
Venue: ACM Transactions on Applied Perception

Event-related brain responses while listening to entire pieces of music

2017

Brain responses to discrete short sounds have been studied intensively using the event-related potential (ERP) method, in which the electroencephalogram (EEG) signal is divided into epochs time-locked to stimuli of interest. Here we introduce and apply a novel technique which enables one to isolate ERPs elicited in humans by continuous music. The ERPs were recorded during listening to a Tango Nuevo piece, a deep techno track, and an acoustic lullaby. Acoustic features related to timbre, harmony, and dynamics of the audio signal were computationally extracted from the musical pieces. A negative deflection occurring around 100 milliseconds after the stimulus onset (N100) and a positive deflection occ…
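
For reference, a sketch of conventional time-locked ERP averaging, the baseline the abstract contrasts with its continuous-music technique: epochs are cut around event onsets, baseline-corrected, and averaged. Data and parameters here are placeholders, not the study's.

```python
# Conventional ERP extraction sketch: cut epochs time-locked to stimulus onsets
# from a continuous EEG channel, baseline-correct, and average. (The paper's
# continuous-music technique is more elaborate; data here are placeholders.)
import numpy as np

def erp(eeg, onsets, fs, tmin=-0.1, tmax=0.4):
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > eeg.size:
            continue                        # skip epochs running off the recording
        ep = eeg[onset - pre:onset + post].copy()
        ep -= ep[:pre].mean()               # baseline-correct with the pre-stimulus interval
        epochs.append(ep)
    return np.mean(epochs, axis=0)          # averaged waveform; N100/P200 appear here

fs = 250                                    # assumed sampling rate (Hz)
eeg = np.random.randn(fs * 60)              # one minute of a single toy EEG channel
onsets = np.arange(fs, eeg.size - fs, fs)   # hypothetical event onsets (sample indices)
avg = erp(eeg, onsets, fs)
```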

Keywords: Auditory perception; Adult; Male; Speech recognition; Mismatch negativity; Stimulus (physiology); Electroencephalography; Event-related potentials; Experimental psychology; Young adult; Clinical medicine; Humans; Psychology and cognitive sciences; Music; N100; P200; Evoked potentials; Cerebral cortex; Communication; Audio signal; General neuroscience; Middle aged; Female; Psychology; Timbre; Neurology & neurosurgery; Musical features
Venue: Neuroscience

Repetition suppression comprises both attention-independent and attention-dependent processes.

2014

Repetition suppression, a robust phenomenon of reduction in neural responses to stimulus repetition, is suggested to consist of a combination of bottom-up adaptation and top-down prediction effects. However, there is little consensus on how repetition suppression is related to attention in functional magnetic resonance imaging (fMRI) studies. This is probably because fMRI integrates neural activity related to adaptation and prediction effects, which are respectively attention-independent and attention-dependent. Here we orthogonally manipulated stimulus repetition and attention in a target detection task while participants' electroencephalography (EEG) was recorded. In…

Keywords: Auditory perception; Adult; Male; Cognitive neuroscience; Speech recognition; Electroencephalography; Audiology; Stimulus (physiology); Neural activity; Cognitive science; Young adult; Humans; Attention; Brain; Adaptation, physiological; Amplitude; Neurology; Acoustic stimulation; Female; Functional magnetic resonance imaging; Psychology
Venue: NeuroImage

2013

To identify factors limiting performance in multitone intensity discrimination, we presented sequences of five pure tones alternating in level between loud (85 dB SPL) and soft (30, 55, or 80 dB SPL). In the “overall-intensity task”, listeners detected a level increment on all five tones. In the “masking task”, the level increment was imposed only on the soft tones, rendering the soft tones targets and the loud tones task-irrelevant maskers. Decision weights quantifying the importance of the five tone levels for the decision were estimated using methods of molecular psychophysics. Compatible with previous studies, listeners placed higher weights on the loud tones than on the soft tones i…
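
A hedged sketch of a decision-weight analysis in the molecular-psychophysics spirit: trial-by-trial responses are regressed on the per-tone level perturbations, and the fitted coefficients serve as weights. The simulated data and the logistic-regression estimator are assumptions, not the study's exact method.

```python
# Sketch of a decision-weight analysis in the spirit of molecular psychophysics:
# trial-by-trial yes/no responses are regressed on the per-tone level
# perturbations, and the fitted coefficients act as weights for the five tones.
# Simulated data and the logistic-regression estimator are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 2000
perturb = rng.normal(0.0, 2.0, size=(n_trials, 5))    # dB jitter on the five tones
true_w = np.array([0.8, 0.1, 0.8, 0.1, 0.8])           # listener over-weights the loud tones
resp = (perturb @ true_w + rng.normal(0.0, 1.0, n_trials)) > 0

weights = LogisticRegression().fit(perturb, resp).coef_[0]
print(weights / np.abs(weights).sum())                  # normalized decision weights
```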

Keywords: Auditory perception; Multidisciplinary; Stimulus modality; Noise reduction; Speech recognition; Quiet; Psychophysics; Perceptual masking; Sound pressure; Mathematics; Optimal decision
Venue: PLOS ONE

Categorization of Extremely Brief Auditory Stimuli: Domain-Specific or Domain-General Processes?

2011

The present study investigated the minimum amount of auditory stimulation that allows differentiation of spoken voices, instrumental music, and environmental sounds. Three new findings were reported. 1) All stimuli were categorized above chance level with 50-ms segments. 2) When a peak-level normalization was applied, music and voices started to be accurately categorized with 20-ms segments. When the root-mean-square (RMS) energy of the stimuli was equalized, voice stimuli were better recognized than music and environmental sounds. 3) Further psychoacoustical analyses suggest that the categorization of extremely brief auditory stimuli depends on the variability of their spectral envelope in…
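
The two level-normalization schemes mentioned in the abstract, sketched on raw sample arrays; the target values below are arbitrary placeholders.

```python
# The two level-normalization schemes mentioned above, on raw sample arrays:
# peak normalization scales to a common maximum amplitude, RMS equalization
# scales to a common root-mean-square energy. Target values are placeholders.
import numpy as np

def peak_normalize(x, peak=0.9):
    return x * (peak / np.max(np.abs(x)))

def rms_equalize(x, target_rms=0.1):
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

sr = 44100
t = np.arange(int(0.05 * sr)) / sr                     # a 50-ms segment
seg = 0.3 * np.sin(2 * np.pi * 440 * t)
print(np.max(np.abs(peak_normalize(seg))))             # 0.9
print(np.sqrt(np.mean(rms_equalize(seg) ** 2)))        # 0.1
```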

Keywords: Auditory perception; Normalization (statistics); Property (programming); Experimental psychology; Speech recognition; Biology; Social and behavioral sciences; Perception; Psychophysics; Psychology; Humans; Set (psychology); Multidisciplinary; Recognition (psychology); Sensory systems; Sound; Auditory system; Acoustic stimulation; Categorization; Spectral envelope; Voice; Sensory perception; Music; Research article; Neuroscience; Psychoacoustics
Venue: PLoS ONE

Prior Precision Modulates the Minimization of Auditory Prediction Error

2019

The predictive coding model of perception proposes that successful representation of the perceptual world depends upon canceling out the discrepancy between prediction and sensory input (i.e., prediction error). Recent studies further suggest that a distinction should be made between prediction errors triggered by non-predicted stimuli of different prior precision (i.e., inverse variance). However, it is not fully understood how prediction error with different precision levels is minimized in the predictive process. Here, we conducted a magnetoencephalography (MEG) experiment which orthogonally manipulated prime-probe relation (for contextual precision) and stimulus repetition…

Keywords: Auditory perception; Repetition; Mean squared prediction error; Speech recognition; Stimulus (physiology); Experimental psychology; Cognitive penetration; Cognitive science; Behavioral neuroscience; Clinical medicine; Perceptual learning; Perception; Magnetoencephalography (MEG); Brain research; Predictive coding; Anticipation; Biological psychiatry; Visual cortex; Mathematics; Prediction error; Hearing; Psychiatry and mental health; Neuropsychology and physiological psychology; Perception and sensing; Neurology; Minification; Stimuli; Coding theory; Neuroscience
Venue: Frontiers in Human Neuroscience