RESEARCH PRODUCT
Speech Emotion Recognition method using time-stretching in the Preprocessing Phase and Artificial Neural Network Classifiers
Authors: Mihai Neghina, Valentin Catalin Govoreanu
Subjects: Artificial neural network; Computer science; Speech recognition; Phase vocoder; Audio time-scale/pitch modification; 020206 networking & telecommunications; 02 engineering and technology; ComputingMethodologies_PATTERNRECOGNITION; 0202 electrical engineering, electronic engineering, information engineering; Preprocessor; 020201 artificial intelligence & image processing; Mel-frequency cepstrum; Emotion recognition; Classifier (UML); Speech rate
Description:
Human emotions play a significant role in understanding human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While assembling a product, the end user's emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for classification of five typical emotions. As input for the ANN classifier, features such as Mel Frequency Cepstral Coefficients (MFCCs), short-term energy, zero-crossing rate, pitch and speech rate were extracted. The proposed method was evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and shows promising results compared to other methods such as zero-padding.
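As a rough illustration of the pipeline described above, the sketch below uses librosa's phase-vocoder-based `time_stretch` to normalize clip length, extracts MFCCs, short-term energy, zero-crossing rate and pitch, and feeds the frame-averaged features to a small MLP classifier. The fixed target duration, the feature averaging and the network size are illustrative assumptions, not the authors' exact configuration; speech-rate estimation is only indicated by a comment.

```python
# Minimal sketch of the described SER pipeline, assuming librosa for the
# phase-vocoder time-stretch and feature extraction and scikit-learn's
# MLPClassifier as the ANN. Parameter values are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

TARGET_DURATION = 3.0  # seconds; assumed common length for all clips


def preprocess(path, sr=22050):
    """Load a clip and time-stretch it (phase vocoder) to a fixed duration."""
    y, sr = librosa.load(path, sr=sr)
    rate = librosa.get_duration(y=y, sr=sr) / TARGET_DURATION
    y = librosa.effects.time_stretch(y, rate=rate)  # phase-vocoder based
    return y, sr


def extract_features(y, sr):
    """MFCCs, short-term energy, zero-crossing rate and pitch, averaged over frames."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    energy = librosa.feature.rms(y=y).mean()
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    pitch = np.nanmean(f0)
    # A speech-rate estimate (e.g. syllables per second) would be appended here.
    return np.hstack([mfcc, energy, zcr, pitch])


def train(paths, labels):
    """Fit a small ANN on the extracted features; labels are the five emotion classes."""
    X = np.vstack([extract_features(*preprocess(p)) for p in paths])
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    clf.fit(X, labels)
    return clf
```

Zero-padding, the baseline mentioned above, would instead append silence to shorter clips; the time-stretch variant keeps the full utterance within the fixed analysis window without adding silent frames.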
| year | journal | country | edition | language |
|---|---|---|---|---|
| 2020-09-03 | 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP) | | | |