Search results for "Cepstrum"
Showing 10 of 15 documents
Multi-band identification for enhancing bearing fault detection in variable speed conditions
2020
Rolling element bearings are crucial components in rotating machinery, and avoiding unexpected breakdowns using fault detection methods is an increasing demand in industry today. Variable speed conditions pose a challenge for vibration-based fault diagnosis due to the non-stationary impact frequency. Computed order tracking transforms the vibration signal from the time domain to the shaft-angle domain, allowing order analysis with the envelope spectrum. To enhance fault detection, the bearing resonance frequency region is isolated in the raw signal prior to order tracking. This region is not trivial to identify but may be estimated using kurtosis-based methods reported in the li…
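The envelope-spectrum step this abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: the analytic signal stands in for a Hilbert transform, the 37 Hz modulation is a synthetic stand-in for a bearing fault frequency, and the band-pass isolation of the resonance region is omitted.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Envelope spectrum via the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)          # spectral weights that build the analytic signal
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(X * h))
    envelope -= envelope.mean()            # drop DC so the modulation peak dominates
    spec = np.abs(np.fft.rfft(envelope)) / n
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, spec

# a 1 kHz "resonance" carrier amplitude-modulated at 37 Hz (fault-frequency stand-in)
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 37 * t)) * np.sin(2 * np.pi * 1000 * t)
freqs, spec = envelope_spectrum(x, fs)
peak = freqs[1:][np.argmax(spec[1:])]      # skip the DC bin
```

The raw spectrum of `x` has energy only near 1 kHz; the envelope spectrum recovers the 37 Hz modulation, which is why enveloping precedes order analysis in pipelines like the one above.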
Image-Evoked Affect and Its Impact on EEG-Based Biometrics
2019
Electroencephalography (EEG) signals provide a representation of the brain’s activity patterns and have been recently exploited for user identification and authentication due to their uniqueness and their robustness to interception and artificial replication. Nevertheless, such signals are commonly affected by the individual’s emotional state. In this work, we examine the use of images as stimulus for acquiring EEG signals and study whether the use of images that evoke similar emotional responses leads to higher identification accuracy compared to images that evoke different emotional responses. Results show that identification accuracy increases when the system is trained with EEG recordin…
Speech Emotion Recognition Method Using Time-Stretching in the Preprocessing Phase and Artificial Neural Network Classifiers
2020
Human emotions play a significant role in understanding human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While assembling a product, the end user's emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for the classification of five typical emotions. As input for the ANN classifier, features like Mel Frequency Cepstral Coefficients (MFCCs), short-te…
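MFCC features of the kind this and several other abstracts mention can be sketched end to end: frame, window, power spectrum, triangular mel filterbank, log, then a DCT-II (the cepstral step). This is a textbook sketch with illustrative parameter choices, not any of these papers' exact configurations.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # frame the signal and apply a Hann window
    frames = np.array([signal[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II over the filterbank axis yields the cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

fs = 8000
t = np.arange(fs) / fs
coeffs = mfcc(np.sin(2 * np.pi * 440 * t), fs)   # one (n_frames, n_ceps) matrix
```

Each row of `coeffs` is one frame's cepstral vector; classifiers such as the ANN above typically consume these rows (often with deltas) rather than the raw spectrum.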
ASR performance prediction on unseen broadcast programs using convolutional neural networks
2018
In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …
Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network
2020
In this paper, we propose a model for the Environment Sound Classification (ESC) task that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with an attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and the Chromagram. Such a combination of multiple features has not been used before for signal or audio processing. We also employ a deeper CNN (DCNN) than previous models, consisting of spatially separable convolutions working on the time and feature domains separately. Alongside, we use atten…
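The multi-channel idea above can be sketched with array shapes alone: each feature type, once resampled to a common time-frequency grid, becomes one input channel of the CNN, exactly as color channels do in an image. The grid size and random matrices below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, bins = 128, 40            # illustrative time / frequency-bin resolution

# hypothetical per-feature matrices, each aligned to the same (frames, bins) grid
mfcc_feat   = rng.random((frames, bins))
gfcc_feat   = rng.random((frames, bins))
cqt_feat    = rng.random((frames, bins))
chroma_feat = rng.random((frames, bins))

# stack along a leading channel axis: (channels, frames, bins), ready for a 2-D CNN
x = np.stack([mfcc_feat, gfcc_feat, cqt_feat, chroma_feat])
```

Treating the features as channels (rather than concatenating them along one axis) lets the first convolution learn cross-feature combinations at each time-frequency position.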
Low-Power Audio Keyword Spotting using Tsetlin Machines
2021
The emergence of Artificial Intelligence (AI)-driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity in current Neural Network (NN)-powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning-automata-powered machine learning algorithm called the Tsetlin Machine (TM). Through a significant reduction in parameter requirements and by choosing logic over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper, we explore a TM-based keyword spotting (KWS) pipe…
Audio-video people recognition system for an intelligent environment
2011
In this paper, an audio-video system for intelligent environments with the capability to recognize people is presented. Users are tracked inside the environment, and their positions and activities can be logged. Users' identities are assessed through a multimodal approach by detecting and recognizing voices and faces through the different cameras and microphones installed in the environment. This approach was chosen in order to create a flexible, cheap, yet reliable system implemented using consumer electronics. Voice features are extracted by short-time cepstrum analysis, and face features are extracted using the eigenfaces technique. The recognition task is solved using the same Su…
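The cepstrum underlying such voice features (and the "Cepstrum" query of this search) is simply the inverse FFT of the log magnitude spectrum. A minimal sketch, demonstrated here on the classic echo-detection use of the cepstrum with a synthetic broadband signal rather than on real speech:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

# broadband signal plus a delayed, attenuated copy of itself
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
delay, gain = 100, 0.5
y = x.copy()
y[delay:] += gain * x[:-delay]

ceps = real_cepstrum(y)
# the echo appears as a cepstral peak at quefrency == delay samples
q = np.argmax(ceps[20:2048]) + 20     # skip low quefrencies near zero
```

The same log-spectrum periodicity that reveals an echo here reveals pitch in voiced speech, which is why short-time cepstrum analysis works as a compact voice feature.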
A case study on feature sensitivity for audio event classification using support vector machines
2016
Automatic recognition of multiple acoustic events is an interesting problem in machine listening that generalizes the classical speech/non-speech or speech/music classification problem. Typical audio streams contain a diversity of sound events that carry important and useful information on the acoustic environment and context. Classification is usually performed by means of hidden Markov models (HMMs) or support vector machines (SVMs) considering traditional sets of features based on Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives, as well as the energy from auditory-inspired filterbanks. However, while these features are routinely used by many systems, it is not …
Single-channel EEG-based subject identification using visual stimuli
2021
Electroencephalography (EEG) signals have been recently proposed as a biometrics modality due to some inherent advantages over traditional biometric approaches. In this work, we studied the performance of individual EEG channels for the task of subject identification in the context of EEG-based biometrics using a recently proposed benchmark dataset that contains EEG recordings acquired under various visual and non-visual stimuli using a low-cost consumer-grade EEG device. Results showed that specific EEG electrodes provide consistently higher identification accuracy regardless of the feature and stimuli types used, while features based on the Mel Frequency Cepstral Coefficients (MFCC) provi…
MFCC-based Recurrent Neural Network for automatic clinical depression recognition and assessment from speech
2022
Clinical depression, or Major Depressive Disorder (MDD), is a common and serious medical illness. In this paper, a deep Recurrent Neural Network-based framework is presented to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire and the binary class of depression diagnosis. To overcome the problem of the small size of Speech Depression Recognition (SDR) datasets, expanding training labels and transferred features are considered. The proposed approach outperforms state-of-the-art approaches on the DAIC-WOZ database with an overall accura…