RESEARCH PRODUCT
Emergency Detection with Environment Sound Using Deep Convolutional Neural Networks
Ole-Christoffer Granmo, Morten Goodwin, Jivitesh Sharma
Subjects: Signal processing, Audio signal, Computer science, Speech recognition, Deep learning, Feature extraction, Convolutional neural network, Binary classification, Mel-frequency cepstrum, Artificial intelligence, Audio signal processing
In this paper, we propose a generic emergency detection system that uses only the sound produced in the environment. For this task, we employ multiple audio feature extraction techniques: mel-frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC), the constant Q-transform (CQT) and the chromagram. After feature extraction, a deep convolutional neural network (CNN) classifies an audio signal as a potential emergency situation or not. The entire model builds on our previous work, which sets the new state of the art on the environment sound classification (ESC) task (that paper is under review at the IEEE/ACM Transactions on Audio, Speech and Language Processing and is available at https://arxiv.org/abs/1908.11219). We combine the benchmark ESC datasets UrbanSound8K and ESC-50 (ESC-10 is a subset of ESC-50) and reduce the problem to binary classification. This is done by aggregating sound classes such as sirens, fire crackling, glass breaking and gunshots into the emergency class and treating the remaining classes as normal. Even though there are only two classes to distinguish, they are highly imbalanced. To overcome this difficulty, we introduce class weights into the loss calculation during training. Our model achieves \(99.56\%\) emergency detection accuracy.
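The pipeline described above can be approximated with standard open-source tooling. The snippet below is not the authors' released code; it is a minimal sketch that extracts MFCC, constant-Q and chromagram features with librosa, stacks them into a 2-D feature map, and trains a small CNN in PyTorch with class-weighted cross-entropy to counter the emergency/normal imbalance. GFCC extraction is omitted because it requires a separate gammatone toolkit, and the layer sizes, class counts and tensor shapes are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_features(path, sr=22050, n_mfcc=40, n_frames=173):
    """Stack MFCC, constant-Q and chromagram features into one 2-D map."""
    y, sr = librosa.load(path, sr=sr, duration=4.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # (40, T)
    cqt = np.abs(librosa.cqt(y=y, sr=sr))                       # (84, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)            # (12, T)
    t = min(mfcc.shape[1], cqt.shape[1], chroma.shape[1])       # align frame counts
    feats = np.vstack([mfcc[:, :t], cqt[:, :t], chroma[:, :t]])
    feats = librosa.util.fix_length(feats, size=n_frames, axis=1)
    return feats.astype(np.float32)                              # (136, n_frames)

class EmergencyCNN(nn.Module):
    """Small 2-D CNN over the stacked feature map; sizes are illustrative."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                    # x: (batch, 1, 136, n_frames)
        return self.fc(self.conv(x).flatten(1))

# Class-weighted loss: weight each class inversely to its frequency so the
# rare "emergency" class contributes as much to the loss as the common one.
counts = np.array([7000, 1300])              # hypothetical normal/emergency counts
weights = torch.tensor(counts.sum() / (2 * counts), dtype=torch.float32)
criterion = nn.CrossEntropyLoss(weight=weights)

model = EmergencyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random tensors; in practice the batch
# would come from stacking extract_features() outputs for each audio clip.
x = torch.randn(8, 1, 136, 173)
y_true = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y_true)
loss.backward()
optimizer.step()
```

Weighting the loss by inverse class frequency is one common way to realise the class-weighting idea mentioned in the abstract; the exact weighting scheme used by the authors may differ.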
year | journal | country | edition | language
---|---|---|---|---
2020-10-01 | | | |