
RESEARCH PRODUCT

Emergency Detection with Environment Sound Using Deep Convolutional Neural Networks

Ole-Christoffer Granmo, Morten Goodwin, Jivitesh Sharma

subject

Signal processing, Audio signal, Computer science, Speech recognition, Deep learning, Feature extraction, Convolutional neural network, Binary classification, Mel-frequency cepstrum, Artificial intelligence, Audio signal processing

description

In this paper, we propose a generic emergency detection system that uses only the sound produced in the environment. For this task, we employ multiple audio feature extraction techniques: mel-frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC), the constant-Q transform (CQT), and the chromagram. After feature extraction, a deep convolutional neural network (CNN) classifies an audio signal as a potential emergency situation or not. The entire model is based on our previous work, which set a new state of the art on the environment sound classification (ESC) task (that paper is under review at the IEEE/ACM Transactions on Audio, Speech and Language Processing and is also available at https://arxiv.org/abs/1908.11219). We combine the benchmark ESC datasets UrbanSound8K and ESC-50 (ESC-10 is a subset of ESC-50) and reduce the task to a binary classification problem. This is done by aggregating sound classes such as sirens, fire crackling, glass breaking, and gunshots into the emergency class and all other classes into the normal class. Even though there are only two classes to distinguish, they are highly imbalanced. To overcome this difficulty, we introduce class weights into the loss calculation while training the model. Our model achieves 99.56% emergency detection accuracy.
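The sketch below illustrates, in broad strokes, the two steps the abstract describes: extracting several time-frequency representations per audio clip and weighting the loss to counter the emergency/normal class imbalance. It is not the authors' code; the choice of librosa and PyTorch, all parameter values, and the placeholder label counts are assumptions made purely for illustration.

```python
# Minimal sketch of multi-feature extraction and class-weighted loss (assumed setup).
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_features(path, sr=22050, n_mfcc=40):
    """Stack MFCC, constant-Q, and chroma features for one audio clip.

    GFCCs are omitted here because librosa provides no gammatone filterbank;
    a separate library would be needed for that feature channel.
    """
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    cqt = np.abs(librosa.cqt(y=y, sr=sr))
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    # Each feature is (bins, frames); crop to a common frame count and stack.
    frames = min(f.shape[1] for f in (mfcc, cqt, chroma))
    return np.concatenate([f[:, :frames] for f in (mfcc, cqt, chroma)], axis=0)

# Inverse-frequency class weights for the binary emergency/normal split.
# The counts below are placeholders, not the real dataset statistics.
labels = np.array([0] * 9000 + [1] * 700)   # 0 = normal, 1 = emergency
counts = np.bincount(labels)
weights = torch.tensor(len(labels) / (2.0 * counts), dtype=torch.float32)

# Weighted cross-entropy used when training the CNN classifier.
criterion = nn.CrossEntropyLoss(weight=weights)
```

With this formulation, misclassifying a clip from the rare emergency class contributes more to the loss than misclassifying a common normal clip, which is one standard way to handle the imbalance the abstract mentions.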

https://doi.org/10.1007/978-981-15-5859-7_14