On the Robustness of Deep Features for Audio Event Classification in Adverse Environments
Maximo Cobos, Francesc J. Ferri, Irene Martin-Morato

Keywords: Reverberation, Noise measurement, Speech recognition, Feature extraction, Convolutional neural network, Raw audio format, Robustness (computer science), Audio analyzer, Mel-frequency cepstrum
Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis as well. These features provide multiple levels of abstraction, making it possible to select a sufficiently generalized layer to identify classes not seen during training. This generalization capability is particularly useful given the lack of comprehensively labeled audio datasets. However, in contrast to classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the impact of acoustically adverse environments on their performance has not been evaluated in detail. In this paper, we analyze the robustness of deep features under adverse conditions such as noise, reverberation and segmentation errors. The selected features are extracted from SoundNet, a deep convolutional neural network (CNN) for audio classification that takes raw audio segments as input. The results show that performance is severely affected by noise and reverberation, leaving room for improvement in terms of robustness across different acoustic scenarios.
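The noise-corruption step described in the abstract amounts to mixing a noise signal into the raw audio at a controlled signal-to-noise ratio before feature extraction. A minimal sketch of that step is shown below; the function name `mix_at_snr` and its exact formulation are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR in dB.

    Illustrative sketch of noise corruption for robustness testing;
    not the paper's actual pipeline.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(clean)]
    p_clean = np.mean(clean ** 2)   # signal power
    p_noise = np.mean(noise ** 2)   # noise power before scaling
    # Gain that brings the noise power to p_clean / 10^(snr_db / 10)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

The corrupted segments would then be fed to the feature extractor (SoundNet activations or MFCCs) exactly as the clean ones, so any drop in classification accuracy can be attributed to the added noise.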
year | venue
---|---
2018-08-01 | 2018 14th IEEE International Conference on Signal Processing (ICSP)