
RESEARCH PRODUCT

Adaptive Mid-Term Representations for Robust Audio Event Classification

Francesc J. Ferri, Irene Martin-Morato, Maximo Cobos

subject

Audio signal, Acoustics and Ultrasonics, Computer science, Feature vector, Pattern recognition, Computational Mathematics, Nonlinear system, Framing (construction), Acoustic event detection, Audio analyzer, Computer Science (miscellaneous), Segmentation, Artificial intelligence, Electrical and Electronic Engineering, Temporal information

description

Low-level audio features are commonly used in many audio analysis tasks, such as audio scene classification or acoustic event detection. Because audio signals have variable length, a common approach is to create fixed-length feature vectors consisting of a set of statistics that summarize the temporal variability of such short-term features. To avoid the loss of temporal information, the audio event can be divided into a set of mid-term segments or texture windows. However, such an approach requires accurate estimation of the onset and offset times of the audio events in order to obtain a robust mid-term statistical description of their temporal evolution. This paper proposes an alternative event representation based on nonlinear time normalization prior to the extraction of mid-term statistics. The short-term features are transformed into a new fixed-length representation obtained by uniform-distance subsampling over a defined feature space, in contrast to classical short-term temporal framing. The results show that distance-based texture windows provide a statistical description of the event that is robust to errors in the event segmentation stage under noisy conditions.
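
To illustrate the distance-based texture window idea described above, the following is a minimal sketch, not the authors' implementation: it resamples a sequence of short-term feature frames at uniform cumulative-distance steps along the trajectory the frames trace in feature space, and then computes simple mid-term statistics over the resampled, fixed-length representation. The function name, the choice of NumPy, the Euclidean distance, and the mean/standard-deviation statistics are all illustrative assumptions.

import numpy as np

def distance_based_texture_window(features, num_points=20):
    # features: (T, D) array of short-term frames (e.g. MFCCs).
    # num_points: length of the fixed-size, time-normalized representation
    # (an assumed value chosen for illustration).
    #
    # Cumulative Euclidean distance travelled along the feature trajectory.
    steps = np.linalg.norm(np.diff(features, axis=0), axis=1)
    cumdist = np.concatenate(([0.0], np.cumsum(steps)))
    # Sample positions spaced uniformly in distance rather than in time.
    targets = np.linspace(0.0, cumdist[-1], num_points)
    # Interpolate every feature dimension at the target distance positions.
    resampled = np.stack(
        [np.interp(targets, cumdist, features[:, d])
         for d in range(features.shape[1])],
        axis=1,
    )
    # Mid-term statistics over the normalized representation give a
    # fixed-length descriptor independent of the original event duration.
    return np.concatenate([resampled.mean(axis=0), resampled.std(axis=0)])

# Example: 312 frames of 13-dimensional features become a 26-dimensional descriptor.
rng = np.random.default_rng(0)
frames = rng.normal(size=(312, 13))
descriptor = distance_based_texture_window(frames, num_points=20)
print(descriptor.shape)  # (26,)

Intuitively, frames that add little movement in feature space (for example, near-static background appended before the onset or after the offset) contribute little trajectory length and therefore barely shift the sampled points, which is consistent with the robustness to segmentation errors reported in the abstract.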

https://doi.org/10.1109/taslp.2018.2865615