
AUTHOR

Irene Martín-Morató

Showing 8 related works by this author

A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

2020

Residual learning is a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their a…
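As a rough illustration of the residual block described above, the sketch below shows a minimal PyTorch block whose skip connection adds the input back to the output of two stacked convolutional layers. The 1-D convolutions, channel count, and post-activation layout are illustrative assumptions, not the exact end-to-end architecture evaluated in the paper.

```python
# Minimal post-activation residual block for 1-D (waveform) input, in PyTorch.
# Layer sizes and the use of Conv1d are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                      # skip (shortcut) connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # inputs added back to the outputs
        return self.relu(out)             # activation applied after the addition

x = torch.randn(8, 16, 1024)              # (batch, channels, samples)
print(ResidualBlock1d(16)(x).shape)       # torch.Size([8, 16, 1024])
```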

Keywords: normalization (statistics), feature extraction, convolutional neural networks, residual learning, audio signal processing, audio classification, image classification, data mining, ESC, UrbanSound8K
Published in: IEEE Access

Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification

2020

In recent years, deep convolutional neural networks have become a standard for the development of state-of-the-art audio classification systems, taking the lead over traditional approaches based on feature engineering. While they are capable of achieving human-level performance in certain scenarios, it has been shown that their accuracy is severely degraded when the systems are tested on noisy or weakly segmented events. Although better generalization could be obtained by increasing the size of the training dataset, e.g. by applying data augmentation techniques, this also leads to longer and more complex training procedures. In this article, we propose a new type of pooling layer aimed at …
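The abstract is cut off before the proposed mechanism is described, so the sketch below only illustrates the baseline idea it builds on: pooling a variable-length feature map down to a fixed-length representation, here with PyTorch's standard adaptive average pooling. It is not the adaptive distance-based pooling proposed in the paper.

```python
# Pooling a variable-length convolutional feature map to a fixed-length vector.
# Standard adaptive average pooling is used as a stand-in; this is NOT the
# adaptive distance-based pooling proposed in the paper.
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(output_size=8)   # always emits 8 temporal bins

for n_frames in (100, 250, 431):             # variable-length inputs
    feats = torch.randn(1, 64, n_frames)     # (batch, channels, frames)
    fixed = pool(feats).flatten(1)           # (1, 64 * 8) regardless of length
    print(n_frames, "->", fixed.shape)
```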

Keywords: feature engineering, feature vector, feature extraction, pooling, pattern recognition, convolutional neural networks, adaptive systems, artificial intelligence
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

On the Robustness of Deep Features for Audio Event Classification in Adverse Environments

2018

Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis tasks as well. These features provide multiple levels of abstraction, which makes it possible to select a sufficiently generalized layer to identify classes not seen during training. The generalization capability of such features is very useful due to the lack of complete labeled audio datasets. However, as opposed to classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the performance impact of acoustically adverse environments has not been evaluated in detail. In this p…
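A minimal sketch of the contrast drawn in the abstract: hand-crafted MFCCs computed with librosa versus "deep features" taken from an intermediate layer of a CNN. The tiny untrained network and the example audio clip are placeholders, not the pretrained model or data evaluated in the paper.

```python
# Contrasting hand-crafted MFCC features with deep features taken from an
# intermediate CNN layer. The small network is a placeholder for illustration.
import torch
import torch.nn as nn
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))           # any mono audio clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)    # (20, frames) hand-crafted features

cnn = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
)
with torch.no_grad():
    act = cnn(torch.from_numpy(y).float()[None, None, :])   # intermediate activations
deep_feature = act.mean(dim=-1).squeeze(0).numpy()          # 64-dim clip-level deep feature

print(mfcc.shape, deep_feature.shape)
```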

Keywords: reverberation, noise measurement, speech recognition, feature extraction, convolutional neural networks, raw audio, robustness, Mel-frequency cepstrum
Published in: 2018 14th IEEE International Conference on Signal Processing (ICSP)

A case study on feature sensitivity for audio event classification using support vector machines

2016

Automatic recognition of multiple acoustic events is an interesting problem in machine listening that generalizes the classical speech/non-speech or speech/music classification problem. Typical audio streams contain a diversity of sound events that carry important and useful information on the acoustic environment and context. Classification is usually performed by means of hidden Markov models (HMMs) or support vector machines (SVMs) considering traditional sets of features based on Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives, as well as the energy from auditory-inspired filterbanks. However, while these features are routinely used by many systems, it is not …
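A small sketch of the classical pipeline the abstract refers to: frame-level MFCCs are summarized into a clip-level statistics vector and classified with an SVM. The random placeholder data below stands in for real recordings and is not the corpus studied in the paper.

```python
# Classical audio event classification pipeline: clip-level MFCC statistics + SVM.
# Data is random placeholder material for illustration only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def clip_vector(mfcc):
    """Mean and std of each MFCC coefficient over time -> fixed-length vector."""
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# 100 fake clips: 13 MFCCs x a variable number of frames, 4 event classes
X = np.stack([clip_vector(rng.normal(size=(13, rng.integers(80, 200))))
              for _ in range(100)])
y = rng.integers(0, 4, size=100)

clf = SVC(kernel='rbf', C=1.0).fit(X[:80], y[:80])
print("toy accuracy:", clf.score(X[80:], y[80:]))
```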

Keywords: machine listening, speech recognition, feature extraction, pattern recognition, support vector machines, Mel-frequency cepstrum, hidden Markov models, artificial intelligence
Published in: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

Analysis of data fusion techniques for multi-microphone audio event detection in adverse environments

2017

Acoustic event detection (AED) is currently a very active research area with multiple applications in the development of smart acoustic spaces. In this context, the advances brought by Internet of Things (IoT) platforms where multiple distributed microphones are available have also contributed to this interest. In such scenarios, the use of data fusion techniques merging information from several sensors becomes an important aspect in the design of multi-microphone AED systems. In this paper, we present a preliminary analysis of several data-fusion techniques aimed at improving the recognition accuracy of an AED system by taking advantage of the diversity provided by multiple microphones in …
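As a toy illustration of data fusion across distributed microphones, the sketch below compares two generic late-fusion strategies over per-microphone class posteriors: averaging the scores before deciding versus per-microphone decisions followed by majority voting. These are common baselines, not the specific set of fusion techniques compared in the paper.

```python
# Generic late-fusion strategies over per-microphone class posteriors.
import numpy as np

rng = np.random.default_rng(1)
n_mics, n_classes = 4, 5
posteriors = rng.dirichlet(np.ones(n_classes), size=n_mics)   # one row per microphone

score_fusion = posteriors.mean(axis=0).argmax()               # average scores, then decide
votes = posteriors.argmax(axis=1)                             # decide per mic, then vote
vote_fusion = np.bincount(votes, minlength=n_classes).argmax()

print("score-level fusion:", score_fusion, "| decision-level fusion:", vote_fusion)
```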

Keywords: noise measurement, microphones, real-time computing, feature extraction, sensor fusion, data analysis, data integration
Published in: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)

Sound Event Envelope Estimation in Polyphonic Mixtures

2019

Sound event detection is the task of automatically identifying the presence and temporal boundaries of sound events within an input audio stream. In recent years, deep learning methods have established themselves as the state-of-the-art approach for the task, using binary indicators during training to denote whether an event is active or inactive. However, such binary activity indicators do not fully describe the events, and estimating the envelope of the sounds could provide more precise modeling of their activity. This paper proposes to estimate the amplitude envelopes of target sound event classes in polyphonic mixtures. For training, we use the amplitude envelopes of the target sounds…
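A short sketch, assuming librosa and illustrative frame and hop sizes, of how the frame-level amplitude envelope of an isolated target sound can be computed and used as a soft training target in place of a 0/1 activity indicator.

```python
# Turning an isolated target sound into a frame-level amplitude envelope that
# can replace a binary activity indicator as the training target.
# Frame and hop sizes are illustrative assumptions.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))
frame, hop = 2048, 512

rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]  # amplitude envelope
envelope = rms / (rms.max() + 1e-9)                                    # normalize to [0, 1]
binary = (envelope > 0.1).astype(float)                                # coarse on/off indicator

print(envelope[:5], binary[:5])
```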

Keywords: speech recognition, amplitude, signal-to-noise ratio, polyphony, envelope, sound events

Adaptive Mid-Term Representations for Robust Audio Event Classification

2018

Low-level audio features are commonly used in many audio analysis tasks, such as audio scene classification or acoustic event detection. Due to the variable length of audio signals, a common approach is to create fixed-length feature vectors consisting of a set of statistics that summarize the temporal variability of such short-term features. To avoid the loss of temporal information, the audio event can be divided into a set of mid-term segments or texture windows. However, such an approach requires accurate estimation of the onset and offset times of the audio events in order to obtain a robust mid-term statistical description of their temporal evolution. This paper proposes the use of…
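A minimal sketch of the mid-term (texture-window) description mentioned above: frame-level MFCCs are split into fixed-length segments and each segment is summarized by its mean and standard deviation. The 1-second window is an illustrative choice; it is not the adaptive representation proposed in the paper.

```python
# Mid-term (texture-window) description: fixed segments of frame-level MFCCs,
# each summarized by per-coefficient mean and standard deviation.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, frames)

hop = 512                                                 # librosa's default hop length
frames_per_window = int(1.0 * sr / hop)                   # ~1 s of frames (assumed window)

segments = []
for start in range(0, mfcc.shape[1], frames_per_window):
    seg = mfcc[:, start:start + frames_per_window]
    segments.append(np.concatenate([seg.mean(axis=1), seg.std(axis=1)]))

midterm = np.stack(segments)                              # (n_windows, 26)
print(midterm.shape)
```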

Keywords: audio signal, feature vector, pattern recognition, nonlinear systems, framing, acoustic event detection, segmentation, temporal information
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification

2019

Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…
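To make the question of where the skip connection joins the stacked layers concrete, the sketch below contrasts two common placements of batch normalization and ReLU relative to the addition: post-activation (activation after the addition) and pre-activation (addition as the last step). These are generic variants, not the full set of design alternatives compared in the paper.

```python
# Two common skip-connection placements in a 1-D residual block (PyTorch).
import torch
import torch.nn as nn

class PostActBlock(nn.Module):
    """ReLU applied after the skip addition."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(c, c, 3, padding=1), nn.BatchNorm1d(c), nn.ReLU(),
            nn.Conv1d(c, c, 3, padding=1), nn.BatchNorm1d(c))
    def forward(self, x):
        return torch.relu(self.body(x) + x)

class PreActBlock(nn.Module):
    """BN/ReLU before each conv; the addition is the last step."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm1d(c), nn.ReLU(), nn.Conv1d(c, c, 3, padding=1),
            nn.BatchNorm1d(c), nn.ReLU(), nn.Conv1d(c, c, 3, padding=1))
    def forward(self, x):
        return self.body(x) + x

x = torch.randn(4, 16, 512)
print(PostActBlock(16)(x).shape, PreActBlock(16)(x).shape)
```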

arXiv categories: Sound (cs.SD), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS)