0000000001058921

AUTHOR

Javier Naranjo-alcazar

showing 8 related works from this author

A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

2020

Residual learning is known for being a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their a…

Normalization (statistics)General Computer ScienceComputer scienceFeature extractionESC02 engineering and technologycomputer.software_genreResidualConvolutional neural networkconvolutional neural networks0202 electrical engineering electronic engineering information engineeringGeneral Materials Scienceurbansound8kAudio signal processingBlock (data storage)Contextual image classificationGeneral EngineeringAudio classification020206 networking & telecommunications113 Computer and information sciences020201 artificial intelligence & image processinglcsh:Electrical engineering. Electronics. Nuclear engineeringData mininglcsh:TK1-9971computerresidual learningIEEE Access
researchProduct

On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification

2019

Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs

2020

Sound Event Localization and Detection (SELD) is a problem related to the field of machine listening whose objective is to recognize individual sound events, detect their temporal activity, and estimate their spatial location. Thanks to the emergence of more hard-labeled audio datasets, deep learning techniques have become state-of-the-art solutions. The most common ones are those that implement a convolutional recurrent network (CRNN) having previously transformed the audio signal into multichannel 2D representation. The squeeze-excitation technique can be considered as a convolution enhancement that aims to learn spatial and channel feature maps independently rather than together as stand…

FOS: Computer and information sciencesSound (cs.SD)Audio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundElectrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

2020

The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud or speaker recognition can be clearly benefited from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, usin…

FOS: Computer and information sciencesComputer Science - Machine LearningSound (cs.SD)sound processingaudio datasetmachine listeningUNESCO::CIENCIAS TECNOLÓGICASComputer Science - SoundMachine Learning (cs.LG)classificationArtificial IntelligenceAudio and Speech Processing (eess.AS)Signal ProcessingFOS: Electrical engineering electronic engineering information engineeringfew-shot learningopen-set recognitionComputer Vision and Pattern RecognitionSoftwareElectrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Open Set Audio Classification Using Autoencoders Trained on Few Data.

2020

Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when there is no availability of a large number of positive samples for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solution…

Computer scienceOpen set02 engineering and technologylcsh:Chemical technologyMachine learningcomputer.software_genreBiochemistryArticleAnalytical ChemistrySet (abstract data type)open set recognition020204 information systemsaudio classificationautoencoders0202 electrical engineering electronic engineering information engineeringFeature (machine learning)lcsh:TP1-1185few-shot learningElectrical and Electronic EngineeringRepresentation (mathematics)Instrumentationbusiness.industryopen set classificationPerceptronClass (biology)AutoencoderAtomic and Molecular Physics and OpticsEmbedding020201 artificial intelligence & image processingArtificial intelligenceTransfer of learningbusinesscomputerSensors (Basel, Switzerland)
researchProduct

Acoustic Scene Classification with Squeeze-Excitation Residual Networks

2020

Acoustic scene classification (ASC) is a problem related to the field of machine listening whose objective is to classify/tag an audio clip in a predefined label describing a scene location (e. g. park, airport, etc.). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved only by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningGeneral Computer ScienceCalibration (statistics)Computer scienceResidualConvolutional neural networkField (computer science)Computer Science - SoundMachine Learning (cs.LG)030507 speech-language pathology & audiology03 medical and health sciencesAudio and Speech Processing (eess.AS)Acoustic scene classificationFeature (machine learning)FOS: Electrical engineering electronic engineering information engineeringGeneral Materials ScienceBlock (data storage)Artificial neural networkbusiness.industrypattern recognitionGeneral Engineeringdeep learningPattern recognitionmachine listeningsqueeze-excitationArtificial intelligencelcsh:Electrical engineering. Electronics. Nuclear engineering0305 other medical sciencebusinesslcsh:TK1-9971Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation

2020

Anomalous sound detection (ASD) is, nowadays, one of the topical subjects in machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, related to industrial processes, the early detection of malfunctions or damage in machines can mean great savings and an improvement in the efficiency of industrial processes. This problem can be solved with an unsupervised ASD solution since industrial machines will not be damaged simply by having this audio data in the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-base…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

CNN depth analysis with different channel inputs for Acoustic Scene Classification

2019

Acoustic scene classification (ASC) has been approached in the last years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is still a research area of interest. In this paper, different log-Mel representations and combinations are analyzed. Experiments show that the best results are obtained using the harmonic and percussive components plus the difference between left and right stereo channels, (L-R). On the other hand, it i…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct