Pedro Zuccarello
32×32 winner-take-all matrix with single winner selection
A 32 × 32 winner-take-all (WTA) matrix with single winner selection is introduced. A high-resolution gain-boosted regulated-cascode WTA circuit is used in a first competition stage. Because of the large number of competing cells, a situation with multiple winners may arise. A single winner is obtained by means of a digital inhibitory circuit following each analogue WTA amplifier. Simulations show that this mixed analogue-digital circuit achieves its objective with a current resolution of approximately 10 nA (0.8% of the maximum input current in the simulated case). A time response of the order of µs can be achieved.
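The two-stage selection can be illustrated with a behavioural model (not a circuit simulation). It assumes that the analogue stage flags every cell whose input current lies within the resolution (about 10 nA) of the largest input, and that the digital stage resolves the resulting tie with a fixed priority in scan order. The functions `analog_wta`, `digital_inhibit` and `single_winner`, and the priority rule itself, are hypothetical, since the actual inhibitory circuit is not described here. For reference, 10 nA being 0.8% of the maximum input implies a maximum input current of about 1.25 µA.

```python
from typing import List, Sequence, Tuple

def analog_wta(currents: Sequence[float], resolution: float) -> List[bool]:
    """Flag every cell whose input is within `resolution` of the largest input.

    Models a finite-resolution analogue WTA stage: inputs closer to the
    maximum than the resolution cannot be told apart, so several cells
    may be flagged as winners at once.
    """
    peak = max(currents)
    return [peak - i <= resolution for i in currents]

def digital_inhibit(flags: Sequence[bool]) -> List[bool]:
    """Keep only the first flagged cell in scan order.

    Hypothetical inhibition rule: once a flagged cell is found, it
    inhibits every later cell, so exactly one winner survives.
    """
    out: List[bool] = []
    inhibited = False
    for flag in flags:
        out.append(flag and not inhibited)
        inhibited = inhibited or flag
    return out

def single_winner(matrix: Sequence[Sequence[float]], resolution: float = 10e-9) -> Tuple[int, int]:
    """Return (row, column) of the single winner in a 2D matrix of input currents (A)."""
    flat = [current for row in matrix for current in row]
    winners = digital_inhibit(analog_wta(flat, resolution))
    return divmod(winners.index(True), len(matrix[0]))
```

In a 32 × 32 matrix where two cells receive 1.25 µA and 1.245 µA, both are flagged by the analogue stage (they differ by less than 10 nA), and the inhibition step keeps only the one that comes first in scan order.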
Selective Change Driven Imaging: A Biomimetic Visual Sensing Strategy
Selective Change Driven (SCD) Vision is a biologically inspired strategy for acquiring, transmitting and processing images that significantly speeds up image sensing. SCD vision is based on a new CMOS image sensor that delivers the pixels that have changed since they were last read out, ordered by the absolute magnitude of their change. Moreover, as part of this biomimetic approach, the traditional full-frame processing hardware and programming methodology must be replaced by a new processing paradigm based on pixel processing in a data-flow manner, instead of full-frame image processing.
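The read-out order described above can be mimicked with a toy software model. It assumes the sensor delivers a fixed budget of pixels per read cycle and ignores changes at or below a minimum value; the class `SCDSensorModel` and its parameters `pixels_per_cycle` and `threshold` are hypothetical names for illustration, not the sensor's interface.

```python
from typing import List, Tuple

Frame = List[List[int]]

class SCDSensorModel:
    """Toy model of a Selective Change Driven read-out (not the sensor's interface).

    Each pixel remembers the value it had when it was last read out. On every
    read cycle, the pixels whose current value differs most from that stored
    value are delivered, largest absolute change first, and only those pixels
    refresh their stored value.
    """

    def __init__(self, first_frame: Frame, pixels_per_cycle: int, threshold: int = 0):
        self.last_read = [row[:] for row in first_frame]
        self.pixels_per_cycle = pixels_per_cycle  # hypothetical read-out budget
        self.threshold = threshold                # hypothetical minimum change

    def read(self, frame: Frame) -> List[Tuple[int, int, int]]:
        """Return (row, col, value) events, ordered by decreasing |change|."""
        changes = []
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                delta = abs(value - self.last_read[r][c])
                if delta > self.threshold:
                    changes.append((delta, r, c))
        # Largest change first; ties broken by position for determinism.
        changes.sort(key=lambda item: (-item[0], item[1], item[2]))
        events = []
        for _, r, c in changes[: self.pixels_per_cycle]:
            self.last_read[r][c] = frame[r][c]
            events.append((r, c, frame[r][c]))
        return events
```

With a budget of one pixel per cycle, a pixel whose change was not delivered keeps its old stored value, so it is still pending and is delivered in a later cycle; once every changed pixel has been read, further reads of the same scene return no events.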
A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification
Residual learning is a learning framework known to facilitate the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their a…
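To make the placement question concrete, the sketch below contrasts two well-known orderings from the ResNet literature on plain vectors: the original post-activation block, where a ReLU follows the addition, and the full pre-activation block, where nothing is applied after the addition so the skip path stays a pure identity. Convolutions are replaced by dense matrices and batch normalisation is omitted for brevity. These two variants are only examples; they are not necessarily the alternatives compared in the paper, and the helper names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]

def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense layer without bias (stand-in for a convolution)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]

def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [u + v for u, v in zip(a, b)]

def post_activation_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Original ordering: weight, ReLU, weight, add skip, then ReLU."""
    residual = matvec(w2, relu(matvec(w1, x)))
    return relu(add(x, residual))

def pre_activation_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Full pre-activation ordering: ReLU, weight, ReLU, weight, then add skip.

    The output is x plus a residual, with no non-linearity on the skip path.
    """
    residual = matvec(w2, relu(matvec(w1, relu(x))))
    return add(x, residual)
```

With identity weights and input `[1.0, -2.0]`, the post-activation block returns `[2.0, 0.0]` while the pre-activation block returns `[2.0, -2.0]`: only the latter passes the negative input component through the skip connection unchanged.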
On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…
Applying logistic regression to relevance feedback in image retrieval systems
This paper deals with the problem of image retrieval from large image databases. A particularly interesting problem is the retrieval of all images that are similar to the one the user has in mind, taking into account the user's feedback, expressed as positive or negative preferences for the images that the system progressively shows during the search. Here we present a novel algorithm for incorporating user preferences into an image retrieval system based exclusively on the visual content of the image, which is stored as a vector of low-level features. The algorithm considers the probability of an image belonging to the set of those sought by the user, and models the logit of this prob…
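As an illustration of modelling the logit of the relevance probability as a linear function of low-level features, the sketch below fits a single logistic regression to the images the user has labelled and ranks the database by predicted probability. Batch gradient ascent, the small L2 penalty, the learning-rate and epoch values, and the function names are assumptions of this sketch rather than the paper's estimation procedure; the features are toy two-dimensional vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit_score(w: Vector, b: float, x: Vector) -> float:
    return b + sum(wi * xi for wi, xi in zip(w, x))

def fit_logistic(features: Sequence[Vector], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000,
                 l2: float = 0.01) -> Tuple[List[float], float]:
    """Fit logit(p) = b + w.x by batch gradient ascent on an L2-penalised log-likelihood.

    labels: 1 for images the user marked relevant, 0 for non-relevant ones.
    The penalty (not applied to b) keeps w finite when the few labelled
    images are linearly separable.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = y - sigmoid(logit_score(w, b, x))
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b += lr * grad_b / n
        w = [wj + lr * (gj / n - l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b

def rank_images(database: Sequence[Vector], w: Vector, b: float) -> List[int]:
    """Indices of database images, most probably relevant first."""
    scores = [sigmoid(logit_score(w, b, x)) for x in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])
```

With only a handful of labelled images, the unpenalised maximum-likelihood estimate may not exist (the labels are often perfectly separable), which is why the sketch adds a penalty.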
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
Sound Event Localization and Detection (SELD) is a problem related to the field of machine listening whose objective is to recognize individual sound events, detect their temporal activity, and estimate their spatial location. Thanks to the emergence of more hard-labeled audio datasets, deep learning techniques have become state-of-the-art solutions. The most common ones implement a convolutional recurrent network (CRNN) after transforming the audio signal into a multichannel 2D representation. The squeeze-excitation technique can be considered a convolution enhancement that aims to learn spatial and channel feature maps independently rather than together as stand…
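The standard channel-wise squeeze-excitation block of Hu et al. can be sketched in three steps: global average pooling per channel (squeeze), a two-layer bottleneck ending in a sigmoid (excitation), and rescaling of each channel by its gate. The sketch uses nested Python lists instead of a deep learning framework and omits biases; the variants proposed in this line of work, which also consider spatial feature maps, are not reproduced, and the function names are illustrative.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]
Matrix = Sequence[Sequence[float]]

def squeeze(x: FeatureMap) -> List[float]:
    """Global average pooling: one descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def excite(z: Sequence[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Bottleneck FC -> ReLU -> FC -> sigmoid, giving one gate in (0, 1) per channel.

    w1 has shape (C/r, C) and w2 has shape (C, C/r) for a reduction ratio r.
    """
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden)))) for row in w2]

def se_block(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Rescale every channel of x by its learned gate."""
    gates = excite(squeeze(x), w1, w2)
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]
```

The gates depend only on channel-wide statistics, so the block re-weights whole channels; it adds few parameters because the bottleneck width is C/r.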
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually perform very well when trained with large datasets. However, in many applications, it is not possible to obtain such a large number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud detection or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as doorbells or fire alarms, usin…
On the Advantages of Asynchronous Pixel Reading and Processing for High-Speed Motion Estimation
Biological visual systems are becoming an interesting source for improving artificial visual systems. A biologically inspired read-out and pixel processing strategy is presented. This read-out mechanism is based on Selective pixel Change-Driven (SCD) processing. Pixels are read out and processed individually, instead of following the classical approach in which read-out and processing are based on complete frames. Changing pixels are read out and processed at short time intervals. The simulated experiments show that the response delay using this strategy is several orders of magnitude lower than that of current cameras, while keeping the same, or even tighter, bandwidth requirements.
Taking Advantage of Selective Change Driven Processing for 3D Scanning
This article deals with the application of the principles of SCD (Selective Change Driven) vision to 3D laser scanning. Two experimental setups have been implemented for comparative purposes: one with a classical CMOS (Complementary Metal-Oxide-Semiconductor) sensor and the other with a recently developed CMOS SCD sensor, both using the technique known as Active Triangulation. An SCD sensor only delivers the pixels that have changed most, ordered by the magnitude of their change since their last readout. The 3D scanning method is based on a systematic search through the entire image to detect pixels that exceed a certain threshold, showing the SCD approach to be ideal for this applicat…
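A minimal sketch of how thresholded pixel events could be turned into depth by active triangulation follows. It assumes events arrive as (row, column, change) tuples, that the laser line is found per image row as the strongest change above the threshold, and a deliberately simple geometry: a pinhole camera (focal length `focal_px` in pixels, principal-point column `cx`) and a laser plane parallel to the optical axis at lateral offset `baseline`, which gives z = f · b / (u − c_x). The function names, parameters and geometry are assumptions for illustration; the experimental setups' actual calibration is not described here.

```python
from typing import Dict, Iterable, Optional, Tuple

def laser_columns(events: Iterable[Tuple[int, int, int]], threshold: int) -> Dict[int, int]:
    """Per image row, the column whose change is largest and above `threshold`.

    events: (row, col, change) tuples, e.g. from an SCD-style read-out.
    """
    best: Dict[int, Tuple[int, int]] = {}
    for row, col, change in events:
        if change > threshold and (row not in best or change > best[row][1]):
            best[row] = (col, change)
    return {row: col for row, (col, _) in best.items()}

def depth_from_column(col: float, cx: float, focal_px: float, baseline: float) -> Optional[float]:
    """Depth z = f * b / (u - cx) for a laser plane x = b parallel to the optical axis.

    Returns None when the column does not correspond to a point in front of the camera.
    """
    disparity = col - cx
    if disparity <= 0:
        return None
    return focal_px * baseline / disparity
```

Rows without any event above the threshold simply produce no depth sample, which matches the idea of searching only for pixels that exceed a threshold.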
Open Set Audio Classification Using Autoencoders Trained on Few Data
Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when a large number of positive samples is not available for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solution…
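One common way to use autoencoders for open-set rejection is to score an input by how well it is reconstructed and to reject it when even the best reconstruction is poor. The sketch assumes one autoencoder per known class and a per-class threshold taken as a quantile of training reconstruction errors; a one-unit tied-weight linear autoencoder stands in for a trained network so the example runs without a deep learning framework. The decision rule, class labels and all function names are hypothetical and are not necessarily the method used in this work.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]
Reconstructor = Callable[[Vector], Vector]

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rank1_linear_autoencoder(direction: Sequence[float]) -> Reconstructor:
    """Tied-weight linear autoencoder with one hidden unit: code = u.x, output = code * u.

    Stands in for a trained network so the example runs without a framework.
    """
    norm = math.sqrt(sum(v * v for v in direction))
    u = [v / norm for v in direction]

    def reconstruct(x: Vector) -> Vector:
        code = sum(a * b for a, b in zip(u, x))
        return [code * v for v in u]

    return reconstruct

def threshold_from_errors(errors: Sequence[float], quantile: float = 0.95) -> float:
    """Rejection threshold: an empirical quantile of training reconstruction errors."""
    ordered = sorted(errors)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]

def open_set_predict(x: Vector, models: Dict[str, Reconstructor],
                     thresholds: Dict[str, float]) -> Optional[str]:
    """Known class whose autoencoder reconstructs x best, or None to reject as unknown."""
    errors = {label: mse(x, ae(x)) for label, ae in models.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= thresholds[best] else None
```

The quantile controls the trade-off between accepting known-class instances and rejecting unknown ones: a higher quantile rejects fewer known samples but lets more unknown ones through.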
A novel Bayesian framework for relevance feedback in image content-based retrieval systems
This paper presents a new algorithm for image retrieval in content-based image retrieval systems. The objective of these systems is to retrieve, from the global image database, the images that are as similar as possible to a user query, without using textual annotations attached to the images. The main problem in obtaining robust and effective retrieval is the gap between the low-level descriptors that can be automatically extracted from the images and the user's intention. The algorithm proposed here to address this problem is based on modeling user preferences as a probability distribution on the image space. Following a Bayesian methodology, this distribution is the pr…
Acoustic Scene Classification with Squeeze-Excitation Residual Networks
Acoustic scene classification (ASC) is a problem related to the field of machine listening whose objective is to classify/tag an audio clip with a predefined label describing a scene location (e.g., park, airport). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved by modifying only the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently…
IOWA Operators and Its Application to Image Retrieval
This paper presents a relevance feedback procedure based on logistic regression analysis. Since the dimension of the feature vector associated with each image is typically larger than the number of images evaluated by the user, different logistic regression models have to be fitted separately. Each fitted model provides a relevance probability and a confidence interval for that probability. To aggregate this set of probabilities and confidence intervals, we use an IOWA operator. The results show the success of our algorithm and that OWA operators are an efficient and natural way of dealing with this kind of fusion problem.
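An induced OWA (IOWA) operator, as defined by Yager and Filev, reorders its arguments by an auxiliary inducing variable rather than by their own values, then applies position weights. The sketch assumes that each fitted model contributes a (probability, lower, upper) triple and that the inducing variable is the negative width of the confidence interval, so the most precise estimate receives the first weight; that choice, the weights and the function names are illustrative, since the paper's own inducing variable and weights are not given here.

```python
from typing import Sequence, Tuple

def iowa(pairs: Sequence[Tuple[float, float]], weights: Sequence[float]) -> float:
    """Induced OWA aggregation (Yager and Filev).

    pairs: (inducing value, argument). Arguments are reordered by their
    inducing value, largest first, and the j-th weight multiplies the j-th
    argument in that order. Ties keep their input order (Python's sort is
    stable); the original definition averages tied arguments, which is not
    done here.
    """
    if len(pairs) != len(weights):
        raise ValueError("need exactly one weight per argument")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    return sum(w * arg for w, (_, arg) in zip(weights, ordered))

def aggregate_relevance(estimates: Sequence[Tuple[float, float, float]],
                        weights: Sequence[float]) -> float:
    """Fuse (probability, lower, upper) estimates from several fitted models.

    Hypothetical inducing variable: the negative width of the confidence
    interval, so the model with the narrowest interval is weighted first.
    """
    pairs = [(-(upper - lower), p) for p, lower, upper in estimates]
    return iowa(pairs, weights)
```

For estimates (0.9, [0.5, 1.0]), (0.6, [0.55, 0.65]) and (0.2, [0.0, 0.6]) with weights 0.5, 0.3, 0.2, the order by precision is 0.6, 0.9, 0.2, giving 0.5·0.6 + 0.3·0.9 + 0.2·0.2 = 0.61.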
Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation
Anomalous sound detection (ASD) is currently one of the topical subjects in the machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, in industrial processes, the early detection of malfunctions or damage in machines can mean great savings and an improvement in process efficiency. This problem can be addressed with an unsupervised ASD solution, since industrial machines do not have to be damaged in order to obtain audio data for the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-base…
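How the Gammatone-based representation is built is not described here. A common starting point is a bank of fourth-order gammatone filters whose centre frequencies are equally spaced on the ERB-rate scale of Glasberg and Moore, each with bandwidth 1.019 × ERB(f_c) as in Patterson's auditory filterbank. The sketch below computes the centre frequencies and one filter's impulse response; the frequency range, number of bands and impulse-response length are assumptions, and the filtering, framing and log compression that would turn the filter outputs into a 2D network input are left out.

```python
import math
from typing import List

def erb_bandwidth(fc: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def hz_to_erb_rate(f: float) -> float:
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e: float) -> float:
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_space(f_low: float, f_high: float, n_bands: int) -> List[float]:
    """Centre frequencies equally spaced on the ERB-rate scale, low to high."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]

def gammatone_ir(fc: float, fs: float, duration: float = 0.025,
                 order: int = 4, b: float = 1.019) -> List[float]:
    """Peak-normalised impulse response t^(n-1) exp(-2 pi b ERB(fc) t) cos(2 pi fc t)."""
    decay = 2.0 * math.pi * b * erb_bandwidth(fc)
    samples = []
    for k in range(int(round(duration * fs))):
        t = k / fs
        samples.append(t ** (order - 1) * math.exp(-decay * t)
                       * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(v) for v in samples)
    return [v / peak for v in samples]
```

Spacing the bands on the ERB-rate scale places more filters at low frequencies, where auditory filters are narrower, which is the main difference from a linearly spaced filterbank.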
CNN depth analysis with different channel inputs for Acoustic Scene Classification
Acoustic scene classification (ASC) has been approached in recent years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is still an area of research interest. In this paper, different log-Mel representations and combinations are analyzed. Experiments show that the best results are obtained using the harmonic and percussive components plus the difference between the left and right stereo channels (L-R). On the other hand, it i…
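Harmonic and percussive components are commonly obtained with FitzGerald's median-filtering harmonic-percussive separation: median filtering along time enhances horizontal (harmonic) structure, and median filtering along frequency enhances vertical (percussive) structure. The sketch applies this to a magnitude spectrogram given as nested lists, using hard masks with ties assigned to the percussive part, and also computes the L-R difference signal. The kernel size, mask type and function names are assumptions, the log-Mel step is omitted, and this is not necessarily how the components were computed in the paper.

```python
from statistics import median
from typing import List, Sequence, Tuple

Spectrogram = List[List[float]]  # magnitudes indexed as [frequency_bin][time_frame]

def side_channel(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Sample-wise difference between the left and right stereo channels (L-R)."""
    return [lft - rgt for lft, rgt in zip(left, right)]

def _windowed_median(values: Sequence[float], centre: int, half: int) -> float:
    """Median of values[centre-half : centre+half+1], truncated at the edges."""
    lo, hi = max(0, centre - half), min(len(values), centre + half + 1)
    return median(values[lo:hi])

def hpss(spec: Spectrogram, kernel: int = 17) -> Tuple[Spectrogram, Spectrogram]:
    """Split a magnitude spectrogram into harmonic and percussive parts with hard masks."""
    half = kernel // 2
    n_f, n_t = len(spec), len(spec[0])
    columns = [[spec[f][t] for f in range(n_f)] for t in range(n_t)]
    harmonic: Spectrogram = []
    percussive: Spectrogram = []
    for f in range(n_f):
        h_row, p_row = [], []
        for t in range(n_t):
            h_enh = _windowed_median(spec[f], t, half)     # smooth along time
            p_enh = _windowed_median(columns[t], f, half)  # smooth along frequency
            is_harmonic = h_enh > p_enh                    # ties go to percussive
            h_row.append(spec[f][t] if is_harmonic else 0.0)
            p_row.append(0.0 if is_harmonic else spec[f][t])
        harmonic.append(h_row)
        percussive.append(p_row)
    return harmonic, percussive
```

Because the masks are complementary, every time-frequency bin ends up in exactly one of the two outputs, so the harmonic and percussive spectrograms add back up to the input.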
Selective Change-Driven Image Processing: A Speeding-Up Strategy
Biologically inspired schemes are a source for the improvement of visual systems. Real-time implementation of image processing algorithms is constrained by the large amount of data to be processed. Full-image processing is often unnecessary, since many pixels undergo only a small change or no change at all. A strategy based on delivering and processing pixels, instead of processing the complete frame, is presented. The pixels that have undergone the largest changes in each frame are read out and processed, ordered by the absolute value of their change. Two examples are shown: a morphological motion detection algorithm and the Horn and Schunck optical flow algorithm. Result…