AUTHOR
Maximo Cobos
Real-time Sound Source Localization on an Embedded GPU Using a Spherical Microphone Array
Spherical microphone arrays are becoming increasingly important in acoustic signal processing systems for their applications in sound field analysis, beamforming, spatial audio, etc. The positioning of target and interfering sound sources is a crucial step in many of the above applications. Therefore, 3D sound source localization is a highly relevant topic in the acoustic signal processing field. However, spherical microphone arrays are usually composed of many microphones, and running signal processing localization methods in real time is an important issue. Some works have already shown the potential of Graphics Processing Units (GPUs) for developing high-end real-time signal proce…
SART3D: A MATLAB toolbox for spatial audio and signal processing education
On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone arrays
Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate …
A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification
Residual learning is known for being a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their a…
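As a rough illustration of the skip connection described in this abstract, the sketch below adds a block's input back to the output of its stacked layers. It is not the architecture evaluated in the paper: the dense layers (instead of convolutions), the function names, and the choice to apply the final ReLU after the addition are all assumptions made for brevity, and that placement is only one of the alternatives the abstract alludes to.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def linear(v, weights, bias):
    """Dense layer: one output per weight row (hypothetical stand-in for a conv layer)."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Two stacked layers F(x) with the input added back: y = relu(x + F(x))."""
    hidden = relu(linear(x, w1, b1))
    f = linear(hidden, w2, b2)
    # Skip (shortcut) connection: the block only has to learn the residual F(x).
    return relu([xi + fi for xi, fi in zip(x, f)])


# With all-zero weights F(x) = 0, so the block reduces to relu(x).
zeros = [[0.0] * 3 for _ in range(3)]
y = residual_block([1.0, -2.0, 3.0], zeros, [0.0] * 3, zeros, [0.0] * 3)
```

When the stacked layers contribute nothing, the input passes through the shortcut unchanged (up to the final activation), which is the identity-mapping behaviour the abstract refers to.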
On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…
Low-complexity AoA and AoD Estimation in the Transformed Spatial Domain for Millimeter Wave MIMO Channels
High-accuracy angle of arrival (AoA) and angle of departure (AoD) estimation is critical for cell search, stable communications and positioning in millimeter wave (mmWave) cellular systems. Moreover, the design of low-complexity AoA/AoD estimation algorithms is also of major importance in the deployment of practical systems to enable a fast and resource-efficient computation of beamforming weights. Parametric mmWave channel estimation allows the channel matrix to be described as a combination of direction-dependent signal paths, exploiting the sparse characteristics of mmWave channels. In this context, a fast Transformed Spatial Domain Channel Estimation (TSDCE) algorithm was recently proposed …
Simultaneous ranging and self-positioning in unsynchronized wireless acoustic sensor networks
Automatic ranging and self-positioning is a very desirable property in wireless acoustic sensor networks, where nodes have at least one microphone and one loudspeaker. However, due to environmental noise, interference, and multipath effects, audio-based ranging is a challenging task. This paper presents a fast ranging and positioning strategy that makes use of the correlation properties of pseudonoise sequences for estimating simultaneously relative time-of-arrivals from multiple acoustic nodes. To this end, a proper test signal design adapted to the acoustic node transducers is proposed. In addition, a novel self-interference reduction method and a peak matching algorithm are introduced, a…
Stereo to Wave-Field Synthesis music up-mixing: An objective and subjective evaluation
Sound source separation techniques are known to be very useful in many applications. High-fidelity, audio-oriented applications are particularly challenging, since existing algorithms are far from delivering such high quality. In this paper, subjective and objective evaluations are carried out for several algorithms designed for dealing with stereo music mixtures. The performance of these algorithms applied to acoustic scene resynthesis in a Wave Field Synthesis system is discussed.
A Wireless Acoustic Array System for Binaural Loudness Evaluation in Cities
Networks of acoustic sensors are being deployed in smart cities to continuously monitor noise levels. In this paper, a novel acoustic sensor device is designed for binaural loudness evaluation, in a standalone platform. The audio is acquired from an array of microphones and a binaural signal is synthesized by a direction-of-arrival algorithm and a head-related transfer function. Hardware setup and software algorithms are presented and the results are discussed. Finally, the tests conducted in an early deployment show the feasibility of using the device to carry out large temporal and spatial sampling for the evaluation of binaural loudness.
Ray-Space-Based Multichannel Nonnegative Matrix Factorization for Audio Source Separation
Nonnegative matrix factorization (NMF) has been traditionally considered a promising approach for audio source separation. While standard NMF is only suited for single-channel mixtures, extensions to consider multi-channel data have also been proposed. Among the most popular alternatives, multichannel NMF (MNMF) and further derivations based on constrained spatial covariance models have been successfully employed to separate multi-microphone convolutive mixtures. This letter proposes an MNMF extension by considering a mixture model with Ray-Space-transformed signals, where magnitude data successfully encodes source locations as frequency-independent linear patterns. We show that the MNMF alg…
Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs
Sound Event Localization and Detection (SELD) is a problem related to the field of machine listening whose objective is to recognize individual sound events, detect their temporal activity, and estimate their spatial location. Thanks to the emergence of more hard-labeled audio datasets, deep learning techniques have become state-of-the-art solutions. The most common ones are those that implement a convolutional recurrent network (CRNN) having previously transformed the audio signal into multichannel 2D representation. The squeeze-excitation technique can be considered as a convolution enhancement that aims to learn spatial and channel feature maps independently rather than together as stand…
An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, usin…
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach
The generalized cross-correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation and signal bandwidth. In this paper, a sub-band approach for time del…
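To make the TDOA idea in this abstract concrete, the following minimal sketch estimates a delay by maximizing a GCC with phase transform (PHAT) weighting. It shows the classical full-band GCC-PHAT only, not the frequency-sliding method proposed in the paper. The naive `dft` helper, the function names, the `eps` regularizer, and the synthetic test signal are assumptions chosen to keep the example dependency-free.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(x1, x2, eps=1e-12):
    """Return the delay of x2 relative to x1 (in samples) and the GCC-PHAT curve."""
    n = len(x1) + len(x2)  # zero-pad so circular correlation does not wrap around
    spec1 = dft(list(x1) + [0.0] * (n - len(x1)))
    spec2 = dft(list(x2) + [0.0] * (n - len(x2)))
    # Cross-power spectrum, normalized to unit magnitude (PHAT keeps phase only).
    cross = [b * a.conjugate() for a, b in zip(spec1, spec2)]
    phat = [c / (abs(c) + eps) for c in cross]
    cc = [v.real for v in dft(phat, inverse=True)]
    peak = max(range(n), key=lambda i: cc[i])
    # Indices in the upper half of the curve correspond to negative lags.
    delay = peak if peak < n // 2 else peak - n
    return delay, cc


# Synthetic check: x2 is x1 delayed by 5 samples.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(48)]
x1 = noise + [0.0] * 16
x2 = [0.0] * 5 + noise + [0.0] * 11
delay, _ = gcc_phat(x1, x2)
```

In practice an FFT would replace the naive transform, and the TDOA in seconds follows by dividing the sample delay by the sampling rate.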
An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia
The extended use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality, is demanding efficient algorithms and methodologies. All these applications require real-time spatial audio engines with the capability of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes…
Speech Intelligibility Analysis and Approximation to Room Parameters through the Internet of Things
In recent years, Wireless Acoustic Sensor Networks (WASN) have been widely applied to different acoustic fields in outdoor and indoor environments. Most of these applications are oriented to locate or identify sources and measure specific features of the environment involved. In this paper, we study the application of a WASN for room acoustic measurements. To evaluate the acoustic characteristics, a set of Raspberry Pi 3 (RPi) has been used. One is used to play different acoustic signals and four are used to record at different points in the room simultaneously. The signals are sent wirelessly to a computer connected to a server, where using MATLAB we calculate both the impulse response (IR…
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification
In the last years, deep convolutional neural networks have become a standard for the development of state-of-the-art audio classification systems, taking the lead over traditional approaches based on feature engineering. While they are capable of achieving human performance under certain scenarios, it has been shown that their accuracy is severely degraded when the systems are tested over noisy or weakly segmented events. Although better generalization could be obtained by increasing the size of the training dataset, e.g. by applying data augmentation techniques, this also leads to longer and more complex training procedures. In this article, we propose a new type of pooling layer aimed at …
Wireless Acoustic Sensor Networks and Applications
Combining Inter-Subject Modeling with a Subject-Based Data Transformation to Improve Affect Recognition from EEG Signals
Existing correlations between features extracted from Electroencephalography (EEG) signals and emotional aspects have motivated the development of a diversity of EEG-based affect detection methods. Both intra-subject and inter-subject approaches have been used in this context. Intra-subject approaches generally suffer from the small sample problem, and require the collection of exhaustive data for each new user before the detection system is usable. On the contrary, inter-subject models do not account for the personality and physiological influence of how the individual is feeling and expressing emotions. In this paper, we analyze both modeling approaches, using three public repositories. T…
Computation of Psycho-Acoustic Annoyance Using Deep Neural Networks
Psycho-acoustic parameters have been extensively used to evaluate the discomfort or pleasure produced by the sounds in our environment. In this context, wireless acoustic sensor networks (WASNs) can be an interesting solution for monitoring subjective annoyance in certain soundscapes, since they can be used to register the evolution of such parameters in time and space. Unfortunately, the calculation of the psycho-acoustic parameters involved in common annoyance models implies a significant computational cost, and makes the acquisition and transmission of these parameters at the nodes difficult. As a result, monitoring psycho-acoustic annoyance becomes an expensive and inefficient task. Thi…
A Bayesian direction-of-arrival model for an undetermined number of sources using a two-microphone array.
Sound source localization using a two-microphone array is an active area of research, with considerable potential for use with video conferencing, mobile devices, and robotics. Based on the observed time-differences of arrival between sound signals, a probability distribution of the location of the sources is considered to estimate the actual source positions. However, these algorithms assume a given number of sound sources. This paper describes an updated research account on the solution presented in Escolano et al. [J. Acoust. Soc. Am. 132(3), 1257-1260 (2012)], where nested sampling is used to explore a probability distribution of the source position using a Laplacian mixture model, whic…
On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization
The growing interest in incorporating new features into mobile devices has increased the number of signal processing applications running on processors designed for mobile computing. A challenging signal processing field is acoustic source localization, which is attractive for applications such as automatic camera steering systems, human-machine interfaces, video gaming or audio surveillance. In this context, the emergence of systems-on-chip (SoC) that contain a small graphics accelerator (or GPU) brings a notable increase in computational capacity while partially retaining the appealing low power consumption of embedded systems. This is the case, for example, of the Sam…
Performance comparison of container orchestration platforms with low cost devices in the fog, assisting Internet of Things applications
In the last decade there has been an increasing interest in and demand for the Internet of Things (IoT) and its applications. However, when a high level of computing and/or real-time processing is required for these applications, different problems arise due to their requirements. In this context, low-cost autonomous and distributed Small Board Computers (SBC) devices, with processing, storage capabilities and wireless communications, can assist these IoT networks. Usually, these SBC devices run an operating system based on Linux. In this scenario, container-based technologies and fog computing are an interesting approach and both have led to a new paradigm in how devices cooperate, improvi…
Open Set Audio Classification Using Autoencoders Trained on Few Data.
Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when there is no availability of a large number of positive samples for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solution…
Enabling Real-Time Computation of Psycho-Acoustic Parameters in Acoustic Sensors Using Convolutional Neural Networks
Sensor networks have become an extremely useful tool for monitoring and analysing many aspects of our daily lives. Noise pollution levels are very important today, especially in cities where the number of inhabitants and disturbing sounds are constantly increasing. Psycho-acoustic parameters are a fundamental tool for assessing the degree of discomfort produced by different sounds and, combined with wireless acoustic sensor networks (WASNs), could enable, for example, the efficient implementation of acoustic discomfort maps within smart cities. However, the continuous monitoring of psycho-acoustic parameters to create time-dependent discomfort maps requires a high computational demand that …
Cumulative-Sum-Based Localization of Sound Events in Low-Cost Wireless Acoustic Sensor Networks
Wireless acoustic sensor networks (WASNs) are known for their potential applications in multiple areas, such as audio-based surveillance, binaural hearing aids or advanced acoustic monitoring. The knowledge of the spatial position of a source of interest is usually a requirement for many of these applications. Therefore, source localization is an important problem to be addressed in WASNs. Unfortunately, most localization algorithms need costly signal processing stages that prevent them from being implemented in low-cost sensor networks, requiring additional modules for signal acquisition and processing. This paper presents a low-complexity method for acoustic event detection and localizati…
Acoustic Scene Classification with Squeeze-Excitation Residual Networks
Acoustic scene classification (ASC) is a problem related to the field of machine listening whose objective is to classify/tag an audio clip in a predefined label describing a scene location (e. g. park, airport, etc.). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved only by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently…
A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model
Spatial audio-rendering techniques using head-related transfer functions (HRTFs) are currently used in many different contexts such as immersive teleconferencing systems, gaming, or 3-D audio reproduction. Since all these applications usually involve real-time constraints, efficient processing structures for HRTF modeling and interpolation are necessary for providing real-time binaural audio solutions. This letter presents a parametric parallel model that allows us to perform HRTF filtering and interpolation efficiently from an input HRTF dataset. The resulting model, which is an adaptation from a recently proposed modeling technique, not only reduces the size of HRTF datasets signific…
Automatic Detection and Characterization of Acoustic Plane-Wave Reflections Using Circular Microphone Arrays
The spatial characteristics of the sound field inside a room can be meaningfully described by means of microphone array processing techniques. In this context, the set of impulse responses sampled by a microphone array can be seen as an image made of acoustic plane-wave footprints. Due to the circular geometry of the microphone array, these footprints have a cosine-like shape that can be fully described as a function of the direction of arrival (DOA) of the impinging plane wave. This paper proposes a Hough-transform-based approach to plane-wave detection in microphone array multi-trace impulse responses. Experiments using a set of real microphone recordings are described, showing the potent…
AI-IoT Platform for Blind Estimation of Room Acoustic Parameters Based on Deep Neural Networks
Room acoustical parameters have been widely used to describe sound perception in indoor environments, such as concert halls, conference rooms, etc. Many of them have been standardized and often have a high computational demand. With the increasing presence of deep learning approaches in automatic monitoring systems, wireless acoustic sensor networks (WASNs) offer great potential to facilitate the estimation of such parameters. In this scenario, Convolutional Neural Networks (CNNs) offer significant reductions in the computational requirements for in-node parameter predictions, enabling the so-called Artificial Intelligence-Internet of Things (AI-IoT). In this paper, we describe the design a…
Game-based learning supported by audience response tools: game proposals and preliminary assessment
The so-called game-based learning strategies are based on introducing games in the classrooms to improve aspects such as student performance, concentration and effort. Currently, they provide a very useful resource to increase the motivation of university students, generating a better atmosphere among peers and between student and teacher, which in turn is generally translated into better academic results. However, the design of games that successfully achieve the desired teaching-learning objectives is not a trivial task. This work focuses on the design of games that allow the assessment of ICT-related university subjects. Specifically, three different games are proposed, all based on stud…
Steered Response Power Localization of Acoustic Passband Signals
The vast majority of localization approaches using phase transform (PHAT) consider that the sources of interest are wideband low-pass sources. While this may be the usual case for common audio signals such as speech, PHAT methods are affected negatively by modulation artifacts when the sources to be localized are passband signals. In these cases, steered response power PHAT localization becomes less robust. This letter analyzes the form of generalized cross-correlation functions with PHAT when passband acoustic signals are considered, proposing approaches for increasing the localization performance through the mitigation of these negative effects.
On the Robustness of Deep Features for Audio Event Classification in Adverse Environments
Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis tasks as well. These features provide multiple levels of abstraction, which make it possible to select a sufficiently generalized layer to identify classes not seen during training. The generalization capability of such features is very useful due to the lack of complete labeled audio datasets. However, as opposed to classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the performance impact of having an acoustically adverse environment has not been evaluated in detail. In this p…
A case study on feature sensitivity for audio event classification using support vector machines
Automatic recognition of multiple acoustic events is an interesting problem in machine listening that generalizes the classical speech/non-speech or speech/music classification problem. Typical audio streams contain a diversity of sound events that carry important and useful information on the acoustic environment and context. Classification is usually performed by means of hidden Markov models (HMMs) or support vector machines (SVMs) considering traditional sets of features based on Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives, as well as the energy from auditory-inspired filterbanks. However, while these features are routinely used by many systems, it is not …
Self-Localization of Distributed Microphone Arrays Using Directional Statistics with DoA Estimation Reliability
This paper addresses the problem of self-localization of distributed microphone arrays from microphone recordings by following a two-step optimization procedure. In the first step, the relative geometry of the sources and arrays is inferred by the proposed maximum likelihood estimator. It is derived under the assumption that the acquired unit-norm vectors pointing towards the unknown source positions follow a von Mises-Fisher distribution in a D-dimensional space. In the second step, the absolute positions and synchronization offsets between the arrays are estimated from the inferred relative geometry by using the Least Squares procedure. To improve the accuracy of the method, we propose as…
Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks
The interest in deep learning methods for solving traditional signal processing tasks has been steadily growing in the last years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TDE based on a sub-band analysis of the cross-power spectrum phase, providing a structured two-dimensional representation of the time delay information contained across different frequency bands. Inspired by deep-learning-based image denoising solutions, we propose in this paper the use of convolutio…
Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance
The rapid development of the Internet of Things (IoT) has posed important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed taking into account high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm is demanding the use of more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper is aimed at evaluating impo…
A Robust Wrap Reduction Algorithm for Fringe Projection Profilometry and Applications in Magnetic Resonance Imaging.
In this paper, we present an effective algorithm to reduce the number of wraps in a 2D phase signal provided as input. The technique is based on an accurate estimate of the fundamental frequency of a 2D complex signal with the phase given by the input, and the removal of a dependent additive term from the phase map. Unlike existing methods based on the discrete Fourier transform (DFT), the frequency is computed by using noise-robust estimates that are not restricted to integer values. Then, to deal with the problem of a non-integer shift in the frequency domain, an equivalent operation is carried out on the original phase signal. This consists of the subtraction of a tilted plane whose slop…
Analysis of data fusion techniques for multi-microphone audio event detection in adverse environments
Acoustic event detection (AED) is currently a very active research area with multiple applications in the development of smart acoustic spaces. In this context, the advances brought by Internet of Things (IoT) platforms where multiple distributed microphones are available have also contributed to this interest. In such scenarios, the use of data fusion techniques merging information from several sensors becomes an important aspect in the design of multi-microphone AED systems. In this paper, we present a preliminary analysis of several data-fusion techniques aimed at improving the recognition accuracy of an AED system by taking advantage of the diversity provided by multiple microphones in …
Nonnegative signal factorization with learnt instrument models for sound source separation in close-microphone recordings
Close-microphone techniques are extensively employed in many live music recordings, allowing for interference rejection and reducing the amount of reverberation in the resulting instrument tracks. However, despite the use of directional microphones, the recorded tracks are not completely free from source interference, a problem which is commonly known as microphone leakage. While source separation methods are potentially a solution to this problem, few approaches take into account the huge amount of prior information available in this scenario. In fact, besides the special properties of close-microphone tracks, the knowledge on the number and type of instruments making up the mixture can al…
Fast Channel Estimation in the Transformed Spatial Domain for Analog Millimeter Wave Systems
Fast channel estimation in millimeter-wave (mmWave) systems is a fundamental enabler of high-gain beamforming, which boosts coverage and capacity. The channel estimation stage typically involves an initial beam training process where a subset of the possible beam directions at the transmitter and receiver is scanned along a predefined codebook. Unfortunately, the high number of transmit and receive antennas deployed in mmWave systems increases the complexity of the beam selection and channel estimation tasks. In this work, we tackle the channel estimation problem in analog systems from a different perspective than used by previous works. In particular, we propose to move the channel estimati…
Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation
Anomalous sound detection (ASD) is currently one of the most topical subjects in the machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, in industrial processes, the early detection of malfunctions or damage in machines can mean great savings and an improvement in efficiency. This problem can be solved with an unsupervised ASD solution since industrial machines will not be damaged simply by having this audio data in the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-base…
Combining simple and gamified questionnaires using classroom participation managers: experience and student perception
The growing use of mobile devices has motivated the development of a wide range of applications to help manage students’ participation in the classroom. Socrative allows the lecturer to use multiple-choice questionnaires in the classroom, either in a simple or a gamified mode (Space Race). In this paper, we describe our experience using this tool to promote competitive learning, at both undergraduate and post-graduate levels. The students’ perception indicates that the use of the application helped increase engagement and motivation. However, relevant differences were found between both modes of use, underlining the importance of an adequate activity design.
Sound Event Envelope Estimation in Polyphonic Mixtures
Sound event detection is the task of identifying automatically the presence and temporal boundaries of sound events within an input audio stream. In the last years, deep learning methods have established themselves as the state-of-the-art approach for the task, using binary indicators during training to denote whether an event is active or inactive. However, such binary activity indicators do not fully describe the events, and estimating the envelope of the sounds could provide more precise modeling of their activity. This paper proposes to estimate the amplitude envelopes of target sound event classes in polyphonic mixtures. For training, we use the amplitude envelopes of the target sounds…
Spatio-Temporal Analysis of Urban Acoustic Environments with Binaural Psycho-Acoustical Considerations for IoT-Based Applications
The pleasantness or annoyance perceived in urban soundscapes is a major concern in environmental acoustics. Binaural psychoacoustic parameters are helpful for describing generic acoustic environments, as stated in the ISO 12913 framework. In this paper, the application of a Wireless Acoustic Sensor Network (WASN) to evaluate the spatial distribution and the evolution of urban acoustic environments is described. Two experiments are presented using an indoor and an outdoor deployment of a WASN with several nodes using an Internet of Things (IoT) environment to collect audio data and calculate meaningful parameters such as the sound pressure level, binaural loudness and binaural sharp…
Adaptive Mid-Term Representations for Robust Audio Event Classification
Low-level audio features are commonly used in many audio analysis tasks, such as audio scene classification or acoustic event detection. Because audio signals vary in length, a common approach is to create fixed-length feature vectors consisting of a set of statistics that summarize the temporal variability of such short-term features. To avoid the loss of temporal information, the audio event can be divided into a set of mid-term segments or texture windows. However, such an approach requires accurately estimating the onset and offset times of the audio events in order to obtain a robust mid-term statistical description of their temporal evolution. This paper proposes the use of…
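As a rough illustration of the fixed-length summarization this abstract describes, the sketch below splits a sequence of short-term feature frames into mid-term segments and summarizes each with its mean and standard deviation. The function name `mid_term_stats`, the equal-length split, and the choice of statistics are assumptions made for illustration; the adaptive representation proposed in the paper is not reproduced here.

```python
import numpy as np

def mid_term_stats(features, n_segments):
    """Summarize short-term features with per-segment statistics.

    features: array of shape (n_frames, n_dims), one row per short-term frame.
    n_segments: number of mid-term segments (texture windows), assumed to be
                equal-length splits and no larger than n_frames.
    Returns a vector of length n_segments * 2 * n_dims, whatever n_frames is.
    """
    features = np.asarray(features, dtype=float)
    # Split the frames into consecutive, nearly equal-length segments.
    segments = np.array_split(features, n_segments, axis=0)
    # Describe each segment by the mean and standard deviation of every feature.
    stats = [np.concatenate([seg.mean(axis=0), seg.std(axis=0)]) for seg in segments]
    return np.concatenate(stats)
```

Events of different durations map to vectors of the same length, which is what lets them feed a standard classifier.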
Real-time Sound Source Localization on Graphics Processing Units
Sound source localization is an important topic in microphone array signal processing applications, such as camera steering systems, human-machine interaction or surveillance systems. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is one of the most well-known approaches for sound source localization due to its good performance in noisy and reverberant environments. The algorithm analyzes the sound power captured by a microphone array on a grid of spatial points in a given room. While localization accuracy can be improved by using a high-resolution spatial grid and a high number of microphones, performing the localization task in these circumstances requires …
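To make the grid search described in this abstract concrete, the following is a minimal CPU sketch of SRP-PHAT in Python; it is not the GPU implementation from the paper. The function names `gcc_phat` and `srp_phat`, the nearest-sample rounding of the expected delays, and the default speed of sound are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def gcc_phat(x, y, n_fft):
    """PHAT-weighted cross-correlation; index k holds lag k (negative lags wrap)."""
    cross = np.fft.rfft(x, n_fft) * np.conj(np.fft.rfft(y, n_fft))
    cross /= np.abs(cross) + 1e-12  # keep only the phase (PHAT weighting)
    return np.fft.irfft(cross, n_fft)

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Return the grid point with maximum steered response power.

    signals: (n_mics, n_samples) time-aligned microphone frames.
    mic_pos: (n_mics, 3) microphone coordinates in meters.
    grid:    (n_points, 3) candidate source positions in meters.
    """
    signals = np.asarray(signals, dtype=float)
    mic_pos = np.asarray(mic_pos, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n_fft = 2 * signals.shape[1]  # zero-pad to avoid circular overlap
    # Distance from every grid point to every microphone, shape (n_points, n_mics).
    dists = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(grid))
    for i, j in combinations(range(len(mic_pos)), 2):
        cc = gcc_phat(signals[i], signals[j], n_fft)
        # Expected delay of mic i relative to mic j, rounded to whole samples.
        lags = np.rint((dists[:, i] - dists[:, j]) / c * fs).astype(int)
        power += cc[lags % n_fft]
    return grid[np.argmax(power)], power
```

The cost grows with the number of microphone pairs times the number of grid points, which is why fine grids and large arrays motivate parallel hardware.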
Improving Isolation of Blindly Separated Sources Using Time-Frequency Masking
A refinement technique based on time-frequency masking is proposed to improve source isolation in blind audio source separation algorithms. The refinement technique uses an energy-normalized source-to-interference ratio in order to identify and eliminate interfering energy from the extracted sources. Some examples using this refinement method with different separation algorithms are discussed. The results show that source isolation can be significantly enhanced with negligible degradation of the separated sources.
CNN depth analysis with different channel inputs for Acoustic Scene Classification
Acoustic scene classification (ASC) has been approached in recent years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is still a research area of interest. In this paper, different log-Mel representations and combinations are analyzed. Experiments show that the best results are obtained using the harmonic and percussive components plus the difference between left and right stereo channels (L-R). On the other hand, it i…
On the Design of Probe Signals in Wireless Acoustic Sensor Networks Self-Positioning Algorithms
A wireless acoustic sensor network comprises a distributed group of devices equipped with audio transducers. Typically, these devices can interoperate with each other using wireless links and perform collaborative audio signal processing. Ranging and self-positioning of the network nodes are examples of tasks that can be carried out collaboratively using acoustic signals. However, the environmental conditions can distort the emitted signals and complicate the ranging process. In this context, the selection of proper acoustic signals can facilitate the attainment of this goal and improve the localization accuracy. This letter deals with the design and evaluation of acoustic probe signals all…
Design and Implementation of Acoustic Source Localization on a Low-Cost IoT Edge Platform
The implementation of algorithms for acoustic source localization on edge platforms for the Internet of Things (IoT) is gaining momentum. Applications based on acoustic monitoring can greatly benefit from efficient implementations of such algorithms, enabling novel services for smart homes and buildings or ambient-assisted living. In this context, this brief proposes an extremely low-cost sound source localization system composed of two microphones and the low-power ESP32 microcontroller module. A Direction-Of-Arrival (DOA) algorithm has been implemented taking into account the specific features of this board, showing excellent performance despite the memory constraints imposed by the platform. …
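The excerpt does not say which DOA algorithm runs on the board, so the following is only a generic sketch of two-microphone DOA estimation under a far-field assumption: find the inter-channel delay by cross-correlation, then convert it to an angle with sin(θ) = c·τ/d. The function name `estimate_doa`, the plain time-domain correlation, and the sign convention (positive angles toward the right microphone) are assumptions made for illustration.

```python
import numpy as np

def estimate_doa(left, right, fs, mic_distance, c=343.0):
    """Estimate the direction of arrival in degrees from two microphone frames.

    left, right:  equal-length 1-D frames from the two microphones.
    fs:           sampling rate in Hz.
    mic_distance: spacing between the microphones in meters.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(right)
    # Full cross-correlation: output index m corresponds to lag m - (n - 1).
    cc = np.correlate(left, right, mode="full")
    # Only lags up to mic_distance / c are physically possible.
    max_lag = int(mic_distance / c * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    best_lag = lags[np.argmax(cc[lags + n - 1])]
    # Far-field model: delay = mic_distance * sin(theta) / c.
    sin_theta = np.clip(best_lag * c / (fs * mic_distance), -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

With whole-sample lags the angular resolution is limited by `fs` and `mic_distance`; at 48 kHz and 10 cm spacing there are only 27 distinguishable lags.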
Low-Cost Alternatives for Urban Noise Nuisance Monitoring Using Wireless Sensor Networks
Noise pollution caused by vehicular traffic is a common problem in urban environments that has been shown to affect people's health and children's cognition. In the last decade, several studies have been conducted to assess this noise by measuring the equivalent noise pressure level (called Leq) to acquire an accurate sound map using wireless networks with acoustic sensors. However, even with similar values of Leq, people can perceive the noise differently according to its frequency characteristics. Thus, indexes that can express people's feelings through subjective measures are required. In this paper, we analyze the suitability of using the psychoacoustic metrics given by Zwicker's mod…
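For reference, the equivalent level mentioned in this abstract is the level of the mean squared sound pressure over the measurement interval, Leq = 10·log10(mean(p²)/p0²) with p0 = 20 µPa. The sketch below computes it for an unweighted pressure signal; the function name `leq_db` is hypothetical, and frequency weighting (such as A-weighting) and microphone calibration are deliberately left out.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)

def leq_db(pressure_pa):
    """Equivalent continuous sound level in dB for a pressure signal in pascals."""
    p = np.asarray(pressure_pa, dtype=float)
    # Mean squared pressure over the interval, relative to the reference.
    return 10.0 * np.log10(np.mean(p ** 2) / P_REF ** 2)
```

Two signals with the same Leq can still have very different spectra, which is the limitation the abstract raises.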