Search results for " Speech"

showing 10 items of 519 documents

On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification

2019

Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation

2020

Anomalous sound detection (ASD) is, nowadays, one of the topical subjects in machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, related to industrial processes, the early detection of malfunctions or damage in machines can mean great savings and an improvement in the efficiency of industrial processes. This problem can be solved with an unsupervised ASD solution since industrial machines will not be damaged simply by having this audio data in the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-base…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

CNN depth analysis with different channel inputs for Acoustic Scene Classification

2019

Acoustic scene classification (ASC) has been approached in the last years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is still a research area of interest. In this paper, different log-Mel representations and combinations are analyzed. Experiments show that the best results are obtained using the harmonic and percussive components plus the difference between left and right stereo channels, (L-R). On the other hand, it i…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineeringComputer Science - SoundMachine Learning (cs.LG)Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Acoustic Scene Classification with Squeeze-Excitation Residual Networks

2020

Acoustic scene classification (ASC) is a problem related to the field of machine listening whose objective is to classify/tag an audio clip in a predefined label describing a scene location (e. g. park, airport, etc.). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved only by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently…

FOS: Computer and information sciencesSound (cs.SD)Computer Science - Machine LearningGeneral Computer ScienceCalibration (statistics)Computer scienceResidualConvolutional neural networkField (computer science)Computer Science - SoundMachine Learning (cs.LG)030507 speech-language pathology & audiology03 medical and health sciencesAudio and Speech Processing (eess.AS)Acoustic scene classificationFeature (machine learning)FOS: Electrical engineering electronic engineering information engineeringGeneral Materials ScienceBlock (data storage)Artificial neural networkbusiness.industrypattern recognitionGeneral Engineeringdeep learningPattern recognitionmachine listeningsqueeze-excitationArtificial intelligencelcsh:Electrical engineering. Electronics. Nuclear engineering0305 other medical sciencebusinesslcsh:TK1-9971Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

A quantum vocal theory of sound

2020

Concepts and formalism from acoustics are often used to exemplify quantum mechanics. Conversely, quantum mechanics could be used to achieve a new perspective on acoustics, as shown by Gabor studies. Here, we focus in particular on the study of human voice, considered as a probe to investigate the world of sounds. We present a theoretical framework that is based on observables of vocal production, and on some measurement apparati that can be used both for analysis and synthesis. In analogy to the description of spin states of a particle, the quantum-mechanical formalism is used to describe the relations between the fundamental states associated with phonetic labels such as phonation, turbule…

FOS: Computer and information sciencesSound (cs.SD)Computer scienceAudio processingAnalogyAudio processing; Quantum-inspired algorithms; Sound representation01 natural sciencesComputer Science - Sound050105 experimental psychologyTheoretical Computer Sciencesymbols.namesakeAudio and Speech Processing (eess.AS)0103 physical sciencesFOS: Electrical engineering electronic engineering information engineering0501 psychology and cognitive sciencesPhonationElectrical and Electronic Engineering010306 general physicsQuantumHuman voiceQuantum computerSound representationSettore INF/01 - Informatica05 social sciencesStatistical and Nonlinear PhysicsObservableSettore MAT/04 - Matematiche ComplementariElectronic Optical and Magnetic MaterialsVibrationClassical mechanicsFourier transformComputer Science::SoundModeling and SimulationSignal ProcessingsymbolsQuantum-inspired algorithms Audio processing Sound representationQuantum-inspired algorithmsSettore ING-INF/05 - Sistemi di Elaborazione delle InformazioniElectrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks

2020

The interest in deep learning methods for solving traditional signal processing tasks has been steadily growing in the last years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TDE based on a sub-band analysis of the cross-power spectrum phase, providing a structured two-dimensional representation of the time delay information contained across different frequency bands. Inspired by deep-learning-based image denoising solutions, we propose in this paper the use of convolutio…

FOS: Computer and information sciencesSound (cs.SD)Computer sciencePhase (waves)Distributed microphones02 engineering and technologyConvolutional neural networkComputer Science - Sound030507 speech-language pathology & audiology03 medical and health sciencesAudio and Speech Processing (eess.AS)FOS: Electrical engineering electronic engineering information engineering0202 electrical engineering electronic engineering information engineeringGCCRepresentation (mathematics)Signal processingbusiness.industryI.5.4Deep learningConvolutional Neural Networks020206 networking & telecommunicationsTime delay estimationMultilaterationI.2.094A12 68T10LocalizationArtificial intelligence0305 other medical sciencebusinessAlgorithmElectrical Engineering and Systems Science - Audio and Speech ProcessingI.2.0; I.5.4ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
researchProduct

Low-Power Audio Keyword Spotting using Tsetlin Machines

2021

The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human to machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning automata powered machine learning algorithm called the Tsetlin Machine (TM). Through significant reduction in parameter requirements and choosing logic over arithmetic based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM based keyword spotting (KWS) pipe…

FOS: Computer and information sciencesspeech commandSound (cs.SD)Computer scienceSpeech recognition02 engineering and technologykeyword spottingMachine learningcomputer.software_genreComputer Science - SoundReduction (complexity)Audio and Speech Processing (eess.AS)020204 information systemsFOS: Electrical engineering electronic engineering information engineering0202 electrical engineering electronic engineering information engineeringElectrical and Electronic EngineeringArtificial neural networkLearning automatabusiness.industrylearning automatalcsh:Applications of electric power020206 networking & telecommunicationslcsh:TK4001-4102Pipeline (software)Power (physics)machine learningTsetlin MachineMFCCKeyword spottingelectrical_electronic_engineeringScalabilityMemory footprintpervasive AI020201 artificial intelligence & image processingMel-frequency cepstrumArtificial intelligencebusinesscomputerartificial neural networkEfficient energy useElectrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Hate Speech e messaggi discriminatori: riflessioni internazionalistiche sul caso Casapound c. Facebook

2021

In this intervention the idea is submitted that the interim measures in the case Casapound v. Facebook, by means of which Fb Ireland was ordered by the Rome Tribunal (on December 12, 2019) to reactivate the profile of the extreme right wing political association Casapound - previously closed as a consequence of an alleged violation of fb standards concerning hate speech - can be criticized under international and european law. Such criticisms stem both from the overall picture of the international and european legal standards concerning hate speech (par. 2) and from the relevant case law of the European Court of Human rights (paras. 3 and 4).

Facebook standards hate speech international and european law european court of human rightsSettore IUS/14 - Diritto Dell'Unione EuropeaSettore IUS/13 - Diritto Internazionale
researchProduct

Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification

2020

In the last years, deep convolutional neural networks have become a standard for the development of state-of-the-art audio classification systems, taking the lead over traditional approaches based on feature engineering. While they are capable of achieving human performance under certain scenarios, it has been shown that their accuracy is severely degraded when the systems are tested over noisy or weakly segmented events. Although better generalization could be obtained by increasing the size of the training dataset, e.g. by applying data augmentation techniques, this also leads to longer and more complex training procedures. In this article, we propose a new type of pooling layer aimed at …

Feature engineeringAcoustics and Ultrasonicsbusiness.industryComputer scienceFeature vectorFeature extractionPoolingPattern recognitionConvolutional neural network030507 speech-language pathology & audiology03 medical and health sciencesComputational MathematicsTransformation (function)Feature (computer vision)Adaptive systemComputer Science (miscellaneous)Artificial intelligenceElectrical and Electronic Engineering0305 other medical sciencebusinessIEEE/ACM Transactions on Audio, Speech, and Language Processing
researchProduct

RECOGNIZABLE PICTURE LANGUAGES

1992

The purpose of this paper is to propose a new notion of recognizability for picture (two-dimensional) languages extending the characterization of one-dimensional recognizable languages in terms of local languages and alphabetic mappings. We first introduce the family of local picture languages (denoted by LOC) and, in particular, prove the undecidability of the emptiness problem. Then we define the new family of recognizable picture languages (denoted by REC). We study some combinatorial and language theoretic properties of REC such as ambiguity, closure properties or undecidability results. Finally we compare the family REC with the classical families of languages recognized by four-way a…

Finite-state machinebusiness.industrymedia_common.quotation_subjectClosure (topology)Abstract family of languagesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)AmbiguityOntology languageCone (formal languages)DecidabilityPhilosophy of languageTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESArtificial IntelligenceComputer Science::Programming LanguagesComputer Vision and Pattern RecognitionArtificial intelligencebusinessComputer Science::Formal Languages and Automata TheorySoftwareMathematicsmedia_commonInternational Journal of Pattern Recognition and Artificial Intelligence
researchProduct