Search results for " processing"

showing 10 items of 7549 documents

On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification

2019

Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few w…

FOS: Computer and information sciences | Sound (cs.SD) | Computer Science - Machine Learning | Audio and Speech Processing (eess.AS) | FOS: Electrical engineering electronic engineering information engineering | Computer Science - Sound | Machine Learning (cs.LG) | Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation

2020

Anomalous sound detection (ASD) is currently one of the topical subjects in the machine listening discipline. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many fields. For example, in industrial processes, the early detection of malfunctions or damage in machines can mean great savings and improved efficiency. An unsupervised ASD solution suits this problem because industrial machines do not have to be damaged merely to obtain such audio data for the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-base…

FOS: Computer and information sciences | Sound (cs.SD) | Computer Science - Machine Learning | Audio and Speech Processing (eess.AS) | FOS: Electrical engineering electronic engineering information engineering | Computer Science - Sound | Machine Learning (cs.LG) | Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

CNN depth analysis with different channel inputs for Acoustic Scene Classification

2019

Acoustic scene classification (ASC) has been approached in recent years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is still a research area of interest. In this paper, different log-Mel representations and combinations are analyzed. Experiments show that the best results are obtained using the harmonic and percussive components plus the difference between left and right stereo channels (L-R). On the other hand, it i…

FOS: Computer and information sciences | Sound (cs.SD) | Computer Science - Machine Learning | Audio and Speech Processing (eess.AS) | FOS: Electrical engineering electronic engineering information engineering | Computer Science - Sound | Machine Learning (cs.LG) | Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Acoustic Scene Classification with Squeeze-Excitation Residual Networks

2020

Acoustic scene classification (ASC) is a problem in the field of machine listening whose objective is to tag an audio clip with a predefined label describing a scene location (e.g. park, airport, etc.). Many state-of-the-art solutions to ASC incorporate data augmentation techniques and model ensembles. However, considerable improvements can also be achieved solely by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently…

FOS: Computer and information sciences | Sound (cs.SD) | Computer Science - Machine Learning | General Computer Science | Calibration (statistics) | Computer science | Residual | Convolutional neural network | Field (computer science) | Computer Science - Sound | Machine Learning (cs.LG) | 030507 speech-language pathology & audiology | 03 medical and health sciences | Audio and Speech Processing (eess.AS) | Acoustic scene classification | Feature (machine learning) | FOS: Electrical engineering electronic engineering information engineering | General Materials Science | Block (data storage) | Artificial neural network | business.industry | pattern recognition | General Engineering | deep learning | Pattern recognition | machine listening | squeeze-excitation | Artificial intelligence | lcsh:Electrical engineering. Electronics. Nuclear engineering | 0305 other medical science | business | lcsh:TK1-9971 | Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

A quantum vocal theory of sound

2020

Concepts and formalism from acoustics are often used to exemplify quantum mechanics. Conversely, quantum mechanics could be used to achieve a new perspective on acoustics, as shown by Gabor's studies. Here, we focus in particular on the study of the human voice, considered as a probe to investigate the world of sounds. We present a theoretical framework that is based on observables of vocal production, and on some measurement apparatuses that can be used both for analysis and synthesis. In analogy to the description of spin states of a particle, the quantum-mechanical formalism is used to describe the relations between the fundamental states associated with phonetic labels such as phonation, turbule…

FOS: Computer and information sciences | Sound (cs.SD) | Computer science | Audio processing | Analogy | Audio processing; Quantum-inspired algorithms; Sound representation | 01 natural sciences | Computer Science - Sound | 050105 experimental psychology | Theoretical Computer Science | symbols.namesake | Audio and Speech Processing (eess.AS) | 0103 physical sciences | FOS: Electrical engineering electronic engineering information engineering | 0501 psychology and cognitive sciences | Phonation | Electrical and Electronic Engineering | 010306 general physics | Quantum | Human voice | Quantum computer | Sound representation | Settore INF/01 - Informatica | 05 social sciences | Statistical and Nonlinear Physics | Observable | Settore MAT/04 - Matematiche Complementari | Electronic Optical and Magnetic Materials | Vibration | Classical mechanics | Fourier transform | Computer Science::Sound | Modeling and Simulation | Signal Processing | symbols | Quantum-inspired algorithms Audio processing Sound representation | Quantum-inspired algorithms | Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni | Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks

2020

Interest in deep learning methods for solving traditional signal processing tasks has been growing steadily in recent years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TDE based on a sub-band analysis of the cross-power spectrum phase, providing a structured two-dimensional representation of the time delay information contained across different frequency bands. Inspired by deep-learning-based image denoising solutions, we propose in this paper the use of convolutio…

FOS: Computer and information sciences | Sound (cs.SD) | Computer science | Phase (waves) | Distributed microphones | 02 engineering and technology | Convolutional neural network | Computer Science - Sound | 030507 speech-language pathology & audiology | 03 medical and health sciences | Audio and Speech Processing (eess.AS) | FOS: Electrical engineering electronic engineering information engineering | 0202 electrical engineering electronic engineering information engineering | GCC | Representation (mathematics) | Signal processing | business.industry | I.5.4 | Deep learning | Convolutional Neural Networks | 020206 networking & telecommunications | Time delay estimation | Multilateration | I.2.0 | 94A12 68T10 | Localization | Artificial intelligence | 0305 other medical science | business | Algorithm | Electrical Engineering and Systems Science - Audio and Speech Processing | I.2.0; I.5.4 | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
researchProduct

A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German

1998

In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms, together with its ability to process German compound nouns, guarantees wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.

FOS: Computer and information sciences | Spectrum analyzer | Root (linguistics) | Morphology (linguistics) | Computer Science - Computation and Language | Computer science | business.industry | Lemmatisation | Context (language use) | computer.software_genre | Lexicon | Syntax | language.human_language | German | H.3.4 | Noun | language | Artificial intelligence | business | computer | Computation and Language (cs.CL) | Natural language processing | Word (computer architecture)
researchProduct

Binary jumbled string matching for highly run-length compressible texts

2012

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…
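
The problem statement above can be illustrated with a minimal sketch. It is not the run-length-based index the abstract proposes, nor one of the cited faster constructions. It is a naive quadratic-time build of a linear-size index, relying on the standard observation that, in a binary string, the numbers of a's over all substrings of a fixed length form a contiguous range (sliding the window by one position changes the count by at most one). The function names build_index and has_jumbled_match are hypothetical.

```python
def build_index(s):
    """Naive O(n^2) construction of a linear-size index (illustrative only).

    For every substring length m, store the minimum and maximum number
    of 'a's found in any substring of s with that length.
    """
    n = len(s)
    # prefix[i] holds the number of 'a's in s[:i]
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "a"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(counts), max(counts)
    return lo, hi


def has_jumbled_match(index, x, y):
    """Answer a query (x, y) in constant time using the index."""
    lo, hi = index
    m = x + y
    if m >= len(lo):
        return False  # the requested substring is longer than s
    # Every count between lo[m] and hi[m] is realised by some window.
    return lo[m] <= x <= hi[m]


index = build_index("aabab")
print(has_jumbled_match(index, 2, 1))  # True: "aab" has two a's and one b
print(has_jumbled_match(index, 0, 2))  # False: "bb" never occurs
```

Each query touches only two stored values, which matches the constant-time query behaviour described above; the cost sits entirely in the pre-processing step.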

FOS: Computer and information sciences | String algorithms | Structure (category theory) | Binary number | G.2.1 | Data_CODINGANDINFORMATIONTHEORY | 0102 computer and information sciences | 02 engineering and technology | String searching algorithm | 01 natural sciences | Computer Science - Information Retrieval | Theoretical Computer Science | Combinatorics | data structures | Simple (abstract algebra) | Computer Science - Data Structures and Algorithms | String algorithms; jumbled pattern matching; prefix normal form; data structures | 0202 electrical engineering electronic engineering information engineering | Parikh vector | Data Structures and Algorithms (cs.DS) | Run-length encoding | Mathematics | 68W32 68P05 68P20 | String (computer science) | prefix normal form | Substring | Computer Science Applications | jumbled pattern matching | 010201 computation theory & mathematics | Data structure | Signal Processing | Run-length encoding | 020201 artificial intelligence & image processing | Constant (mathematics) | Information Retrieval (cs.IR) | Information Systems | Information Processing Letters
researchProduct

Exact affine counter automata

2017

We introduce an affine generalization of counter automata, and analyze their abilities as well as those of affine finite automata. Our contributions are as follows. We show that there is a language that can be recognized by exact realtime affine counter automata but by neither 1-way deterministic pushdown automata nor realtime deterministic k-counter automata. We also show that a certain promise problem, which is conjectured not to be solvable by two-way quantum finite automata in polynomial time, can be solved by Las Vegas affine finite automata. Lastly, we show how a counter helps affine finite automata by showing that the language MANYTWINS, which is conjectured not to be recognized by affin…

FOS: Computer and information sciences | TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES | automata | Formal Languages and Automata Theory (cs.FL) | Generalization | Computer science | FOS: Physical sciences | Computer Science - Formal Languages and Automata Theory | counter automata | Mathematics | 0102 computer and information sciences | 02 engineering and technology | Computational Complexity (cs.CC) | 01 natural sciences | quantum computing | lcsh:QA75.5-76.95 | Deterministic pushdown automaton | Computer Science (miscellaneous) | 0202 electrical engineering electronic engineering information engineering | Quantum finite automata | Promise problem | Time complexity | Discrete mathematics | Quantum Physics | computational complexity | Finite-state machine | lcsh:Mathematics | Informatics | pushdown automata | lcsh:QA1-939 | Nonlinear Sciences::Cellular Automata and Lattice Gases | Cybernetics | Automaton | Computer Science - Computational Complexity | TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES | 010201 computation theory & mathematics | 020201 artificial intelligence & image processing | lcsh:Electronic computers. Computer science | Affine transformation | affine computing | Quantum Physics (quant-ph) | Computer Science::Formal Languages and Automata Theory
researchProduct

Knowledge Base Approach for 3D Objects Detection in Point Clouds Using 3D Processing and Specialists Knowledge

2012

This paper presents a knowledge-based object detection approach using the OWL ontology language, the Semantic Web Rule Language (SWRL), and 3D processing built-ins, aiming to combine geometrical analysis of 3D point clouds with specialists' knowledge. Here, we share our experience regarding the creation of a 3D semantic facility model out of unorganized 3D point clouds. Thus, a knowledge-based object detection approach using the OWL ontology language is presented. This knowledge is used to define SWRL detection rules. In addition, the combination of 3D processing built-ins and topological built-ins in SWRL rules allows a more flexible and intelligent detection, and the annotation of objects …

FOS: Computer and information sciences | Topologic analysis | Computer Science - Artificial Intelligence | Semantic facility information model | Geometric analysis | [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] | Artificial Intelligence (cs.AI) | 3D processing algorithm | Semantic VRML model | knowledge modeling | ontology | 3D scene reconstruction | object identification | [ INFO.INFO-AI ] Computer Science [cs]/Artificial Intelligence [cs.AI] | Semantic web
researchProduct