Search results for "speech"

showing 10 items of 1281 documents

Effects of Global and Local Contexts on Harmonic Expectancy

1998

Several psycholinguistic studies have investigated the influence of local and global semantic contexts on word processing. The first aim of the present study was to examine local and global level contributions to harmonic priming. The second was to test a spreading-activation account of harmonic context effects (Bharucha, 1987). The expectations for the last chord (the target) of eight-chord sequences were varied by simultaneously manipulating the harmonic relationship of the target to the first six chords (global context) and to the seventh chord (local context). Human performances demonstrated that harmonic expectancies are derived from both the global and local levels of musical structur…

Expectancy theory; Connectionism; Context effect; Computer science; Speech recognition; Word processing; Chord (music); Schematic; Music; Musical form; Cognitive psychology; Music Perception
researchProduct

Genre-adaptive Semantic Computing and Audio-based Modelling for Music Mood Annotation

2016

This study investigates whether taking genre into account is beneficial for automatic music mood annotation in terms of core affects valence, arousal, and tension, as well as several other mood scales. Novel techniques employing genre-adaptive semantic computing and audio-based modelling are proposed. A technique called the ACTwg employs genre-adaptive semantic computing of mood-related social tags, whereas ACTwg-SLPwg combines semantic computing and audio-based modelling, both in a genre-adaptive manner. The proposed techniques are experimentally evaluated at predicting listener ratings related to a set of 600 popular music tracks spanning multiple genres. The results show that ACTwg outpe…

Exploit; Music information retrieval; music information retrieval; computer.software_genre; 050105 experimental psychology; Genre-adaptive.; 030507 speech-language pathology & audiology; 03 medical and health sciences; Annotation; Popular music; Semantic computing; Music information retrieval; 0501 psychology and cognitive sciences; Valence (psychology); genre-adaptive; social tags; ta113; music genre; business.industry; 05 social sciences; ComputingMilieux_PERSONALCOMPUTING; mood prediction; Music mood; Human-Computer Interaction; Mood; ta6131; semantic computing; Artificial intelligence; 0305 other medical science; business; Psychology; computer; Software; Natural language processing
researchProduct

METALANG. Protocolo franco-español de exploración de habilidades metalingüísticas en niños de 6 a 9 años: un estudio preliminar

2012

This work describes the principles of the functional-pragmatic paradigm and the reasons for developing an exploration protocol for natural metalinguistic abilities. The study explains the base hypothesis and structure of the METALANG protocol. The protocol consists of a Test and a Questionnaire for parents with two different scales: A = Ability, B = Frequency. Each element of the protocol has 6 sections and 40 items. A preliminary contrast was performed with 12 subjects aged 6 to 9 years, 4 of whom had been diagnosed with dysphasia. METALANG shows high scores in reliability and internal consistency. This result confirms the hypothesis that it is possible a joint…

Exploration protocol; Paradigma pragmático-funcional; 4. Education; 030507 speech-language pathology & audiology; 03 medical and health sciences; Speech and Hearing; 0302 clinical medicine; Metalinguistic abilities; Dysphasia; [SCCO.PSYC]Cognitive science/Psychology; Habilidades metalingüísticas; 0305 other medical science; Disfasia; Paradigm pragmatic-functional; 030217 neurology & neurosurgery; Protocolo exploración
researchProduct

Inducing the Lyndon Array

2019

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…

FOS: Computer and information sciences; 050101 languages & linguistics; Computer science; Computation; Induced suffix sorting; 02 engineering and technology; Space (mathematics); law.invention; Suffix sorting; law; Suffix array; Computer Science - Data Structures and Algorithms; 0202 electrical engineering electronic engineering information engineering; Data_FILES; Preprocessor; Data Structures and Algorithms (cs.DS); 0501 psychology and cognitive sciences; Computer Science::Data Structures and Algorithms; Time complexity; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; 05 social sciences; Lightweight algorithm; Suffix array; Sigma; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Induced suffix sorting; Lightweight algorithms; Lyndon array; Suffix array; Working space; Lyndon array; Lightweight algorithms; 020201 artificial intelligence & image processing; Algorithm; Computer Science::Formal Languages and Automata Theory
researchProduct

Using Hankel matrices for dynamics-based facial emotion recognition and pain detection

2015

This paper proposes a new approach to model the temporal dynamics of a sequence of facial expressions. To this purpose, a sequence of Face Image Descriptors (FID) is regarded as the output of a Linear Time Invariant (LTI) system. The temporal dynamics of such sequence of descriptors are represented by means of a Hankel matrix. The paper presents different strategies to compute dynamics-based representation of a sequence of FID, and reports classification accuracy values of the proposed representations within different standard classification frameworks. The representations have been validated in two very challenging application domains: emotion recognition and pain detection. Experiments on…

FOS: Computer and information sciences; Computer Science - Artificial Intelligence; Computer Vision and Pattern Recognition (cs.CV); Speech recognition; Feature extraction; Computer Science - Computer Vision and Pattern Recognition; Pain; LTI system theory; Computer Science - Robotics; Linear time invariant system; Representation (mathematics); Hidden Markov model; Mathematics; Emotion; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Sequence; business.industry; Pattern recognition; dynamics; Classification; Support vector machine; Artificial Intelligence (cs.AI); Face (geometry); Artificial intelligence; business; Robotics (cs.RO); Hankel matrix; 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
researchProduct

ASR performance prediction on unseen broadcast programs using convolutional neural networks

2018

In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Feature extraction; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Task (project management); [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; 0202 electrical engineering electronic engineering information engineering; Task analysis; Performance prediction; 020201 artificial intelligence & image processing; Mel-frequency cepstrum; Transcription (software); Hidden Markov model; Computation and Language (cs.CL); ComputingMilieux_MISCELLANEOUS; 0105 earth and related environmental sciences
researchProduct

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

2018

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these 3 types of information …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Word error rate; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; 0202 electrical engineering electronic engineering information engineering; Performance prediction; Leverage (statistics); 020201 artificial intelligence & image processing; Computation and Language (cs.CL); 0105 earth and related environmental sciences
researchProduct

Towards the evaluation of automatic simultaneous speech translation from a communicative perspective

2021

In recent years, automatic speech-to-speech and speech-to-text translation has gained momentum thanks to advances in artificial intelligence, especially in the domains of speech recognition and machine translation. The quality of such applications is commonly tested with automatic metrics, such as BLEU, primarily with the goal of assessing improvements of releases or in the context of evaluation campaigns. However, little is known about how the output of such systems is perceived by end users or how they compare to human performances in similar communicative tasks. In this paper, we present the results of an experiment aimed at evaluating the quality of a real-time speech translation engine…

FOS: Computer and information sciences; Computer Science - Computation and Language; Machine translation; End user; Computer science; business.industry; media_common.quotation_subject; Sample (statistics); Context (language use); Intelligibility (communication); computer.software_genre; Speech translation; Quality (business); Artificial intelligence; business; Computation and Language (cs.CL); computer; Interpreter; Natural language processing; media_common; Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
researchProduct

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

2020

In this paper, we propose a model for the Environment Sound Classification Task (ESC) that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with Attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and Chromagram. Such multiple features have never been used before for signal or audio processing. In addition, we employ a deeper CNN (DCNN) than previous models, consisting of spatially separable convolutions working on the time and feature domains separately. Alongside, we use atten…

FOS: Computer and information sciences; Computer Science - Machine Learning; Sound (cs.SD); Computer science; 020209 energy; Machine Learning (stat.ML); 02 engineering and technology; computer.software_genre; Convolutional neural network; Computer Science - Sound; Domain (software engineering); Machine Learning (cs.LG); Statistics - Machine Learning; Audio and Speech Processing (eess.AS); 0202 electrical engineering electronic engineering information engineering; FOS: Electrical engineering electronic engineering information engineering; Audio signal processing; VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550; business.industry; SIGNAL (programming language); Pattern recognition; Feature (computer vision); Benchmark (computing); 020201 artificial intelligence & image processing; Artificial intelligence; Mel-frequency cepstrum; business; computer; Electrical Engineering and Systems Science - Audio and Speech Processing; Communication channel
researchProduct

An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

2020

The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face recognition. In the audio domain, music fraud detection or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, usin…

FOS: Computer and information sciences; Computer Science - Machine Learning; Sound (cs.SD); sound processing; audio dataset; machine listening; UNESCO::CIENCIAS TECNOLÓGICAS; Computer Science - Sound; Machine Learning (cs.LG); classification; Artificial Intelligence; Audio and Speech Processing (eess.AS); Signal Processing; FOS: Electrical engineering electronic engineering information engineering; few-shot learning; open-set recognition; Computer Vision and Pattern Recognition; Software; Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct