Search results for "speech recognition"

showing 10 items of 357 documents

ASR performance prediction on unseen broadcast programs using convolutional neural networks

2018

In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Feature extraction; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Task (project management); 0202 electrical engineering electronic engineering information engineering; Task analysis; Performance prediction; 020201 artificial intelligence & image processing; Mel-frequency cepstrum; Transcription (software); Hidden Markov model; Computation and Language (cs.CL); ComputingMilieux_MISCELLANEOUS; 0105 earth and related environmental sciences

researchProduct

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

2018

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these 3 types of information …

FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Word error rate; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; 0202 electrical engineering electronic engineering information engineering; Performance prediction; Leverage (statistics); 020201 artificial intelligence & image processing; Computation and Language (cs.CL); 0105 earth and related environmental sciences

researchProduct

Low-Power Audio Keyword Spotting using Tsetlin Machines

2021

The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning-automata-powered machine learning algorithm called the Tsetlin Machine (TM). Through significant reduction in parameter requirements and choosing logic over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM-based keyword spotting (KWS) pipe…

FOS: Computer and information sciences; speech command; Sound (cs.SD); Computer science; Speech recognition; 02 engineering and technology; keyword spotting; Machine learning; computer.software_genre; Computer Science - Sound; Reduction (complexity); Audio and Speech Processing (eess.AS); 020204 information systems; FOS: Electrical engineering electronic engineering information engineering; 0202 electrical engineering electronic engineering information engineering; Electrical and Electronic Engineering; Artificial neural network; Learning automata; business.industry; learning automata; lcsh:Applications of electric power; 020206 networking & telecommunications; lcsh:TK4001-4102; Pipeline (software); Power (physics); machine learning; Tsetlin Machine; MFCC; Keyword spotting; electrical_electronic_engineering; Scalability; Memory footprint; pervasive AI; 020201 artificial intelligence & image processing; Mel-frequency cepstrum; Artificial intelligence; business; computer; artificial neural network; Efficient energy use; Electrical Engineering and Systems Science - Audio and Speech Processing

researchProduct

Remote heart rate variability for emotional state monitoring

2018

Several studies have been conducted to recognize emotions using various modalities such as facial expressions, gestures, speech or physiological signals. Among all these modalities, physiological signals are especially interesting because they are mainly controlled by the autonomic nervous system. It has been shown for example that there is an undeniable relationship between emotional state and Heart Rate Variability (HRV). In this paper, we present a methodology to monitor emotional state from physiological signals acquired remotely. The method is based on a remote photoplethysmography (rPPG) algorithm that estimates remote Heart Rate Variability (rHRV) using a …

Facial expression; Modalities; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; Computer science; Speech recognition; 020208 electrical & electronic engineering; 0206 medical engineering; [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 02 engineering and technology; 020601 biomedical engineering; Signal; Feature (computer vision); Frequency domain; Photoplethysmogram; 0202 electrical engineering electronic engineering information engineering; Heart rate variability; Gesture

researchProduct

The time course of processing handwritten words: An ERP investigation

2021

Available online 25 June 2021. Behavioral studies have shown that the legibility of handwritten script hinders visual word recognition. Furthermore, when compared with printed words, lexical effects (e.g., word-frequency effect) are magnified for less intelligible (difficult) handwriting (Barnhart and Goldinger, 2010; Perea et al., 2016). This boost has been interpreted in terms of greater influence of top-down mechanisms during visual word recognition. In the present experiment, we registered the participants’ ERPs to uncover top-down processing effects on early perceptual encoding. Participants’ behavioral and EEG responses were recorded to high- and low-frequency words that varied in scr…

Handwriting; Cognitive Neuroscience; media_common.quotation_subject; Speech recognition; Experimental and Cognitive Psychology; Legibility; 050105 experimental psychology; 03 medical and health sciences; Behavioral Neuroscience; 0302 clinical medicine; Perception; Encoding (memory); Lexical decision task; Humans; 0501 psychology and cognitive sciences; Evoked Potentials; media_common; Visual word recognition; Visual word processing; 05 social sciences; ERPs; ComputingMethodologies_PATTERNRECOGNITION; Pattern Recognition Visual; Reading; Time course; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Visual Perception; Handwritten word processing; Psychology; 030217 neurology & neurosurgery

researchProduct

Synthetic individual binaural audio delivery by pinna image processing

2014

Purpose – The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna. Design/methodology/approach – The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source. Findings – A preliminary localization test shows that the model is able to statically render the elevation of a vi…

Headphone; business.product_category; Reference; General Computer Science; Computer science; Binaural; Speech recognition; Auditory localization; Image processing; Transfer function; Theoretical Computer Science; 3D audio; human computer interaction; Psychoacoustics; Representation (mathematics); Headphones; biology; HRTF; Pinna; References; Spatial sound; Settore INF/01 - Informatica; Orientation (computer vision); Computer Science (all); biology.organism_classification; business; Binaural recording

researchProduct

THE EXTERNAL FRAME FUNCTION IN THE CONTROL OF PITCH IN THE HUMAN VOICE

1968

History and Philosophy of Science; Computer science; General Neuroscience; Speech recognition; Frame (networking); Function (mathematics); Control (linguistics); General Biochemistry Genetics and Molecular Biology; Human voice; Annals of the New York Academy of Sciences

researchProduct

STN area detection using K-NN classifiers for MER recordings in Parkinson patients during neurostimulator implant surgery

2016

Deep Brain Stimulation (DBS) applies electric pulses to the subthalamic nucleus (STN), improving tremor and other symptoms associated with Parkinson's disease. Accurate STN detection for proper location and implant of the stimulating electrodes is a complex task and surgeons are not always certain about final location. Signals from the STN acquired during DBS surgery are obtained with microelectrodes, having specific characteristics differing from other brain areas. Using supervised learning, a trained model based on previous microelectrode recordings (MER) can be obtained, being able to successfully classify the STN area for new MER signals. The K Nearest Neighbours (K-NN) algorithm has bee…

History; Deep brain stimulation; Wilcoxon signed-rank test; business.industry; Speech recognition; medicine.medical_treatment; Supervised learning; 02 engineering and technology; Implant surgery; nervous system diseases; Computer Science Applications; Education; 03 medical and health sciences; Subthalamic nucleus; surgical procedures operative; 0302 clinical medicine; nervous system; 0202 electrical engineering electronic engineering information engineering; medicine; 020201 artificial intelligence & image processing; K nearest neighbour; business; therapeutics; 030217 neurology & neurosurgery; Journal of Physics: Conference Series

researchProduct

Phonemes in Prime Syllables

2021

History; Speech recognition; Prime (order theory)

researchProduct

Tonal Hierarchies in Jazz Improvisation

1995

Statistical methods were used to investigate 18 bebop-styled jazz improvisations based on the so-called Rhythm Changes chord progression. The data were compared with results obtained by C. L. Krumhansl and her colleagues in empirical tests investigating the perceived stability of the tones in the chromatic scale in various contexts. Comparisons were also made with data on the statistical distribution of the 12 chromatic tones in actual European art music. It was found that the chorus-level hierarchies (measured over a whole chorus) are remarkably similar to the rating profiles obtained in empirical tests and to the relative frequencies of the tones in European art music. The chord-level …

Improvisation; Classical music; Hierarchy; Rhythm; biology; Speech recognition; Chorus; Chord (music); Chromatic scale; Jazz; biology.organism_classification; Music; Mathematics; Music Perception

researchProduct