Search results for "speech recognition"
showing 10 items of 357 documents
ASR performance prediction on unseen broadcast programs using convolutional neural networks
2018
In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of both textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …
Analyzing Learned Representations of a Deep ASR Performance Prediction Model
2018
This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these 3 types of information …
Low-Power Audio Keyword Spotting using Tsetlin Machines
2021
The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning-automata-powered machine learning algorithm called the Tsetlin Machine (TM). Through significant reduction in parameter requirements and choosing logic over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper, we explore a TM-based keyword spotting (KWS) pipe…
Remote heart rate variability for emotional state monitoring
2018
Several studies have been conducted to recognize emotions using various modalities such as facial expressions, gestures, speech or physiological signals. Among all these modalities, physiological signals are especially interesting because they are mainly controlled by the autonomic nervous system. It has been shown, for example, that there is an undeniable relationship between emotional state and Heart Rate Variability (HRV). In this paper, we present a methodology to monitor emotional state from physiological signals acquired remotely. The method is based on a remote photoplethysmography (rPPG) algorithm that estimates remote Heart Rate Variability (rHRV) using a …
The time course of processing handwritten words: An ERP investigation
2021
Available online 25 June 2021. Behavioral studies have shown that the legibility of handwritten script hinders visual word recognition. Furthermore, when compared with printed words, lexical effects (e.g., word-frequency effect) are magnified for less intelligible (difficult) handwriting (Barnhart and Goldinger, 2010; Perea et al., 2016). This boost has been interpreted in terms of greater influence of top-down mechanisms during visual word recognition. In the present experiment, we registered the participants’ ERPs to uncover top-down processing effects on early perceptual encoding. Participants’ behavioral and EEG responses were recorded to high- and low-frequency words that varied in scr…
Synthetic individual binaural audio delivery by pinna image processing
2014
Purpose – The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna. Design/methodology/approach – The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source. Findings – A preliminary localization test shows that the model is able to statically render the elevation of a vi…
THE EXTERNAL FRAME FUNCTION IN THE CONTROL OF PITCH IN THE HUMAN VOICE
1968
STN area detection using K-NN classifiers for MER recordings in Parkinson patients during neurostimulator implant surgery
2016
Deep Brain Stimulation (DBS) applies electric pulses to the subthalamic nucleus (STN), improving tremor and other symptoms associated with Parkinson's disease. Accurate STN detection for proper location and implant of the stimulating electrodes is a complex task, and surgeons are not always certain about the final location. Signals from the STN acquired during DBS surgery are obtained with microelectrodes and have specific characteristics that differ from those of other brain areas. Using supervised learning, a trained model based on previous microelectrode recordings (MER) can be obtained that is able to successfully classify the STN area for new MER signals. The K Nearest Neighbours (K-NN) algorithm has bee…
Phonemes in Prime Syllables
2021
Tonal Hierarchies in Jazz Improvisation
1995
Statistical methods were used to investigate 18 bebop-styled jazz improvisations based on the so-called Rhythm Changes chord progression. The data were compared with results obtained by C. L. Krumhansl and her colleagues in empirical tests investigating the perceived stability of the tones in the chromatic scale in various contexts. Comparisons were also made with data on the statistical distribution of the 12 chromatic tones in actual European art music. It was found that the chorus-level hierarchies (measured over a whole chorus) are remarkably similar to the rating profiles obtained in empirical tests and to the relative frequencies of the tones in European art music. The chord-level …