Search results for "speech recognition"
Showing 10 of 357 documents
Non-speech voice for sonic interaction: a catalogue
2016
This paper surveys the uses of non-speech voice as an interaction modality within sonic applications. Three main contexts of use are identified: sound retrieval, sound synthesis and control, and sound design. An overview of the different choices and techniques regarding the style of interaction, the selection of vocal features, and their mapping to sound features or controls is presented. A comprehensive collection of examples illustrates the use of non-speech voice in actual tools for sonic interaction. It is pointed out that while voice-based techniques are already used proficiently in sound retrieval and sound synthesis, their use in sound design is still at an exploratory p…
ERP qualification exploiting waveform, spectral and time-frequency infomax
2008
This contribution briefly introduces an event-related potential (ERP) detector. The detector uses three kinds of ERP features: the waveform feature, the spectral feature, and the time-frequency feature. From these characteristics, two parameters are defined to reflect the timing of the ERP. The mismatch negativity (MMN) is taken as an example to design an exact qualification detector. The experiment validates that the computer can automatically detect the raw trace to reflect the quality of the dataset, qualify the filtered trace to test whether the artifacts have been filtered out, and select the ERP-like component to reject art…
Analysis of the Visual Classification System by Means of Detection Experiments
1977
Summary: Experiments on recognizing statistically distorted patterns show that the human visual system operates as a linear classifier. The spatial frequency range within which features are extracted is determined by the coupling in the area of sharpest vision (2°). The relevant features for classifying patterns are not produced by isotropic filtering.
A Sub-Symbolic Approach to Word Modelling for Domain Specific Speech Recognition
2006
In this work, a sub-symbolic technique for automatic, data-driven construction of language models is presented. The technique can be used to build a language-modelling module that is easily integrated into existing speech recognition architectures, such as the well-known HTK architecture. The proposed technique takes advantage of both the traditional LSA approach and a novel application of a probability-space metric known as "Hellinger's distance". Experimental trials are also presented to validate the proposed approach.
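As a rough illustration of the Hellinger distance named in this abstract (the distributions below are invented toy values, not data from the paper):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical distributions) to 1 (disjoint support).
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    ) / math.sqrt(2)

# Toy word-probability vectors over the same vocabulary.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d = hellinger(p, q)  # small positive value for these similar distributions
```

In an LSA-style setup, `p` and `q` would be probability vectors derived from word–context counts; the metric's bounded range is one practical reason to prefer it over, say, KL divergence, which is unbounded.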
Detection of TV commercials
2004
This paper presents a system that labels TV shots as either commercial or program shots. The system uses two observations: logo presence and shot duration. These observations are modeled using HMMs, and a Viterbi decoder is finally used for shot labeling. The system has been tested on several hours of real video, achieving more than 99% correct labeling.
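A minimal sketch of the HMM/Viterbi pipeline this abstract describes, reduced to a single observation (logo presence) with invented transition and emission probabilities; the paper's actual model also uses shot duration:

```python
import math

states = ["commercial", "program"]
start = {"commercial": 0.5, "program": 0.5}
# Shots tend to stay in the same block (commercial break vs. program).
trans = {"commercial": {"commercial": 0.8, "program": 0.2},
         "program":    {"commercial": 0.2, "program": 0.8}}
# Emission: probability that the channel logo is visible in a shot.
emit = {"commercial": {True: 0.1, False: 0.9},
        "program":    {True: 0.9, False: 0.1}}

def viterbi(obs):
    """Most likely state sequence for a list of logo-presence observations."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            row[s] = V[-1][best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):       # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

A run of logo-free shots decodes as a commercial block, logo-bearing shots as program, with the transition matrix smoothing over isolated misdetections.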
Spectral tools for Dynamic Tonality and audio morphing
2009
Computer Music Journal. William Sethares (sethares@ece.wisc.edu), Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706 USA; Andrew Milne (andymilne@tonalcentre.org), Department of Music, P.O. Box 35, 40014, University of Jyvaskyla, Finland; Stefan Tiedje (Stefan-Tiedje@addcom.de), CCMIX, Paris, France; Anthony Prechtl (aprechtl@gmail.com), Department of Music, P.O. Box 35, 40014, University of Jyvaskyla, Finland; James Plamondon (jim@thumtronics.com), CEO, Thumtronics Inc., 6911 Thistle Hill Way, Austin, TX 78754 USA.
The neural basis of sublexical speech and corresponding nonspeech processing: a combined EEG-MEG study.
2014
Abstract: We addressed the neural organization of speech versus nonspeech sound processing by investigating preattentive cortical auditory processing of changes in five features of a consonant–vowel syllable (consonant, vowel, sound duration, frequency, and intensity) and their acoustically matched nonspeech counterparts in a simultaneous EEG–MEG recording of mismatch negativity (MMN/MMNm). Overall, speech-sound processing was enhanced compared to nonspeech sound processing. This effect was strongest in the left hemisphere for changes that affect word meaning (consonant, vowel, and vowel duration), and also in the right hemisphere for the vowel-identity change. Furthermore, in the right hemisphere, spe…
Does letter position coding depend on consonant/vowel status? Evidence with the masked priming technique
2008
Recently, a number of input coding schemes (e.g., the SOLAR, SERIOL, open-bigram, and overlap models) have been proposed that capture the transposed-letter priming effect (i.e., faster response times for jugde-JUDGE than for jupte-JUDGE). In their current versions, these coding schemes do not assume any processing differences between vowels and consonants. However, in a lexical decision task, Perea and Lupker (2004, JML) and Lupker, Perea, and Davis (2008, L&CP) reported that transposed-letter priming effects occurred for consonant transpositions but not for vowel transpositions. This finding poses a challenge for these recently proposed coding schemes. Here, we report four masked priming…
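A toy version of the open-bigram coding idea mentioned in this abstract, enough to see why jugde primes JUDGE more strongly than jupte does; the cited models may weight or window bigrams differently:

```python
from itertools import combinations

def open_bigrams(word):
    """All ordered letter pairs of a word, preserving relative order."""
    return {a + b for a, b in combinations(word.lower(), 2)}

def overlap(prime, target):
    """Proportion of the target's open bigrams also present in the prime."""
    t = open_bigrams(target)
    return len(open_bigrams(prime) & t) / len(t)

# Transposing two adjacent letters preserves most bigrams;
# substituting them destroys far more.
tl = overlap("jugde", "judge")   # high: only the "dg" bigram is lost
sub = overlap("jupte", "judge")  # low: most bigrams involve the replaced letters
```

Under this scheme the transposed prime shares 9 of JUDGE's 10 open bigrams, the substitution prime only 3, which is the similarity asymmetry the priming effect tracks.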
Harmony perception and regularity of spike trains in a simple auditory model
2013
A probabilistic approach to investigating the phenomena of dissonance and consonance in a simple auditory sensory model, composed of two sensory neurons and one interneuron, is presented. We calculated the interneuron's firing statistics, that is, the interspike-interval statistics of the spike train at the output of the interneuron, for consonant and dissonant inputs in the presence of additional "noise" representing random signals from other, nearby neurons and from the environment. We find that blurry interspike-interval distributions (ISIDs) characterize dissonant chords, while quite regular ISIDs characterize consonant chords. The informational entropy of the non-Markov spike train …
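A minimal sketch of the ISI-distribution entropy measure this abstract refers to, assuming a simple fixed-width binning of intervals (the paper's estimator may differ):

```python
import math
from collections import Counter

def isi_entropy(spike_times, bin_width=0.005):
    """Shannon entropy (bits) of the binned interspike-interval distribution."""
    isis = [t2 - t1 for t1, t2 in zip(spike_times, spike_times[1:])]
    counts = Counter(round(isi / bin_width) for isi in isis)
    n = len(isis)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A perfectly regular train ("consonant-like") has a single ISI bin,
# hence zero entropy; mixed intervals spread mass over bins.
regular = [0.01 * i for i in range(100)]
irregular = [0, 0.01, 0.03, 0.04, 0.06]  # alternating 10 ms / 20 ms gaps
```

Low entropy corresponds to the "quite regular ISIDs" of consonant inputs; blurry, multi-peaked ISIDs of dissonant inputs yield higher entropy.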
Beyond alphabetic orthographies: The role of form and phonology in transposition effects in Katakana
2008
In recent years, there has been growing interest in how the order of letters is attained in visual word recognition. Two critical issues are: (1) whether the front end of the recently proposed models of letter-position encoding can be generalised to non-alphabetic scripts, and (2) whether phonology plays an important role in the process of letter-position encoding. In the present masked priming lexical decision experiments, we employed a syllabic/moraic script (Katakana), which allows form and phonology to be disentangled. In Experiment 1, we found a robust masked transposed-mora priming effect: the prime a.ri.me.ka facilitates the processing of the word a.me.ri.ka relative to a double-substit…