Search results for "speech"

showing 10 items of 1281 documents

Algorithmic Aspects of Speech Recognition: A Synopsis

2000

Speech recognition is an area with a sizable literature, but there is little discussion of the topic within the computer science algorithms community. Since many of the problems arising in speech recognition are well suited for algorithmic studies, we present them in terms familiar to algorithm designers. Such cross fertilization can breed fresh insights from new perspectives. This material is abstracted from A. L. Buchsbaum and R. Giancarlo, Algorithmic Aspects of Speech Recognition: An Introduction, ACM Journal of Experimental Algorithmics, Vol. 2, 1997, http://www.jea.acm.org.

Computer science; Speech recognition; Speech corpus; Hidden Markov model; GeneralLiterature_REFERENCE (e.g., dictionaries, encyclopedias, glossaries)

What's the difference? comparing humans and machines on the Aurora 2 speech recognition task

2013

Computer science; Speech recognition; Task (project management); Interspeech 2013

Processing Continuous Speech in Infancy

2016

The present chapter focuses on fluent speech segmentation abilities in early language development. We first review studies exploring the early use of major prosodic boundary cues which allow infants to cut full utterances into smaller-sized sequences like clauses or phrases. We then summarize studies showing that word segmentation abilities emerge around 8 months, and rely on infants’ processing of various bottom-up word boundary cues and top-down known word recognition cues. Given that most of these cues are specific to the language infants are acquiring, we emphasize how the development of these abilities varies cross-linguistically, and explore their developmental origin. In particular, …

Computer science; Speech recognition; Text segmentation

Does training in syllable recognition improve reading speed? A computer-based trial with poor readers from second and third grade.

2013

Repeated reading of infrequent syllables has been shown to increase reading speed at the word level in a transparent orthography. This study confirms these results with a computer-based training method and extends them by comparing the training effects of short syllables and long frequent and infrequent syllables, controlling for rapid automatized naming. Our results, based on a sample of 150 poor readers of Finnish, showed clear gains in reading speed regarding all trained syllables, but a transfer effect to the word level was evident only in the case of long infrequent syllables. Rapid automatized naming was associated with initial reading speed, but not with the training effect.

Computer science; Speech recognition; media_common.quotation_subject; Education; rapid automatized naming; computerized training; reading speed; Reading (process); nopea nimeäminen; Finno-Ugric languages; lukemisvaikeus; ta516; syllables; Rapid automatized naming; ta515; interventio; media_common; tavut; reading disability; Training (meteorology); Training effect; reading fluency; Transfer of training; Psychology (miscellaneous); lukemisen sujuvuus; Syllable; Orthography

Tempo Induction from Music Recordings Using Ensemble Empirical Mode Decomposition Analysis

2011

Tempo and beat are among the most important features of Western music. Owing to the perceptual nature of tempo, its automatic analysis and extraction remains a difficult task for a large variety of music genres. Western music notation represents musical events using a hierarchical metrical structure distinguishing different time scales. This hierarchy is often modeled using three levels: the tatum, the tactus, and the measure. The tatum represents the shortest durational value in music that is not just an accidental phenomenon (Bilmes 1993). The tactus period is the most perceptually prominent period, and is the period at which most humans would tap their feet in time with the music (Lerdah…

Computer science; Speech recognition; media_common.quotation_subject; Musical; Notation; Hilbert–Huang transform; Computer Science Applications; Rhythm; Audio editing software; Perception; Media Technology; Music information retrieval; Beat (music); Music; media_common; Computer Music Journal

Automatic fitting of cochlear implants with evolutionary algorithms

2004

This paper presents an optimisation algorithm designed to perform in-situ automatic fitting of cochlear implants. All patients are different, which means that cochlear parametrisation is a difficult and long task, with results ranging from perfect blind speech recognition to patients who cannot make anything out of their implant and just turn it off. The proposed method combines evolutionary algorithms and medical expertise to achieve autonomous interactive fitting through a Personal Digital Assistant (PDA).

Computer science; Speech recognition; otorhinolaryngologic diseases; Evolutionary algorithm; Implant; Task (project management); Proceedings of the 2004 ACM symposium on Applied computing

Non-speech voice for sonic interaction: a catalogue

2016

This paper surveys the uses of non-speech voice as an interaction modality within sonic applications. Three main contexts of use have been identified: sound retrieval, sound synthesis and control, and sound design. An overview of different choices and techniques regarding the style of interaction, the selection of vocal features and their mapping to sound features or controls is here displayed. A comprehensive collection of examples instantiates the use of non-speech voice in actual tools for sonic interaction. It is pointed out that while voice-based techniques are already being used proficiently in sound retrieval and sound synthesis, their use in sound design is still at an exploratory p…

Computer science; Voice - Sonic interaction - Information retrieval - Sound synthesis - Sound design; Speech recognition; Sound design; 02 engineering and technology; Exploratory phase; 020204 information systems; Sonic interaction design; 0202 electrical engineering, electronic engineering, information engineering; Selection (linguistics); Information retrieval; 0501 psychology and cognitive sciences; 050107 human factors; Sound (geography); Sonic interaction; geography; Modality (human–computer interaction); geography.geographical_feature_category; Sound synthesis; Voice; Signal Processing; Human-Computer Interaction; Settore INF/01 - Informatica; 05 social sciences

ERP qualification exploiting waveform, spectral and time-frequency infomax

2008

The present contribution briefly introduces an event-related potential (ERP) detector. The detector uses three kinds of ERP features: the waveform feature, the spectral feature, and the time-frequency feature. According to these characteristics, two parameters are defined to reflect the timing feature of ERP. The mismatch negativity (MMN) is taken as the example to design an exact qualification detector. The experiment validates that the computer can automatically detect the raw trace to reflect the quality of the dataset, qualify the filtered trace to test whether the artifacts have been filtered out, and select the ERP-like component to reject art…

Computer science; business.industry; Speech recognition; Detector; Mismatch negativity; Pattern recognition; Independent component analysis; Time–frequency analysis; Feature (computer vision); Waveform; Artificial intelligence; Infomax; business; TRACE (psycholinguistics); 2008 3rd International Symposium on Communications, Control and Signal Processing

Analyse des Visuellen Klassifikationssystems Durch Detektionsexperimente

1977

Experiments on recognizing statistically distorted patterns show that the human visual system operates as a linear classifier. The spatial frequency range within which features are extracted is determined by the coupling in the area of sharpest vision (2°). The relevant features for classifying patterns are not produced by isotropic filtering.

Computer science; business.industry; Speech recognition; Human visual system model; Pattern recognition; Linear classifier; Spatial frequency; Artificial intelligence; business; IFAC Proceedings Volumes

A Sub-Symbolic Approach to Word Modelling for Domain Specific Speech Recognition

2006

In this work, a sub-symbolic technique for the automatic, data-driven construction of language models is presented. Such a technique can be used to build a language-modelling module that can be easily integrated into existing speech recognition architectures, such as the well-found HTK architecture. The proposed technique takes advantage of both the traditional LSA approach and a novel application of a probability space metric known as "Hellinger's distance". Experimental trials are also presented in order to validate the proposed approach.

Computer science; business.industry; Speech recognition; Machine learning; computer.software_genre; Domain (software engineering); Speech enhancement; Metric (mathematics); Artificial intelligence; Language model; Hellinger distance; Hidden Markov model; business; computer; Natural language; Word (computer architecture); Seventh International Workshop on Computer Architecture for Machine Perception (CAMP'05)
