RESEARCH PRODUCT

Comparing identification of vocal imitations and computational sketches of everyday sounds

Guillaume Lemaitre, Olivier Houix, Patrick Susini, Nicolas Misdariis, Frédéric Voisin

subject

Acoustics and Ultrasonics; Computer science; [SHS.MUSIQ] Humanities and Social Sciences/Musicology and performing arts; Speech recognition; Acoustics; [SCCO.COMP] Cognitive science/Computer science; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [SPI] Engineering Sciences [physics]; [SCCO] Cognitive science; Arts and Humanities (miscellaneous); [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC]; ComputingMilieux_MISCELLANEOUS; Sound (medical instrument); [INFO.INFO-ET] Computer Science [cs]/Emerging Technologies [cs.ET]; [SCCO.NEUR] Cognitive science/Neuroscience; [SHS.ANTHRO-SE] Humanities and Social Sciences/Social Anthropology and ethnology; Identification (information); [INFO.INFO-MA] Computer Science [cs]/Multiagent Systems [cs.MA]; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; [INFO.EIAH] Computer Science [cs]/Technology for Human Learning

description

International audience; Sounds are notably difficult to describe. It is thus not surprising that human speakers often use many imitative vocalizations to communicate about sounds. In practice, vocal imitations of non-speech everyday sounds (e.g. the sound of a car passing by) are very effective: listeners identify sounds better with vocal imitations than with verbal descriptions, despite the fact that vocal imitations are often inaccurate, constrained by the human vocal apparatus. The present study investigated the semantic representations evoked by vocal imitations by experimentally quantifying how well listeners could match sounds to category labels. It compared two different types of sounds: human vocal imitations, and computational auditory sketches (created by algorithmic computations), both based on easily identifiable sounds (sounds of human actions and manufactured products). The results show that performance with the best vocal imitations was similar to the best auditory sketches for most categories of sounds. More detailed analyses showed that the acoustic distance between vocal imitations and referent sounds is not sufficient to account for such performance. They suggested that instead of reproducing the acoustic properties of the referent sound as accurately as vocally possible, vocal imitations focus on a few important features dependent on each particular sound category.

10.1121/1.4970854
https://hal.archives-ouvertes.fr/hal-01611867