Search results for "speech"
Showing 10 of 1281 documents
Keyword Based Keyframe Extraction in Online Video Collections
2015
Keyframe extraction methods aim to find in a video sequence the most significant frames, according to specific criteria. In this paper we propose a new method to search, in a video database, for frames that are related to a given keyword, and to extract the best ones, according to a proposed quality factor. We first exploit a speech-to-text algorithm to extract automatic captions from all the videos in a specific domain database. Then we select only those sequences (clips) whose captions include a given keyword, thus discarding a lot of information that is useless for our purposes. Each retrieved clip is then divided into shots, using a video segmentation method that is based on the SURF d…
Inner speech for a self-conscious robot
2018
The experience of self-conscious thinking in the verbal form of inner speech is a common one. Such a covert dialogue accompanies the introspection of mental life and fulfills important roles in our cognition, such as self-regulation, self-restructuring, and the re-focusing of attentional resources. Although the functional underpinnings and the phenomenology of inner speech are largely investigated in the psychological and philosophical fields, robotic research generally does not address such a form of self-conscious behavior. Existing models of inner speech can inspire computational tools to provide the robot with a form of self-consciousness. Here, the most widespread psychological models of inner speech…
Sub-Symbolic Knowledge Representation for Evocative Chat-Bots
2008
A sub-symbolic knowledge representation oriented to the enhancement of chat-bot interaction is proposed. The result of the technique is the introduction of a semantic sub-symbolic layer into a traditional ontology-based knowledge representation. This layer is obtained by mapping the ontology concepts into a semantic space built through the Latent Semantic Analysis (LSA) technique, and it is embedded into a conversational agent. This choice leads to a chat-bot with “evocative” capabilities whose knowledge representation framework is composed of two areas: the rational one and the evocative one. As the standard ontology we have chosen the well-founded WordNet lexical dictionary, while as chat-bot the ALICE a…
Robot's Inner Speech Effects on Trust and Anthropomorphic Cues in Human-Robot Cooperation
2021
Inner speech is an essential but also elusive human psychological process, which refers to an everyday covert internal conversation with oneself. We argue that programming a robot with an overt self-talk system, which simulates human inner speech, might enhance human trust by improving robot transparency and anthropomorphism. For these reasons, this work aims to investigate whether a robot’s inner speech, here intended as overt self-talk, affects human trust and anthropomorphism when human and robot cooperate. A group of participants was engaged in collaboration with the robot. During cooperation, the robot talks to itself. To evaluate if the robot’s inner speech influences human trust, two question…
A perceptual sound space for auditory displays based on sung-vowel synthesis
2022
When designing displays for the human senses, perceptual spaces are of great importance to give intuitive access to physical attributes. Similar to how perceptual spaces based on hue, saturation, and lightness were constructed for visual color, research has explored perceptual spaces for sounds of a given timbral family based on timbre, brightness, and pitch. To promote an embodied approach to the design of auditory displays, we introduce the Vowel–Type–Pitch (VTP) space, a cylindrical sound space based on human sung vowels, whose timbres can be synthesized by the composition of acoustic formants and can be categorically labeled. Vowels are arranged along the circular dimension, whi…
Investigating Proactive Search Support in Conversations
2018
Conversations among people involve solving disputes, building common ground, and reinforcing mutual beliefs and assumptions. Conversations often require external information that can support these human activities. In this paper, we study how a spoken conversation can be supported by a proactive search agent that listens to the conversation, detects entities mentioned in the conversation, and proactively retrieves and presents information related to the conversation. A total of 24 participants (12 pairs) were involved in informal conversations, using either the proactive search agent or a control condition that did not support conversational analysis or proactive information retrieval. Data c…
Real-Time Body Gestures Recognition Using Training Set Constrained Reduction
2017
Gesture recognition is an emerging cross-discipline research field, which aims at interpreting human gestures and associating them with a well-defined meaning. It has been used as a means of supporting human-to-machine interaction in several applications of robotics, artificial intelligence, and machine learning. In this paper, we propose a system able to recognize human body gestures that implements a constrained training-set reduction technique. This allows the system to run in real time. The system has been tested on a publicly available dataset of 7,000 gestures, and experimental results have highlighted that, at the cost of a small decrease in the maximum achievable recognition ac…
The Inner Life of a Robot in Human-Robot Teaming
2020
Giving the robot a 'human' inner life, such as the capability to think about itself and to understand what the other team members are doing, would increase the efficiency of trustworthy interactions with the other members of the team. Our long-term research goal is to provide the robot with a computational model of inner life, helping the robot to reason about itself, its capabilities, its environment, and its teammates. Robot inner speech is part of this research goal. In this paper, we summarize the results obtained in this direction.
Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks
2006
Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as de…
The Sound Design Toolkit
2017
The Sound Design Toolkit is a collection of physically informed sound synthesis models, specifically designed for practice and research in Sonic Interaction Design. The collection is based on a hierarchical, perceptually founded taxonomy of everyday sound events, and implemented by procedural audio algorithms which emphasize the role of sound as a process rather than a product. The models are intuitive to control – and the resulting sounds easy to predict – as they rely on basic everyday listening experience. Physical descriptions of sound events are intentionally simplified to emphasize the most perceptually relevant timbral features, and to reduce computational requirements as well.