Search results for "Natural Language Processing"
showing 10 items of 413 documents
Polysemy in Controlled Natural Language Texts
2015
Computational semantics and logic-based controlled natural languages (CNL) do not address systematically the word sense disambiguation problem of content words, i.e., they tend to interpret only some functional words that are crucial for construction of discourse representation structures. We show that micro-ontologies and multi-word units allow integration of the rich and polysemous multi-domain background knowledge into CNL thus providing interpretation for the content words. The proposed approach is demonstrated by extending the Attempto Controlled English (ACE) with polysemous and procedural constructs resulting in a more natural CNL named PAO covering narrative multi-domain texts.
Towards the evaluation of automatic simultaneous speech translation from a communicative perspective
2021
In recent years, automatic speech-to-speech and speech-to-text translation has gained momentum thanks to advances in artificial intelligence, especially in the domains of speech recognition and machine translation. The quality of such applications is commonly tested with automatic metrics, such as BLEU, primarily with the goal of assessing improvements of releases or in the context of evaluation campaigns. However, little is known about how the output of such systems is perceived by end users or how they compare to human performances in similar communicative tasks. In this paper, we present the results of an experiment aimed at evaluating the quality of a real-time speech translation engine…
Facilitating terminology translation with target lemma annotations
2021
Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In day-to-day work of professional translators, however, it is seldom the case as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for apriori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentatio…
Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection
2019
Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language and especially over the social media, but they represent two serious issues for automated text understanding. Many labeled corpora have been extracted from several sources to accomplish this task, and it seems that sarcasm is conveyed in different ways for different domains. Nonetheless, very little work has been done for comparing different methods among the available corpora. Furthermore, usually, each author collects and uses their own datasets to evaluate his own method. In this paper, we show that sarcasm detection can be tackled by applying classical machine learning algorithms to input te…
RIGA at SemEval-2016 Task 8: Impact of Smatch Extensions and Character-Level Neural Translation on AMR Parsing Accuracy
2016
Two extensions to the AMR smatch scoring script are presented. The first extension com-bines the smatch scoring script with the C6.0 rule-based classifier to produce a human-readable report on the error patterns frequency observed in the scored AMR graphs. This first extension results in 4% gain over the state-of-art CAMR baseline parser by adding to it a manually crafted wrapper fixing the identified CAMR parser errors. The second extension combines a per-sentence smatch with an en-semble method for selecting the best AMR graph among the set of AMR graphs for the same sentence. This second modification au-tomatically yields further 0.4% gain when ap-plied to outputs of two nondeterministic…
A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German
1998
In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms plus its ability to process German compound nouns guarantee a wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.
Semantic Computing of Moods Based on Tags in Social Media of Music
2014
Social tags inherent in online music services such as Last.fm provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interp…
Measuring Semantic Coherence of a Conversation
2018
Conversational systems have become increasingly popular as a way for humans to interact with computers. To be able to provide intelligent responses, conversational systems must correctly model the structure and semantics of a conversation. We introduce the task of measuring semantic (in)coherence in a conversation with respect to background knowledge, which relies on the identification of semantic relations between concepts introduced during a conversation. We propose and evaluate graph-based and machine learning-based approaches for measuring semantic coherence using knowledge graphs, their vector space embeddings and word embedding models, as sources of background knowledge. We demonstrat…
EHeBby: An evocative humorist chat-bot
2008
A conversational agent, capable to have a "sense of humor" is presented. The agent can both generate humorous sentences and recognize humoristic expressions introduced by the user during the dialogue. EHeBby is an entertainment oriented conversational agent implemented using the ALICE framework embedded into an Yahoo! Messenger client. It is characterized by two areas: a rational, rule-based area and an evocative area. The first one is based on well founded techniques of computational humor and a standard AIML KB. The second one is based on a conceptual space, automatically induced by a corpus of funny documents, where KB items and user sentences are mapped. This area emulates an associativ…
Effective feature descriptor-based new framework for off-line text-independent writer identification
2018
Feature engineering is a key factor of machine learning applications. It is a fundamental process in writer identification of handwriting, which is an active and challenging field of research for many years. We propose a conceptually computationally efficient, yet simple and fast local descriptor referred to as Block Wise Local Binary Count (BW-LBC) for offline text-independent writer identification of handwritten documents. Proposed BW-LBC operator, which characterizes the writing style of each writer, is applied to a set of connected components extracted and cropped from scanned handwriting samples (documents or set of words/text lines) where each labeled component is seen as a texture im…