Search results for "Computer Science - Computation and Language"

showing 10 items of 31 documents

Explainable Tsetlin Machine framework for fake news detection with credibility score assessment

2021

The proliferation of fake news, i.e., news intentionally spread for misinformation, poses a threat to individuals and society. Despite various fact-checking websites such as PolitiFact, robust detection techniques are required to deal with the increase in fake news. Several deep learning models show promising results for fake news classification; however, their black-box nature makes it difficult to explain their classification decisions and quality-assure the models. We here address this problem by proposing a novel interpretable fake news detection framework based on the recently introduced Tsetlin Machine (TM). In brief, we utilize the conjunctive clauses of the TM to capture lexical and…

FOS: Computer and information sciences, I.2, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, I.5, Computer Science - Artificial Intelligence, I.2; I.5; I.7, Computation and Language (cs.CL), I.7, Machine Learning (cs.LG)

researchProduct

Facilitating terminology translation with target lemma annotations

2021

Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In the day-to-day work of professional translators, however, this is seldom the case, as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for a priori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentatio…

FOS: Computer and information sciences, Lemma (mathematics), Computer Science - Computation and Language, Machine translation, Process (engineering), Computer science, business.industry, Latvian, Term (logic), Translation (geometry), computer.software_genre, language.human_language, Terminology, language, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing, Sentence

researchProduct

Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection

2019

Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language and especially on social media, but they represent two serious issues for automated text understanding. Many labeled corpora have been extracted from several sources to accomplish this task, and it seems that sarcasm is conveyed in different ways for different domains. Nonetheless, very little work has been done for comparing different methods among the available corpora. Furthermore, usually, each author collects and uses their own datasets to evaluate their own method. In this paper, we show that sarcasm detection can be tackled by applying classical machine learning algorithms to input te…

FOS: Computer and information sciences, Linguistics and Language, Computer Science - Machine Learning, Computer science, media_common.quotation_subject, Semantic space, Machine Learning (stat.ML), 02 engineering and technology, computer.software_genre, Language and Linguistics, Task (project management), Data-driven, Machine Learning (cs.LG), Artificial Intelligence, Statistics - Machine Learning, 020204 information systems, Everyday language, 0202 electrical engineering electronic engineering information engineering, Social media, natural language processing, media_common, Computer Science - Computation and Language, Sarcasm, Settore INF/01 - Informatica, business.industry, irony detection, Irony, machine learning, semantic spaces, 020201 artificial intelligence & image processing, Artificial intelligence, business, Irony detection, semantic space, computer, Computation and Language (cs.CL), Software, Natural language processing, sarcasm detection

researchProduct

RIGA at SemEval-2016 Task 8: Impact of Smatch Extensions and Character-Level Neural Translation on AMR Parsing Accuracy

2016

Two extensions to the AMR smatch scoring script are presented. The first extension combines the smatch scoring script with the C6.0 rule-based classifier to produce a human-readable report on the error patterns frequency observed in the scored AMR graphs. This first extension results in 4% gain over the state-of-art CAMR baseline parser by adding to it a manually crafted wrapper fixing the identified CAMR parser errors. The second extension combines a per-sentence smatch with an ensemble method for selecting the best AMR graph among the set of AMR graphs for the same sentence. This second modification automatically yields further 0.4% gain when applied to outputs of two nondeterministic…

FOS: Computer and information sciences, Parsing, Computer Science - Computation and Language, Computer science, business.industry, 02 engineering and technology, Extension (predicate logic), computer.software_genre, SemEval, Set (abstract data type), Nondeterministic algorithm, 020204 information systems, Test set, Classifier (linguistics), 0202 electrical engineering electronic engineering information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, Sentence

researchProduct

A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German

1998

In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms plus its ability to process German compound nouns guarantee a wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.

FOS: Computer and information sciences, Spectrum analyzer, Root (linguistics), Morphology (linguistics), Computer Science - Computation and Language, Computer science, business.industry, Lemmatisation, Context (language use), computer.software_genre, Lexicon, Syntax, language.human_language, German, H.3.4, Noun, language, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, Word (computer architecture)

researchProduct

Measuring Semantic Coherence of a Conversation

2018

Conversational systems have become increasingly popular as a way for humans to interact with computers. To be able to provide intelligent responses, conversational systems must correctly model the structure and semantics of a conversation. We introduce the task of measuring semantic (in)coherence in a conversation with respect to background knowledge, which relies on the identification of semantic relations between concepts introduced during a conversation. We propose and evaluate graph-based and machine learning-based approaches for measuring semantic coherence using knowledge graphs, their vector space embeddings and word embedding models, as sources of background knowledge. We demonstrat…

FOS: Computer and information sciences, Word embedding, Computer science, Computer Science - Artificial Intelligence, media_common.quotation_subject, ihmisen ja tietokoneen vuorovaikutus, 02 engineering and technology, computer.software_genre, keskustelu, 020204 information systems, 0202 electrical engineering electronic engineering information engineering, Conversation, conversational systems, media_common, Computer Science - Computation and Language, business.industry, koneoppiminen, Artificial Intelligence (cs.AI), Knowledge graph, semantiikka, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, business, semantic coherence, computer, Computation and Language (cs.CL), Natural language processing

researchProduct

A First Experiment on Including Text Literals in KGloVe

2018

Graph embedding models produce embedding vectors for entities and relations in Knowledge Graphs, often without taking literal properties into account. We show an initial idea based on the combination of global graph structure with additional information provided by textual information in properties. Our initial experiment shows that this approach might be useful, but does not clearly outperform earlier approaches when evaluated on machine learning tasks.

FOS: Computer and information sciences, graph embeddings, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), koneoppiminen, knowledge graph, Computer Science - Artificial Intelligence, tekstinlouhinta, attributes, tiedonlouhinta, Computation and Language (cs.CL)

researchProduct

Measuring the Novelty of Natural Language Text Using the Conjunctive Clauses of a Tsetlin Machine Text Classifier

2020

Most supervised text classification approaches assume a closed world, counting on all classes being present in the data at training time. This assumption can lead to unpredictable behaviour during operation, whenever novel, previously unseen, classes appear. Although deep learning-based methods have recently been used for novelty detection, they are challenging to interpret due to their black-box nature. This paper addresses interpretable open-world text classification, where the trained classifier must deal with novel classes during operation. To this end, we extend the recently introduced Tsetlin machine (TM) with a novelty scoring mechanism. The mechanism uses the conjunctive clau…

I.2, FOS: Computer and information sciences, Computer Science - Machine Learning, I.5, Computer Science - Artificial Intelligence, Computer science, I.2; I.5; I.7, computer.software_genre, I.7, Novelty detection, Measure (mathematics), Machine Learning (cs.LG), Representation (mathematics), Computer Science - Computation and Language, business.industry, Deep learning, Novelty, Propositional calculus, Artificial Intelligence (cs.AI), Artificial intelligence, business, Classifier (UML), computer, Computation and Language (cs.CL), Natural language processing, Natural language

researchProduct

Multilayer Network Model of Movie Script

2018

Network models have been increasingly used in the past years to support summarization and analysis of narratives, such as famous TV series, books and news. Inspired by social network analysis, most of these models focus on the characters at play. The network model captures all character interactions well, giving a broad picture of the narration's content. A few works went further by introducing additional semantic elements, always captured in a single-layer network. In contrast, we introduce in this work a multilayer network model to capture more elements of the narration of a movie from its script: people, locations, and other semantic elements. This model enables new measures and insights…

Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Social and Information Networks, Computation and Language (cs.CL)

researchProduct

AMUSED: An Annotation Framework of Multi-modal Social Media Data

2020

In this paper, we present a semi-automated framework called AMUSED for gathering multi-modal annotated data from multiple social media platforms. The framework is designed to mitigate the issues of collecting and annotating social media data by cohesively combining machine and human in the data collection process. From a given list of articles from professional news media or blogs, AMUSED detects links to social media posts in news articles and then downloads the contents of the same post from the respective social media platform to gather details about that specific post. The framework is capable of fetching the annotated data from multiple platforms such as Twitter, YouTube, and Reddit.…

researchProduct