Search results for " corpus"

showing 10 items of 202 documents

Creating Corpora of Finland’s Sign Languages

2016

This paper discusses the process of creating corpora of the sign languages used in Finland, Finnish Sign Language (FinSL) and Finland-Swedish Sign Language (FinSSL). It describes the process of getting informants and data, editing and storing the data, the general principles of annotation, and the creation of a web-based lexical database, the FinSL Signbank, developed on the basis of the NGT Signbank, which is a branch of the Auslan Signbank. The corpus project of Finland’s Sign Languages (CFINSL) started in 2014 at the Sign Language Centre of the University of Jyväskylä. Its aim is to collect conversations and narrations from 80 FinSL users and 20 FinSSL users who are living in different p…

Signbank; Finland-Swedish Sign Language; metadata; Finnish Sign Language; annotation; sign language corpus
researchProduct

Du vin au cacao : enjeux des sources et des méthodes pour une sémantique sensorielle

2018

This course, given as part of Professor Eva Lavric's seminar at the University of Innsbruck (Austria), discusses the issues, and indeed the challenges, involved in a linguistic, and above all semantic, description of cocoa. In doing so, it addresses the methodology of a "sensory linguistics" (or "linguistics of sensoriality"), beginning with its object and its research question(s), in particular the need to adopt constructivist paradigms for describing meaning. It then turns to what has been achieved so far in describing the language of wine, in particular its terminology, highlighting the lack of theoretical grounding of aroma wheels, which are nevertheless …

Sociolinguistics; Cocoa; Lexical semantics; Semantics; Spanish; Ecuador; Corpus linguistics; Cognitive semantics; Cocoa farming; Terminology; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; Discourse analysis; Kichwa
researchProduct

La terapeutica zoologica del Corpus hippocraticum. Primo saggio di indagine

2010

The contribution reflects on the use of animal ingredients in the Hippocratic pharmacopoeia, first setting apart the dietary aspects and then highlighting the particularities of the gynaecological field.

Terapia animali corpus hippocraticum; Settore L-FIL-LET/02 - Lingua E Letteratura Greca
researchProduct

Analysis and Comparison of Deep Learning Networks for Supporting Sentiment Mining in Text Corpora

2020

In this paper, we tackle the problem of irony and sarcasm detection for the Italian language, as a contribution to the field of sentiment analysis. We analyze and compare five deep-learning systems. Results show that such systems are well suited to the problem, achieving an F1-score of 93% in the best case. Furthermore, we briefly analyze the model architectures in order to choose the best compromise between performance and complexity.

Text corpus; Computer science; media_common.quotation_subject; Compromise; Face (sociological concept); 02 engineering and technology; computer.software_genre; Field (computer science); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; natural language processing; media_common; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; Sarcasm; business.industry; Deep learning; Sentiment analysis; deep learning; irony detection; Irony; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; sarcasm detection; Natural language processing; Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services
researchProduct

Automatic Dictionary Creation by Sub-symbolic Encoding of Words

2006

This paper describes a technique for the automatic creation of dictionaries using a sub-symbolic representation of words in a cross-language context. Semantic relationships among the words of two languages are extracted from aligned bilingual text corpora. This is done by applying the Latent Semantic Analysis technique to the matrices representing term co-occurrences in aligned text fragments. The technique makes it possible to find the “best translation” according to a properly defined geometric distance in an automatically created semantic space. Experiments show a correctness of 95% in the best case.

Text corpus; Correctness; Probabilistic latent semantic analysis; Computer science; Latent semantic analysis; business.industry; Context (language use); Translation (geometry); computer.software_genre; Feature (linguistics); Artificial intelligence; business; Representation (mathematics); computer; Natural language processing
researchProduct
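
To illustrate the idea in the abstract above, here is a minimal sketch of Latent Semantic Analysis applied to aligned text fragments. It is not the paper's implementation: the toy English-Italian corpus, the rank `k`, the helper names `build_matrix` and `translate`, and the use of cosine similarity as the "geometric distance" are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: each pair is one aligned text fragment.
aligned = [
    ("the cat sleeps", "il gatto dorme"),
    ("the dog sleeps", "il cane dorme"),
    ("the cat eats", "il gatto mangia"),
    ("the dog eats", "il cane mangia"),
]

def build_matrix(fragments):
    """Return (vocabulary, term-by-fragment count matrix)."""
    vocab = sorted({w for f in fragments for w in f.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(fragments)))
    for j, fragment in enumerate(fragments):
        for w in fragment.split():
            counts[index[w], j] += 1.0
    return vocab, counts

src_vocab, src_counts = build_matrix([s for s, _ in aligned])
tgt_vocab, tgt_counts = build_matrix([t for _, t in aligned])

# Stack both languages so every term is described by the same columns
# (the aligned fragments), then reduce with a truncated SVD.
stacked = np.vstack([src_counts, tgt_counts])
u, s, _ = np.linalg.svd(stacked, full_matrices=False)
k = 3  # assumed rank of the semantic space
term_vecs = u[:, :k] * s[:k]

def translate(word):
    """Pick the target term closest to `word` by cosine similarity (assumed metric)."""
    v = term_vecs[src_vocab.index(word)]
    tgt = term_vecs[len(src_vocab):]
    sims = tgt @ v / (np.linalg.norm(tgt, axis=1) * np.linalg.norm(v))
    return tgt_vocab[int(np.argmax(sims))]

print(translate("cat"), translate("sleeps"))
```

In this toy setting a word and its translation occur in exactly the same fragments, so they land on the same point in the reduced space. Real corpora are far noisier, which is where the choice of rank and distance starts to matter.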

Syntagmatic and Paradigmatic Associations in Information Retrieval

2003

It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval — such as the production of free word associations and the generation of synonyms — can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations as produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. Both approaches are compared and validated …

Text corpus; Empirical data; Syntagmatic analysis; Information retrieval; Web search query; Semantic similarity; Computer science; Statistical model; Independent component analysis; Associative property
researchProduct
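
As a rough illustration of what "first-order statistics" over word co-occurrences can look like, the sketch below ranks candidate associations for a stimulus word. The abstract does not give the association formula, so pointwise mutual information over sentence-level co-occurrence is an assumption here, as are the function name `association_scores` and the sample sentences.

```python
import math
from collections import Counter

def association_scores(sentences, stimulus):
    """Rank words by a simple first-order association with `stimulus`.

    Assumption: association strength is pointwise mutual information (PMI)
    computed from sentence-level co-occurrence counts.
    """
    word_freq = Counter()  # number of sentences containing each word
    cooc = Counter()       # number of sentences containing word and stimulus
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_freq.update(words)
        if stimulus in words:
            cooc.update(words - {stimulus})
    n = len(sentences)
    scores = {
        w: math.log2(joint * n / (word_freq[stimulus] * word_freq[w]))
        for w, joint in cooc.items()
    }
    # Strongest associations first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample data.
sentences = [
    "bread and butter",
    "butter on bread",
    "bread with cheese",
    "the cat and the dog",
    "the dog sleeps",
]
ranked = association_scores(sentences, "bread")
```

Only words that actually co-occur with the stimulus receive a score, which is what makes this a first-order measure. The second-order approach mentioned for synonyms would instead compare the co-occurrence profiles of two words.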

Methodological Approach for Messages Classification on Twitter Within E-Government Area

2018

The constant growth in the numbers of Social Media users is a reality of the past few years. Companies, governments and researchers focus on extracting useful data from Social Media. One of the most important things we can extract from the messages transmitted from one user to another is the sentiment—positive, negative or neutral—regarding the subject of the conversation. There are many studies on how to classify these messages, but all of them need a huge amount of data already classified for training, data not available for Romanian language texts. We present a case study in which we use a Naive Bayes classifier trained on an English short text corpus on several thousand Romanian texts. …

Text corpus; Focus (computing); Computer science; Romanian; media_common.quotation_subject; Subject (documents); language.human_language; World Wide Web; Naive Bayes classifier; Constant (computer programming); language; Social media; Conversation; media_common
researchProduct

Review of Non-English Corpora Annotated for Emotion Classification in Text

2020

In this paper we try to systematize the information about the available corpora for emotion classification in text for languages other than English, with the goal of finding approaches that could be used for low-resource languages with almost no existing work in the field. We analyze the volume, emotion classification schema and language of each corpus, and the methods employed for data preparation and annotation automation. We have systematized twenty-four papers presenting the corpora and found that the corpora were mostly for the most widely spoken world languages: Hindi, Chinese, Turkish, Arabic, Japanese, etc. A typical corpus contained several thousand manually annotated ent…

Text corpus; Hindi; Artificial neural network; Turkish; Computer science; business.industry; Emotion classification; computer.software_genre; language.human_language; Annotation; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Schema (psychology); language; Artificial intelligence; business; computer; Natural language processing
researchProduct

A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

2015

Dictionary extraction using parallel corpora is well established. However, for many language pairs parallel corpora are a scarce resource, which is why in the current work we discuss methods for dictionary extraction from comparable corpora. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations. 2) Implementing a new approach which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between wor…

Text corpus; Interlingua; Computer science; business.industry; media_common.quotation_subject; Bootstrapping (linguistics); computer.software_genre; language.human_language; Parallel corpora; Bilingual lexicon; Resource (project management); language; Quality (business); Artificial intelligence; business; computer; Word (computer architecture); Natural language processing; media_common; Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)
researchProduct

Reflection Assignment as a Tool to Support Students’ Metacognitive Awareness in the Context of Computer-Supported Collaborative Learning

2021

The present study explores the potential of a reflection assignment as a tool for supporting master’s degree students’ metacognitive skills in the context of computer-supported collaborative learning (CSCL). The research question (RQ) is formulated as follows: How does a regularly submitted reflection assignment support the development of students’ individual metacognitive awareness in the context of CSCL? The empirical data is a text corpus (7878 words) extracted from individual students’ (N = 13) reflection assignments (N = 65) submitted during one semester. Qualitative content analysis was employed to analyze the data. The results demonstrate that by the end of the course, the students s…

Text corpus; Reflection (computer programming); 05 social sciences; 050301 education; Metacognition; 050109 social psychology; Collaborative learning; Context (language use); computer.software_genre; Scripting language; Computer-supported collaborative learning; ComputingMilieux_COMPUTERSANDEDUCATION; Mathematics education; 0501 psychology and cognitive sciences; Psychology; 0503 education; computer; Research question
researchProduct