Search results for "corpus"
showing 10 items of 202 documents
Creating Corpora of Finland’s Sign Languages
2016
This paper discusses the process of creating corpora of the sign languages used in Finland, Finnish Sign Language (FinSL) and Finland-Swedish Sign Language (FinSSL). It describes the process of getting informants and data, editing and storing the data, the general principles of annotation, and the creation of a web-based lexical database, the FinSL Signbank, developed on the basis of the NGT Signbank, which is a branch of the Auslan Signbank. The corpus project of Finland’s Sign Languages (CFINSL) started in 2014 at the Sign Language Centre of the University of Jyväskylä. Its aim is to collect conversations and narrations from 80 FinSL users and 20 FinSSL users who are living in different p…
Du vin au cacao : enjeux des sources et des méthodes pour une sémantique sensorielle [From Wine to Cocoa: Issues of Sources and Methods for a Sensory Semantics]
2018
This course, given as part of Professor Eva Lavric’s seminar at the University of Innsbruck (Austria), discusses the issues, and indeed the challenges, involved in a linguistic, and above all semantic, description of cocoa. In doing so, it addresses the methodology of a “sensory linguistics/linguistics of sensoriality”, beginning by presenting its object and its research question(s), in particular the need to adopt constructivist paradigms for describing meaning. It then turns to what has been established so far in the description of the language of wine, in particular its terminology, highlighting the lack of theoretical grounding of aroma wheels, which are nevertheless …
La terapeutica zoologica del Corpus hippocraticum. Primo saggio di indagine [Zoological Therapeutics in the Corpus Hippocraticum: A First Exploratory Study]
2010
This contribution reflects on the use of animal ingredients in the Hippocratic pharmacopoeia, first setting apart the dietary aspects and then highlighting the particularities in the field of gynaecology.
Analysis and Comparison of Deep Learning Networks for Supporting Sentiment Mining in Text Corpora
2020
In this paper, we tackle the problem of irony and sarcasm detection for the Italian language, contributing to the enrichment of the sentiment analysis field. We analyze and compare five deep-learning systems. Results show that such systems are well suited to the problem, achieving an F1-score of 93% in the best case. Furthermore, we briefly analyze the model architectures in order to choose the best compromise between performance and complexity.
Automatic Dictionary Creation by Sub-symbolic Encoding of Words
2006
This paper describes a technique for the automatic creation of dictionaries using sub-symbolic representations of words in a cross-language context. Semantic relationships among words of two languages are extracted from aligned bilingual text corpora by applying Latent Semantic Analysis to the matrices representing term co-occurrences in aligned text fragments. The technique finds the “best translation” according to a properly defined geometric distance in an automatically created semantic space. Experiments show a correctness of 95% in the best case.
Syntagmatic and Paradigmatic Associations in Information Retrieval
2003
It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval — such as the production of free word associations and the generation of synonyms — can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations as produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. Both approaches are compared and validated …
Methodological Approach for Messages Classification on Twitter Within E-Government Area
2018
The constant growth in the number of Social Media users is a reality of the past few years. Companies, governments and researchers focus on extracting useful data from Social Media. One of the most important things we can extract from the messages transmitted from one user to another is the sentiment (positive, negative or neutral) regarding the subject of the conversation. There are many studies on how to classify these messages, but all of them need a huge amount of already classified data for training, and such data is not available for Romanian-language texts. We present a case study in which we apply a Naive Bayes classifier, trained on an English short-text corpus, to several thousand Romanian texts. …
Review of Non-English Corpora Annotated for Emotion Classification in Text
2020
In this paper we try to systematize the information about the available corpora for emotion classification in text for languages other than English, with the goal of finding approaches that could be used for low-resource languages with close to no existing work in the field. We analyze the volume, emotion classification schema, and language of each corpus, as well as the methods employed for data preparation and annotation automation. We systematized twenty-four papers presenting the corpora and found that the corpora mostly covered the most widely spoken world languages: Hindi, Chinese, Turkish, Arabic, Japanese, etc. A typical corpus contained several thousand manually annotated ent…
A Methodology for Bilingual Lexicon Extraction from Comparable Corpora
2015
Dictionary extraction using parallel corpora is well established. However, for many language pairs parallel corpora are a scarce resource, which is why the current work discusses methods for dictionary extraction from comparable corpora. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations. 2) Implementing a new approach which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between wor…
Reflection Assignment as a Tool to Support Students’ Metacognitive Awareness in the Context of Computer-Supported Collaborative Learning
2021
The present study explores the potential of a reflection assignment as a tool for supporting master’s degree students’ metacognitive skills in the context of computer-supported collaborative learning (CSCL). The research question (RQ) is formulated as follows: How does a regularly submitted reflection assignment support the development of students’ individual metacognitive awareness in the context of CSCL? The empirical data is a text corpus (7878 words) extracted from individual students’ (N = 13) reflection assignments (N = 65) submitted during one semester. Qualitative content analysis was employed to analyze the data. The results demonstrate that by the end of the course, the students s…