Search results for "Natural Language Processing"
Showing 10 of 413 documents
Revisiting corpus creation and analysis tools for translation tasks
2016
Many translation scholars have proposed the use of corpora to allow professional translators to produce high-quality texts which read like originals. Yet the diffusion of this methodology has been modest, one reason being that software for corpus analysis has generally been developed with the linguist in mind: it tends to be complex and cumbersome, offering many advanced features but lacking the usability and the specific features that meet translators’ needs. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual …
Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts
2005
As has been shown recently, it is possible to automatically discover the senses of an ambiguous word by statistically analyzing its contextual behavior in a large text corpus. However, this kind of research is still at an early stage. The results need to be improved and there is considerable disagreement on methodological issues. For example, although most researchers use clustering approaches for word sense induction, it is not clear what statistical features the clustering should be based on. Whereas so far most researchers cluster global co-occurrence vectors that reflect the overall behavior of a word in a corpus, in this paper we argue that it is more appropriate to use local context v…
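The local-context clustering this abstract argues for can be illustrated with a minimal sketch: represent each occurrence of an ambiguous word by the words in its immediate context and cluster those vectors, so that each cluster approximates one sense. The toy contexts, TF-IDF features, and k-means choice below are illustrative assumptions, not the paper's actual corpus or feature set.

```python
# Hypothetical sketch of word sense induction by clustering local contexts.
# Toy data and feature choices are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy local contexts for the ambiguous word "bank"
contexts = [
    "deposit money at the bank account",
    "the bank approved the loan",
    "river bank covered with grass",
    "fishing from the bank of the river",
]

# Each local context becomes a sparse bag-of-words vector
X = TfidfVectorizer().fit_transform(contexts)

# Cluster the context vectors; each cluster stands in for one sense
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

On real data one would use far larger context windows and corpora, and the number of senses would itself have to be induced rather than fixed in advance.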
A Controllable Text Simplification System for the Italian Language
2021
Text simplification is a non-trivial task that aims at reducing the linguistic complexity of written texts. Researchers have studied the problem by proposing new methodologies for English, but other languages, such as Italian, remain almost unexplored. In this paper, we contribute to automated text simplification research by presenting a deep learning-based system, inspired by a state-of-the-art system for English, capable of simplifying Italian texts. The system has been trained and tested on the Italian version of Newsela; it has shown promising results, achieving a SARI value of 30.17.
On parsing optimality for dictionary-based text compression—the Zip case
2013
Dictionary-based compression schemes have been the most commonly used data compression schemes since their introduction in the foundational 1977 paper of Ziv and Lempel, and are generally referred to as LZ77. Their work is the basis of Zip, gZip, 7-Zip, and many other compression utilities. Some of these schemes use variants of the greedy approach to parse the text into dictionary phrases; others have moved beyond the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented, filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …
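The greedy parsing mentioned above can be sketched in a few lines: at each position, take the longest match against the already-seen text and emit an (offset, length, next-character) phrase. This is a simplified toy model (unbounded window, naive search), not the parsing used by any real Zip variant.

```python
# Minimal sketch of greedy LZ77-style parsing, under simplifying
# assumptions: unbounded window, brute-force longest-match search.
def greedy_parse(text: str):
    """Greedily parse text into (offset, length, next_char) phrases."""
    i, phrases = 0, []
    while i < len(text):
        best_len, best_off = 0, 0
        # Scan every earlier start position for the longest match;
        # matches may overlap the current position, as in LZ77.
        for j in range(i):
            k = 0
            while i + k < len(text) - 1 and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        phrases.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases

print(greedy_parse("abab"))
```

The bit-optimal parsers the survey discusses improve on exactly this step: greedy longest-match is not always the parse that minimizes the encoded size of the phrase sequence.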
High Locality Representations for Automated Programming
2011
We study the locality of the genotype-phenotype mapping used in grammatical evolution (GE). GE is a variant of genetic programming that can evolve complete programs in an arbitrary language using a variable-length binary string. In contrast to standard GP, which applies search operators directly to phenotypes, GE uses an additional mapping and applies search operators to binary genotypes. Therefore, there is a large semantic gap between genotypes (binary strings) and phenotypes (programs or expressions). The case study shows that the mapping used in GE has low locality leading to low performance of standard mutation operators. The study at hand is an example of how basic design principles o…
LeSSS: Learned Shared Semantic Spaces for Relating Multi-Modal Representations of 3D Shapes
2015
In this paper, we propose a new method for structuring multi-modal representations of shapes according to semantic relations. We learn a metric that links semantically similar objects represented in different modalities. First, 3D-shapes are associated with textual labels by learning how textual attributes are related to the observed geometry. Correlations between similar labels are captured by simultaneously embedding labels and shape descriptors into a common latent space in which an inner product corresponds to similarity. The mapping is learned robustly by optimizing a rank-based loss function under a sparseness prior for the spectrum of the matrix of all classifiers. Second, we extend …
Project thesaurus 2020 — Linguistic and ontological aspects
2011
Structures and linguistic concepts of thesauri are analyzed and compared. Proposals for the improvement of thesauri are developed.
Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis
2020
With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information or follow the topics being discussed in such content. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques, which have gained popularity in recent years. This paper investigates the topic modeling subject and its common application areas, methods, and tools. Also, we examine and compare five frequen…
Use of Machine Learning and Artificial Intelligence to Drive Personalized Medicine Approaches for Spine Care
2020
Personalized medicine is a new paradigm of healthcare in which interventions are based on individual patient characteristics rather than on “one-size-fits-all” guidelines. As epidemiological datasets continue to burgeon in size and complexity, powerful methods such as statistical machine learning and artificial intelligence (AI) become necessary to interpret and develop prognostic models from underlying data. Through such analysis, machine learning can be used to facilitate personalized medicine via its precise predictions. Additionally, other AI tools, such as natural language processing and computer vision, can play an instrumental part in personalizing the care provided to patients with …
Morphometrics of Second Iron Age ceramics - strengths, weaknesses, and comparison with traditional typology.
2014
Although the potential of geometric morphometrics for the study of archaeological artefacts is recognised, quantitative evaluations of the concordance between such methods and traditional typology are rare. The present work seeks to fill this gap, using as a case study a corpus of 154 complete ceramic vessels from the Bibracte oppidum (France), the capital of the Celtic tribe Aedui from the Second Iron Age. Two outline-based approaches were selected: the Elliptic Fourier Analysis and the Discrete Cosine Transform. They were combined with numerous methods of standardisation/normalisation. Although standardisations may use either perimeter or surface, the res…