Search results for "Suffix"
Showing 10 of 75 documents
Constructing Antidictionaries of Long Texts in Output-Sensitive Space
2021
A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ of minimal absent words of length at most $\ell$ of the collection $\{y_1, \ldots, y_k\}$. The set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ contains all the words $x$ such that $x$ is absent from all the words of the collection while there exist $i, j$ such that the maximal proper suffix of $x$ is a factor of $y_i$ and the maximal proper prefix of $x$ is a factor of $y_j$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to c…
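The definition above can be checked directly with a brute-force sketch. This is not the output-sensitive algorithm of the paper; it simply enumerates every factor of length at most $\ell$ and tests each one-letter extension against the definition. The names `factors` and `minimal_absent_words` are hypothetical, and the example assumes words are Python strings over a small alphabet.

```python
def factors(words, max_len):
    """All factors (substrings) of length <= max_len occurring in any word, plus the empty word."""
    seen = {""}
    for y in words:
        for i in range(len(y)):
            for j in range(i + 1, min(len(y), i + max_len) + 1):
                seen.add(y[i:j])
    return seen


def minimal_absent_words(words, alphabet, ell):
    """Brute-force MAWs of length <= ell for a collection (illustrative, not output-sensitive)."""
    facs = factors(words, ell)
    maws = set()
    for w in facs:
        if len(w) >= ell:
            continue  # extending w would exceed the length bound ell
        for a in alphabet:
            x = w + a  # maximal proper prefix x[:-1] == w occurs in some y_j
            # x must be absent, and its maximal proper suffix must occur in some y_i
            if x not in facs and x[1:] in facs:
                maws.add(x)
    return maws
```

For the collection `["ab", "ba"]` over `"ab"` with $\ell = 3$, the sketch returns `{"aa", "bb", "aba", "bab"}`: `aba` qualifies because its prefix `ab` occurs in the first word and its suffix `ba` occurs in the second, which is exactly the cross-document case the definition allows.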
Efficient Algorithms for Sequence Analysis with Entropic Profiles
2017
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …
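Suffix-based structures such as a truncated suffix tree essentially give fast access to the number of occurrences of every subword up to some length bound $L$. A minimal sketch of that quantity, computed naively with a hash map instead of a TST or ESA, is shown below. The helper names are hypothetical, and the per-position view (counts of the subwords of length $1..L$ ending at position $i$) is offered as an assumption about what an entropic profile aggregates; the exact weighting formula is not reproduced here.

```python
from collections import Counter


def subword_counts(s, L):
    """Occurrence count of every subword of s with length 1..L (naive O(n * L) substrings)."""
    counts = Counter()
    for i in range(len(s)):
        for k in range(1, min(L, len(s) - i) + 1):
            counts[s[i:i + k]] += 1
    return counts


def counts_ending_at(s, i, L, counts):
    """Counts of the subwords of length 1..L that end at position i (0-based)."""
    return [counts[s[i - k + 1:i + 1]] for k in range(1, min(L, i + 1) + 1)]


seq = "ACGACG"
c = subword_counts(seq, 3)
print(counts_ending_at(seq, 5, 3, c))  # counts of "G", "CG", "ACG"
```

A TST or ESA answers the same lookups without materializing every substring, which is where the linear-time bounds come from.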
Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data
2016
Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive approach to constructing the BWT of single genome sequences in linear space with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After obtaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has b…
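For readers new to the two structures, the relationship between the suffix array and the BWT can be shown with a textbook sketch. This is a quadratic, single-threaded construction for illustration only, not ParaBWT's progressive or parallel method; it assumes the text ends with a unique sentinel `$` that sorts before every other character.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_sa(s, sa):
    """BWT[j] is the character preceding the j-th smallest suffix (s[-1] wraps to the sentinel)."""
    return "".join(s[i - 1] for i in sa)


text = "banana$"
sa = suffix_array(text)
print(sa, bwt_from_sa(text, sa))
```

Because each BWT character is read off the suffix array, real tools either avoid building the full suffix array first (as the abstract describes, obtaining the BWT before the suffix array) or build it with far more memory-efficient algorithms than sorting materialized suffixes.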
From engl-isc to whatever-ish: a corpus-based investigation of -ish derivation in the history of English
2020
Drawing on a wide array of historical and contemporary corpora, this article provides one of the first empirical analyses of the intricately related functional changes that -ish underwent in the course of English language history. By investigating the distribution of -ish formations, the analysis sheds light on the productivity of the suffix, which becomes evident not only in the numerous hapax legomena but also in the trajectory of change itself, in which -ish occurs with ever new base categories and new functions. Moreover, the article revisits theoretical claims made in the literature about the diachronic development and synchronic properties of -ish and reassesses them in the light …
El diminutivo en el español de Santo Domingo [The diminutive in the Spanish of Santo Domingo]
2016
This study analyses the use of the diminutive suffix in an oral corpus of young speakers from the Dominican Republic. The material comes from the transcription of twenty oral interviews conducted in Santo Domingo in the 1990s. The study examines the documented occurrences, their morphology, their preferences regarding the word classes taken as bases for forming diminutives, and their possible semantic and communicative values, and, finally, determines the frequency of use of the diminutive according to the speakers' sex.
Computing the Original eBWT Faster, Simpler, and with Less Memory
2021
Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since its introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …
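The order-independence property mentioned above follows from how the original \(\mathrm {eBWT}\) sorts: all rotations of all input strings are ordered by comparing their infinite repetitions (the $\omega$-order), and the last character of each rotation is output. A naive sketch of that definition follows; it is not the linear-time algorithm of the paper, and the names `omega_cmp` and `ebwt` are hypothetical. It relies on the Fine–Wilf theorem: $u^\omega$ and $v^\omega$ are equal as soon as they agree on their first $|u| + |v|$ characters.

```python
from functools import cmp_to_key


def omega_cmp(u, v):
    """Compare u^omega with v^omega; the first len(u) + len(v) characters decide (Fine-Wilf)."""
    n = len(u) + len(v)
    a = (u * (n // len(u) + 1))[:n]
    b = (v * (n // len(v) + 1))[:n]
    return (a > b) - (a < b)


def ebwt(strings):
    """Naive original eBWT: sort all rotations in omega-order, output their last characters."""
    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    # Ties (equal omega-words) share a primitive root, hence the same last character,
    # so the output does not depend on how ties are broken or on the input order.
    return "".join(r[-1] for r in rotations)


print(ebwt(["ab", "aab"]), ebwt(["aab", "ab"]))  # same string either way
```

No sentinel is appended, which is exactly what distinguishes this definition from BWT variants built by concatenating the strings with separators.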
Neurocognitive processing of auditorily and visually presented inflected words and pseudowords: Evidence from a morphologically rich language
2009
The aim of the study was to investigate how the input modality affects the processing of a morphologically complex word. The processing of Finnish inflected vs. monomorphemic words and pseudowords was examined during a lexical decision task, using behavioral responses and event-related potentials. The stimuli were presented in two modalities, visually and auditorily, to two groups of participants. Half of the words and pseudowords carried a case-inflection. At the behavioral level, the inflected words elicited a processing cost with longer decision latencies and higher error rates. At the neural level, pseudowords elicited an N400 effect, which was more pronounced in the visual modality. In…
Event-related potential (ERP) responses to violations of inflectional and derivational rules of Finnish
2007
Event-related potentials (ERPs) were used to investigate the electrophysiological correlates of inflectional and derivational morphology. The participants were presented with visual sentences containing critical words in which inflectional rules, derivational rules, or both (combined violation) of Finnish were violated. Inflectional anomalies violated number agreement between a noun and a preceding auxiliary word. Derivational violations included a word-internal selectional restriction violation, i.e., a root and suffix category violation. Combined violations contained both a number and a category violation. The phonemic length of the critical words was controlled. Inflectional violations elici…
Variable-order reference-free variant discovery with the Burrows-Wheeler Transform
2020
Background: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves the sensitivity and precision of previous de Bruijn graph-based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) poor performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory con…
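Point (i) above, dropping a fixed order k, rests on a basic property of BWT indexes: backward search counts the occurrences of a substring of any length without choosing k in advance. A minimal sketch of backward search is given below; it is not the tool's implementation. It assumes a naive suffix-array construction, a `$` sentinel that does not occur in the text and sorts before every other character, and rank queries answered by slicing (O(n) each), where a real index would use sampled rank structures. `build_fm` and `count_occurrences` are hypothetical names.

```python
def build_fm(text):
    """Return the BWT of text + '$' and the C table (count of characters smaller than c)."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)
    return bwt, C


def count_occurrences(pattern, bwt, C):
    """Backward search: shrink the half-open row range [lo, hi) one pattern character at a time."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + bwt[:lo].count(c)  # rank of c before row lo
        hi = C[c] + bwt[:hi].count(c)  # rank of c before row hi
        if lo >= hi:
            return 0
    return hi - lo


bwt, C = build_fm("banana")
print(count_occurrences("ana", bwt, C))
```

The same row range that yields the count can be extended or shortened character by character, which is what lets BWT-based methods adapt the context length locally instead of fixing k globally.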
Beyond decomposition: Processing zero-derivations in English visual word recognition
2019
Four experiments investigate the effects of covert morphological complexity during visual word recognition. Zero-derivations occur in English when a change of word class occurs without any change in surface form (e.g., a boat → to boat; to soak → a soak). Boat is object-derived and is a basic noun (N), whereas soak is action-derived and is a basic verb (V). As the suffix {-ing} is only attached to verbs, deriving boating from its base requires two steps, boat(N) > boat(V) > boating(V), while soaking can be derived in one step from soak(V). Experiments 1 to 3 used masked priming at different prime durations to test matched sets of one- and two-step verbs for morphological (soaking-SOA…