Search results for "Suffix"

showing 10 items of 75 documents

Constructing Antidictionaries of Long Texts in Output-Sensitive Space

2021

Abstract: A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ of minimal absent words of length at most $\ell$ of the collection $\{y_1, \ldots, y_k\}$. The set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ contains all the words $x$ such that $x$ is absent from all the words of the collection while there exist $i, j$ such that the maximal proper suffix of $x$ is a factor of $y_i$ and the maximal proper prefix of $x$ is a factor of $y_j$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to c…

0301 basic medicine; Antidictionary; Settore INF/01 - Informatica; Output sensitive algorithm; 0102 computer and information sciences; Space (mathematics); 01 natural sciences; Theoretical Computer Science; String algorithm; Prefix; Set (abstract data type); Combinatorics; 03 medical and health sciences; 030104 developmental biology; Computational Theory and Mathematics; 010201 computation theory & mathematics; Data compression; Output-sensitive algorithm; [INFO]Computer Science [cs]; Suffix; Alphabet; Absent word; Word (group theory); Mathematics; Theory of Computing Systems
researchProduct
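
The definition of minimal absent words quoted in the first result can be illustrated with a brute-force sketch. This is not the output-sensitive algorithm of the paper; it only enumerates candidates directly from the definition, and the function names `factors` and `minimal_absent_words` as well as the default alphabet (the letters occurring in the input) are assumptions made for illustration.

```python
def factors(words, max_len):
    """Collect every factor (substring) of length <= max_len, plus the empty word."""
    found = {""}
    for w in words:
        for i in range(len(w)):
            for j in range(i + 1, min(len(w), i + max_len) + 1):
                found.add(w[i:j])
    return found


def minimal_absent_words(words, max_len, alphabet=None):
    """Brute-force minimal absent words of length <= max_len for a collection.

    A word x qualifies when x occurs in no input word, while both its
    maximal proper prefix x[:-1] and maximal proper suffix x[1:] occur
    in some input word.
    """
    if alphabet is None:
        # Assumption: the alphabet is the set of letters seen in the input.
        alphabet = sorted(set("".join(words)))
    present = factors(words, max_len)
    result = set()
    for prefix in present:          # x[:-1] is a factor by construction
        if len(prefix) >= max_len:
            continue
        for letter in alphabet:
            x = prefix + letter
            if x not in present and x[1:] in present:
                result.add(x)
    return sorted(result)


print(minimal_absent_words(["abaab"], 3))
```

The running time grows with the total number of factors, so this is only suitable for checking small examples by hand.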

Efficient Algorithms for Sequence Analysis with Entropic Profiles

2017

Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, by also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …

0301 basic medicine; Compressed suffix array; Theoretical computer science; Entropy; Suffix tree; 0206 medical engineering; Generalized suffix tree; 02 engineering and technology; String searching algorithm; Information theory; law.invention; 03 medical and health sciences; law; Genetics; Animals; Humans; Mathematics; Applied Mathematics; Suffix array; Computational Biology; DNA; Sequence Analysis DNA; Data structure; 030104 developmental biology; Suffix; Alignment free; Sequence analysis; Sequence comparison; Algorithms; 020602 bioinformatics; Biotechnology; IEEE/ACM Transactions on Computational Biology and Bioinformatics
researchProduct

Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data

2016

Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After gaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has b…

0301 basic medicine; Theoretical computer science; Burrows–Wheeler transform; Computer science; Genomics; Data_CODINGANDINFORMATIONTHEORY; Parallel computing; Genome; law.invention; 03 medical and health sciences; law; Genetics; Humans; Ensembl; Multi-core processor; Applied Mathematics; Linear space; Suffix array; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; 030104 developmental biology; Algorithms; Biotechnology; Reference genome; IEEE/ACM Transactions on Computational Biology and Bioinformatics
researchProduct
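
The relationship between the suffix array and the BWT mentioned in the ParaBWT abstract can be shown on a toy input. The sketch below is a naive construction and is not ParaBWT: it builds the suffix array first by plain sorting and reads the BWT off it, whereas the abstract describes obtaining the BWT first. The function names and the `$` sentinel (assumed to sort before every other character) are illustrative assumptions.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """BWT character for each sorted suffix is the character preceding it.

    For the suffix starting at 0, text[-1] wraps around to the sentinel.
    """
    return "".join(text[i - 1] for i in sa)


text = "banana$"            # '$' is an assumed end-of-text sentinel
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
```

Sorting whole suffixes costs far more time and memory than genome-scale inputs allow, which is the problem that space-efficient constructions address.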

From engl-isc to whatever-ish: a corpus-based investigation of -ish derivation in the history of English

2020

Drawing on a wide array of historical and contemporary corpora, this article provides one of the first empirical analyses of the intricately related functional changes that -ish underwent in the course of English language history. By investigating the distribution of -ish formations, the analysis sheds light on the productivity of the suffix, which does not only become evident in the numerous hapax legomena, but also in the trajectory of change itself in which -ish occurs with ever new base categories and new functions. Moreover, the article revisits theoretical claims made in the literature about the diachronic development and synchronic properties of -ish and reassesses them in the light …

050101 languages & linguistics; Linguistics and Language; History; Hapax legomenon; 05 social sciences; English language; Language and Linguistics; Linguistics; 030507 speech-language pathology & audiology; 03 medical and health sciences; History of English; Corpus based; 0501 psychology and cognitive sciences; Suffix; 0305 other medical science; Productivity (linguistics); English Language and Linguistics
researchProduct

El diminutivo en el español de Santo Domingo

2016

This research analyses the use of the diminutive suffix in an oral corpus of young people from the Dominican Republic. The material comes from the transcription of twenty oral interviews conducted in the 1990s in Santo Domingo. The study analyses the documented occurrences, their morphology, their preferences in selecting the word classes that serve as bases for diminutive formation, their possible semantic and communicative values, and, finally, the frequency of use of the diminutive according to the sex of the speakers. …

060201 languages & linguistics; Literature; Sufijos diminutivos; Linguistics and Language; History; business.industry; Frequency of use; 06 humanities and the arts; jóvenes dominicanos; 060202 literary studies; Language and Linguistics; Linguistics; Diminutive; diminutive suffixes; Dominican youngsters; 0602 languages and literature; Selection (linguistics); corpus oral; Suffix; oral corpus; business
researchProduct

Computing the Original eBWT Faster, Simpler, and with Less Memory

2021

Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since this introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …

2019-20 coronavirus outbreak; Speedup; String collections; Big BWT; Settore INF/01 - Informatica; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); String (computer science); Suffix array; Order (ring theory); omega-order; Quantitative Biology::Genomics; Burrows-Wheeler-Transform; law.invention; Combinatorics; prefix-free parsing; Simple (abstract algebra); law; SAIS; SAIS algorithm; Independence (probability theory); extended BWT; Mathematics
researchProduct
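
The input-order independence that the eBWT abstract highlights can be checked on small inputs with a naive sketch. It is not the linear-time algorithm of the paper: it follows the commonly stated definition (sort all conjugates of all strings by omega-order, then take their last characters), and the function names are illustrative assumptions. Two infinite repetitions are compared on their first `len(u) + len(v)` characters, which is enough by the Fine and Wilf periodicity theorem.

```python
from functools import cmp_to_key


def omega_compare(u, v):
    """Compare the infinite repetitions of u and v.

    If they agree on the first len(u) + len(v) characters they are
    equal, so that many comparisons suffice.
    """
    for i in range(len(u) + len(v)):
        a, b = u[i % len(u)], v[i % len(v)]
        if a != b:
            return -1 if a < b else 1
    return 0


def ebwt(strings):
    """Naive extended BWT of a collection: last characters of all
    conjugates (rotations), sorted by omega-order."""
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    conjugates.sort(key=cmp_to_key(omega_compare))
    return "".join(c[-1] for c in conjugates)


print(ebwt(["ab", "c"]))   # bac
print(ebwt(["c", "ab"]))   # bac, the same result for a different input order
print(ebwt(["banana"]))    # nnbaaa
```

No end-of-string sentinel is appended here, so the result depends only on the multiset of input strings and not on the order in which they are listed.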

Neurocognitive processing of auditorily and visually presented inflected words and pseudowords: Evidence from a morphologically rich language

2009

The aim of the study was to investigate how the input modality affects the processing of a morphologically complex word. The processing of Finnish inflected vs. monomorphemic words and pseudowords was examined during a lexical decision task, using behavioral responses and event-related potentials. The stimuli were presented in two modalities, visually and auditorily, to two groups of participants. Half of the words and pseudowords carried a case-inflection. At the behavioral level, the inflected words elicited a processing cost with longer decision latencies and higher error rates. At the neural level, pseudowords elicited an N400 effect, which was more pronounced in the visual modality. In…

Adult; Male; 050105 experimental psychology; Psycholinguistics; Young Adult; 03 medical and health sciences; Cognition; 0302 clinical medicine; Event-related potential; Inflection; Reaction Time; Lexical decision task; Humans; 0501 psychology and cognitive sciences; Molecular Biology; Language; Communication; Modality (human–computer interaction); business.industry; General Neuroscience; 05 social sciences; Pseudoword; Acoustic Stimulation; Auditory Perception; Visual Perception; Female; Neurology (clinical); Suffix; Psychology; business; Photic Stimulation; Psychomotor Performance; 030217 neurology & neurosurgery; Developmental Biology; Cognitive psychology; Brain Research
researchProduct

Event-related potential (ERP) responses to violations of inflectional and derivational rules of Finnish

2007

Event-related potentials (ERP) were used to investigate the electrophysiological correlates of inflectional and derivational morphology. The participants were presented with visual sentences containing critical words in which either inflectional, derivational or both rules (combined violation) of Finnish were violated. Inflectional anomalies violated a number agreement of a noun with a previous auxiliary word. Derivational violations included a word-internal selectional restriction violation, i.e., a root and suffix category violation. Combined violations contained both a number and a category violation. The phonemic length of the critical words was controlled. Inflectional violations elici…

Adult; Male; Root (linguistics); media_common.quotation_subject; Contingent Negative Variation; 050105 experimental psychology; Psycholinguistics; 03 medical and health sciences; 0302 clinical medicine; Event-related potential; Noun; Inflection; Reaction Time; Humans; 0501 psychology and cognitive sciences; Evoked Potentials; Molecular Biology; Finland; media_common; Analysis of Variance; Brain Mapping; Communication; P600; business.industry; General Neuroscience; 05 social sciences; Electroencephalography; Middle Aged; 16. Peace & justice; Agreement; Semantics; Female; Neurology (clinical); Suffix; Comprehension; business; Psychology; Photic Stimulation; 030217 neurology & neurosurgery; Developmental Biology; Cognitive psychology; Brain Research
researchProduct

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

2020

Background: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory con…

Burrows–Wheeler transform; Computer science; [SDV]Life Sciences [q-bio]; Value (computer science); SNP; Assembly-free; 0102 computer and information sciences; lcsh:Computer applications to medicine. Medical informatics; 01 natural sciences; Biochemistry; Polymorphism Single Nucleotide; 03 medical and health sciences; BWT; Chromosome (genetic algorithm); Structural Biology; Humans; Sensitivity (control systems); Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; De Bruijn sequence; 0303 health sciences; Settore INF/01 - Informatica; Alignment-free; Applied Mathematics; Research; Genomics; Sequence Analysis DNA; INDEL; Data structure; Graph; Computer Science Applications; Variable (computer science); lcsh:Biology (General); 010201 computation theory & mathematics; Adjacency list; lcsh:R858-859.7; Suffix; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Algorithm; Algorithms; BMC Bioinformatics
researchProduct

Beyond decomposition: Processing zero-derivations in English visual word recognition

2019

Four experiments investigate the effects of covert morphological complexity during visual word recognition. Zero-derivations occur in English in which a change of word class occurs without any change in surface form (e.g., a boat-to boat; to soak-a soak). Boat is object-derived and is a basic noun (N), whereas soak is action-derived and is a basic verb (V). As the suffix {-ing} is only attached to verbs, deriving boating from its base, requires two steps, boat(N) > boat(V) > boating(V), while soaking can be derived in one step from soak(V). Experiments 1 to 3 used masked priming at different prime durations to test matched sets of one- and two-step verbs for morphological (soaking-SOA…

Cognitive Neuroscience; Speech recognition; Experimental and Cognitive Psychology; Verb; Neuropsychological Tests; Vocabulary; 050105 experimental psychology; 03 medical and health sciences; Prime (symbol); 0302 clinical medicine; Noun; Reaction Time; Humans; 0501 psychology and cognitive sciences; Language; Brain Mapping; 05 social sciences; Part of speech; Zero (linguistics); Semantics; Neuropsychology and Physiological Psychology; Pattern Recognition Visual; Covert; Suffix; Psychology; Priming (psychology); 030217 neurology & neurosurgery; Photic Stimulation; Cortex
researchProduct