Search results for "Suffix"
showing 10 items of 75 documents
Sturmian Graphs and a conjecture of Moser
2004
In this paper we define Sturmian graphs and we prove that all of them have a “counting” property. We show deep connections between this counting property and two conjectures, by Moser and by Zaremba, on the continued fraction expansion of real numbers. These graphs turn out to be the underlying graphs of CDAWGs of central Sturmian words. We show also that, analogously to the case of Sturmian words, these graphs converge to infinite ones.
On the longest common factor problem
2008
The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take…
On the Construction of Classes of Suffix Trees for Square Matrices: Algorithms and Applications
1996
AbstractWe provide a uniform framework for the study of index data structures for a two-dimensional matrixTEXT[1:n, 1:n] whose entries are drawn from an ordered alphabetΣ. An index forTEXTcan be informally seen as the two-dimensional analog of the suffix tree for a string. It allows on-line searches and statistics to be performed onTEXTby representing compactly theΘ(n3) square submatrices ofTEXTin optimalO(n2) space. We identify 4n−1families of indices forTEXT, each containing ∏ni=1(2i−1)! isomorphic data structures. We also develop techniques leading to a single algorithm that efficiently builds any index in any family inO(n2logn) time andO(n2) space. Such an algorithm improves in various …
Effects of reading proficiency and of base and whole-word frequency on reading noun- and verb-derived words: An eye-tracking study in Italian primary…
2018
The aim of this study is to assess the role of readers’ proficiency and of the base-word distributional properties on eye-movement behavior. Sixty-two typically developing children, attending 3rd, 4th, and 5th grade, were asked to read derived words in a sentence context. Target words were nouns derived from noun bases (e.g., umorista, ‘humorist’), which in Italian are shared by few derived words, and nouns derived from verb bases (e.g., punizione, ‘punishment’), which are shared by about 50 different inflected forms and several derived words. Data shows that base and word frequency affected first-fixation duration for nouns derived from noun bases, but in an opposite way: base frequency ha…
Inducing the Lyndon Array
2019
In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…
Sorting suffixes of a text via its Lyndon Factorization
2013
The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can b…
Adaptive learning of compressible strings
2020
Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…
Uncommon Suffix Tries
2011
Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a ''logarithmic infinite comb'' and enjoys a non uniform polynomial mixing. The second one corresponds to a ''factorial infinite comb'' for which mixing is uniform and exponential.
Lightweight LCP construction for very large collections of strings
2016
The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows to efficiently compute some combinatorial properties of a string useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings having any length. The computation is reali…
The sequence of open and closed prefixes of a Sturmian word
2017
A finite word is closed if it contains a factor that occurs both as a prefix and as a suffix but does not have internal occurrences, otherwise it is open. We are interested in the {\it oc-sequence} of a word, which is the binary sequence whose $n$-th element is $0$ if the prefix of length $n$ of the word is open, or $1$ if it is closed. We exhibit results showing that this sequence is deeply related to the combinatorial and periodic structure of a word. In the case of Sturmian words, we show that these are uniquely determined (up to renaming letters) by their oc-sequence. Moreover, we prove that the class of finite Sturmian words is a maximal element with this property in the class of binar…