Search results for " Data Structures"
showing 10 items of 80 documents
On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals
2015
We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implement…
Inducing the Lyndon Array
2019
In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…
A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform
2020
Indexing sequence data is important in the context of Precision Medicine, where large amounts of ``omics'' data have to be daily collected and analyzed in order to categorize patients and identify the most effective therapies. Here we propose an algorithm for the computation of Burrows Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop. Our approach is the first that distributes the index computation and not only the input dataset, allowing to fully benefit of the available cloud resources.
Sorting suffixes of a text via its Lyndon Factorization
2013
The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can b…
Novel Results on the Number of Runs of the Burrows-Wheeler-Transform
2021
The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…
Adaptive learning of compressible strings
2020
Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…
Uncommon Suffix Tries
2011
Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a ''logarithmic infinite comb'' and enjoys a non uniform polynomial mixing. The second one corresponds to a ''factorial infinite comb'' for which mixing is uniform and exponential.
Adaptive Lower Bound for Testing Monotonicity on the Line
2018
In the property testing model, the task is to distinguish objects possessing some property from the objects that are far from it. One of such properties is monotonicity, when the objects are functions from one poset to another. This is an active area of research. In this paper we study query complexity of $\epsilon$-testing monotonicity of a function $f\colon [n]\to[r]$. All our lower bounds are for adaptive two-sided testers. * We prove a nearly tight lower bound for this problem in terms of $r$. The bound is $\Omega(\frac{\log r}{\log \log r})$ when $\epsilon = 1/2$. No previous satisfactory lower bound in terms of $r$ was known. * We completely characterise query complexity of this probl…
Fast Matrix Multiplication: Limitations of the Laser Method
2014
Until a few years ago, the fastest known matrix multiplication algorithm, due to Coppersmith and Winograd (1990), ran in time $O(n^{2.3755})$. Recently, a surge of activity by Stothers, Vassilevska-Williams, and Le Gall has led to an improved algorithm running in time $O(n^{2.3729})$. These algorithms are obtained by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd. We show that this exact approach cannot result in an algorithm with running time $O(n^{2.3725})$, and identify a wide class of variants of this approach which cannot result in an algorithm with running time $O(n^{2.3078})$; in particular, this approach cannot prove the conjecture that f…
Quantum versus Classical Online Streaming Algorithms with Advice
2018
We consider online algorithms with respect to the competitive ratio. Here, we investigate quantum and classical one-way automata with non-constant size of memory (streaming algorithms) as a model for online algorithms. We construct problems that can be solved by quantum online streaming algorithms better than by classical ones in a case of logarithmic or sublogarithmic size of memory, even if classical online algorithms get advice bits. Furthermore, we show that a quantum online algorithm with a constant number of qubits can be better than any deterministic online algorithm with a constant number of advice bits and unlimited computational power.