Search results for " Data Structures"

showing 10 items of 80 documents

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

2015

We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implement…

FOS: Computer and information sciences0102 computer and information sciences02 engineering and technologyInformation System01 natural sciencesString (physics)Theoretical Computer ScienceCombinatoricsHigh Energy Physics::TheoryAnalysis of algorithmGreedy algorithmComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)Greedy algorithmFinite setAnalysis of algorithmsMathematicsSuperstring theoryShortest Common SuperstringComputer Science Applications1707 Computer Vision and Pattern RecognitionComputer Science ApplicationsReversalShortest Path Faster Algorithm010201 computation theory & mathematicsCompression ratioSignal Processing020201 artificial intelligence & image processingK shortest path routingInformation Systems
researchProduct

Inducing the Lyndon Array

2019

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…

FOS: Computer and information sciences050101 languages & linguisticsComputer scienceComputationInduced suffix sorting02 engineering and technologySpace (mathematics)law.inventionSuffix sortinglawSuffix arrayComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData_FILESPreprocessorData Structures and Algorithms (cs.DS)0501 psychology and cognitive sciencesComputer Science::Data Structures and AlgorithmsTime complexitySettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - Informatica05 social sciencesLightweight algorithmSuffix arraySigmaComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Induced suffix sorting; Lightweight algorithms; Lyndon array; Suffix arrayWorking spaceLyndon arrayLightweight algorithms020201 artificial intelligence & image processingAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct

A Big Data Approach for Sequences Indexing on the Cloud via Burrows Wheeler Transform

2020

Indexing sequence data is important in the context of Precision Medicine, where large amounts of ``omics'' data have to be daily collected and analyzed in order to categorize patients and identify the most effective therapies. Here we propose an algorithm for the computation of Burrows Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop. Our approach is the first that distributes the index computation and not only the input dataset, allowing to fully benefit of the available cloud resources.

FOS: Computer and information sciencesArtificial Intelligence (cs.AI)Computer Science - Distributed Parallel and Cluster ComputingComputer Science - Artificial IntelligenceComputer Science - Data Structures and AlgorithmsData_FILESData Structures and Algorithms (cs.DS)Distributed Parallel and Cluster Computing (cs.DC)
researchProduct

Sorting suffixes of a text via its Lyndon Factorization

2013

The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can b…

FOS: Computer and information sciencesBWTLyndon FactorizationSettore INF/01 - InformaticaSorting Suffixes; Lyndon Factorization; Lyndon WordsSuffix arrayComputer Science - Data Structures and AlgorithmsData_FILESData Structures and Algorithms (cs.DS)Lyndon wordSorting suffixeSorting SuffixesLyndon Words
researchProduct

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…

FOS: Computer and information sciencesBurrows–Wheeler transformSettore INF/01 - InformaticaCombinatorics on wordsFormal Languages and Automata Theory (cs.FL)Computer scienceString (computer science)Search engine indexingCompressed data structuresComputer Science - Formal Languages and Automata TheoryString indexingData structureMeasure (mathematics)Burrows-Wheeler-TransformRepetitivenessCombinatorics on wordsBurrows-Wheeler-Transform Compressed data structures String indexing Repetitiveness Combinatorics on wordsTransformation (function)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)AlgorithmData compression
researchProduct

Adaptive learning of compressible strings

2020

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…

FOS: Computer and information sciencesCentroid decompositionGeneral Computer ScienceString compressionAdaptive learningKolmogorov complexityContext (language use)Data_CODINGANDINFORMATIONTHEORYString reconstructionTheoretical Computer ScienceCombinatoricsString reconstruction; String learning; Adaptive learning; Kolmogorov complexity; String compression; Lempel-Ziv; Centroid decomposition; Suffix treeSuffix treeIntegerComputer Science - Data Structures and AlgorithmsOrder (group theory)Data Structures and Algorithms (cs.DS)Adaptive learning; Centroid decomposition; Kolmogorov complexity; Lempel-Ziv; String compression; String learning; String reconstruction; Suffix treeTime complexityComputer Science::DatabasesMathematicsLempel-ZivSettore INF/01 - InformaticaLinear spaceString (computer science)SubstringBounded functionString learningTheoretical Computer Science
researchProduct

Uncommon Suffix Tries

2011

Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a ''logarithmic infinite comb'' and enjoys a non uniform polynomial mixing. The second one corresponds to a ''factorial infinite comb'' for which mixing is uniform and exponential.

FOS: Computer and information sciencesCompressed suffix arrayPolynomialLogarithmGeneral MathematicsSuffix treevariable length Markov chain[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix treeprobabilistic source0102 computer and information sciences02 engineering and technologysuffix trie01 natural scienceslaw.inventionCombinatoricslawComputer Science - Data Structures and AlgorithmsTrieFOS: Mathematics0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)Mixing (physics)[ INFO.INFO-DS ] Computer Science [cs]/Data Structures and Algorithms [cs.DS]MathematicsDiscrete mathematicsApplied MathematicsProbability (math.PR)020206 networking & telecommunicationssuffix trie.Computer Graphics and Computer-Aided Design[MATH.MATH-PR]Mathematics [math]/Probability [math.PR]010201 computation theory & mathematicsmixing properties60J05 37E05Suffix[ MATH.MATH-PR ] Mathematics [math]/Probability [math.PR]Mathematics - ProbabilitySoftware
researchProduct

Adaptive Lower Bound for Testing Monotonicity on the Line

2018

In the property testing model, the task is to distinguish objects possessing some property from the objects that are far from it. One of such properties is monotonicity, when the objects are functions from one poset to another. This is an active area of research. In this paper we study query complexity of $\epsilon$-testing monotonicity of a function $f\colon [n]\to[r]$. All our lower bounds are for adaptive two-sided testers. * We prove a nearly tight lower bound for this problem in terms of $r$. The bound is $\Omega(\frac{\log r}{\log \log r})$ when $\epsilon = 1/2$. No previous satisfactory lower bound in terms of $r$ was known. * We completely characterise query complexity of this probl…

FOS: Computer and information sciencesComputer Science - Computational Complexity000 Computer science knowledge general worksComputer Science - Data Structures and AlgorithmsComputer ScienceData Structures and Algorithms (cs.DS)Computational Complexity (cs.CC)
researchProduct

Fast Matrix Multiplication: Limitations of the Laser Method

2014

Until a few years ago, the fastest known matrix multiplication algorithm, due to Coppersmith and Winograd (1990), ran in time $O(n^{2.3755})$. Recently, a surge of activity by Stothers, Vassilevska-Williams, and Le Gall has led to an improved algorithm running in time $O(n^{2.3729})$. These algorithms are obtained by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd. We show that this exact approach cannot result in an algorithm with running time $O(n^{2.3725})$, and identify a wide class of variants of this approach which cannot result in an algorithm with running time $O(n^{2.3078})$; in particular, this approach cannot prove the conjecture that f…

FOS: Computer and information sciencesComputer Science - Computational ComplexityComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Computational Complexity (cs.CC)
researchProduct

Quantum versus Classical Online Streaming Algorithms with Advice

2018

We consider online algorithms with respect to the competitive ratio. Here, we investigate quantum and classical one-way automata with non-constant size of memory (streaming algorithms) as a model for online algorithms. We construct problems that can be solved by quantum online streaming algorithms better than by classical ones in a case of logarithmic or sublogarithmic size of memory, even if classical online algorithms get advice bits. Furthermore, we show that a quantum online algorithm with a constant number of qubits can be better than any deterministic online algorithm with a constant number of advice bits and unlimited computational power.

FOS: Computer and information sciencesComputer Science - Computational ComplexityQuantum PhysicsComputer Science - Data Structures and AlgorithmsFOS: Physical sciencesData Structures and Algorithms (cs.DS)Computational Complexity (cs.CC)Quantum Physics (quant-ph)
researchProduct