Search results for "Computer Science - Data Structures and Algorithms"

Showing 10 of 64 documents

Adaptive learning of compressible strings

2020

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 - O(n)$ queries to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and they gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…
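As a baseline for the query counts above, here is a minimal Python sketch of the naive extend-right/extend-left strategy. It is not the paper's compression-aware algorithm, and the names `reconstruct` and `is_substring` are hypothetical. Growing a substring to the right until no letter extends it yields a suffix of $S$ (every occurrence must then end at the end of $S$); growing that suffix to the left recovers $S$. Each successful extension costs at most $\sigma$ queries and each phase ends with one failing round of $\sigma$ queries, so at most $\sigma(n+2)$ queries are used in total.

```python
# Hypothetical helper; a naive baseline, not the algorithm from the paper.
def reconstruct(is_substring, alphabet):
    """Recover a hidden string S from a substring oracle; returns (S, queries_used)."""
    s, queries = "", 0

    # Phase 1: extend to the right while some letter keeps s a substring of S.
    # When no letter works, every occurrence of s ends at the end of S, so s is a suffix.
    grew = True
    while grew:
        grew = False
        for c in alphabet:
            queries += 1
            if is_substring(s + c):
                s, grew = s + c, True
                break

    # Phase 2: extend the suffix to the left; each extension is still a suffix of S,
    # and when no letter works the suffix is all of S.
    grew = True
    while grew:
        grew = False
        for c in alphabet:
            queries += 1
            if is_substring(c + s):
                s, grew = c + s, True
                break

    return s, queries


hidden = "abracadabra"
print(reconstruct(lambda w: w in hidden, "abcdr"))
```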

FOS: Computer and information sciences; Centroid decomposition; General Computer Science; String compression; Adaptive learning; Kolmogorov complexity; Context (language use); Data_CODINGANDINFORMATIONTHEORY; String reconstruction; Theoretical Computer Science; Combinatorics; String learning; Lempel-Ziv; Suffix tree; Integer; Computer Science - Data Structures and Algorithms; Order (group theory); Data Structures and Algorithms (cs.DS); Time complexity; Computer Science::Databases; Mathematics; Settore INF/01 - Informatica; Linear space; String (computer science); Substring; Bounded function

researchProduct

Uncommon Suffix Tries

2011

Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a "logarithmic infinite comb" and enjoys non-uniform polynomial mixing. The second one corresponds to a "factorial infinite comb", for which mixing is uniform and exponential.
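To make "height" and "saturation level" concrete, here is a small Python sketch that inserts the first $n$ suffixes of a finite text into a trie and measures both. The working definitions are our assumption (conventions can differ by one between papers): a word is an internal node when it is a prefix of at least two inserted suffixes; the height is the maximum depth of an internal node; the saturation level is the largest depth $j$ at which all $|A|^j$ words over the alphabet $A$ are internal. The text is whatever string you supply, not a VLMC or comb source, and it should be long enough that the inserted suffixes do not run out. The function name is hypothetical.

```python
import random
from collections import Counter


# Hypothetical helper; the definitions below are working assumptions, not the paper's.
def suffix_trie_stats(text, n, alphabet):
    """Return (height, saturation_level) of the trie built from the first n suffixes of text.

    A word is an internal node if it is a prefix of at least two inserted suffixes;
    height = max depth of an internal node; saturation level = largest depth j such that
    all len(alphabet)**j words of length j are internal nodes.
    """
    suffixes = [text[i:] for i in range(n)]
    height, saturation, depth = 0, 0, 1
    while True:
        # Count how many inserted suffixes share each length-`depth` prefix.
        counts = Counter(s[:depth] for s in suffixes if len(s) >= depth)
        internal = [w for w, c in counts.items() if c >= 2]
        if not internal:
            break  # prefixes of internal words are internal, so nothing deeper can be internal
        height = depth
        # The level is full only if every word of this length is internal
        # (a full level implies all shallower levels are full as well).
        if saturation == depth - 1 and len(internal) == len(alphabet) ** depth:
            saturation = depth
        depth += 1
    return height, saturation


if __name__ == "__main__":
    # Illustration with i.i.d. uniform bits (not one of the comb sources from the abstract).
    rng = random.Random(1)
    bits = "".join(rng.choice("01") for _ in range(5000))
    print(suffix_trie_stats(bits, 1000, "01"))
```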

FOS: Computer and information sciences; Compressed suffix array; Polynomial; Logarithm; General Mathematics; Suffix tree; variable length Markov chain; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Generalized suffix tree; probabilistic source; 0102 computer and information sciences; 02 engineering and technology; suffix trie; 01 natural sciences; law.invention; Combinatorics; law; Computer Science - Data Structures and Algorithms; Trie; FOS: Mathematics; 0202 electrical engineering electronic engineering information engineering; Data Structures and Algorithms (cs.DS); Mixing (physics); Mathematics; Discrete mathematics; Applied Mathematics; Probability (math.PR); 020206 networking & telecommunications; Computer Graphics and Computer-Aided Design; [MATH.MATH-PR] Mathematics [math]/Probability [math.PR]; 010201 computation theory & mathematics; mixing properties; 60J05; 37E05; Suffix; Mathematics - Probability; Software

Adaptive Lower Bound for Testing Monotonicity on the Line

2018

In the property testing model, the task is to distinguish objects possessing some property from objects that are far from it. One such property is monotonicity, where the objects are functions from one poset to another. This is an active area of research. In this paper we study the query complexity of $\epsilon$-testing monotonicity of a function $f\colon [n]\to[r]$. All our lower bounds are for adaptive two-sided testers.
* We prove a nearly tight lower bound for this problem in terms of $r$. The bound is $\Omega(\frac{\log r}{\log \log r})$ when $\epsilon = 1/2$. No previous satisfactory lower bound in terms of $r$ was known.
* We completely characterise the query complexity of this probl…
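For contrast with bounds stated in terms of the range size $r$, here is a minimal Python sketch of a classic binary-search spot-checker whose cost depends on $n$ instead: roughly $\log_2 n$ queries per round for $2/\varepsilon + 1$ rounds. It is not taken from the paper, and the name `binary_search_tester` and the indexing $0,\dots,n-1$ (instead of $[n]$) are our choices. Values are paired with their indices to break ties, so a non-decreasing $f$ is always accepted. Under the standard analysis, the indices whose search succeeds form a sorted subsequence, so if $f$ is $\varepsilon$-far from monotone then at least $\varepsilon n$ indices fail and each round rejects with probability at least $\varepsilon$.

```python
import random


# Hypothetical helper; a classic one-sided tester, not the paper's construction.
def binary_search_tester(f, n, eps, rng=random):
    """Accept every non-decreasing f on 0..n-1; reject eps-far f with probability >= 1 - e**-2."""
    def key(i):
        return (f(i), i)  # pairing with the index makes all keys distinct

    rounds = int(2 / eps) + 1
    for _ in range(rounds):
        i = rng.randrange(n)
        target = key(i)
        lo, hi = 0, n - 1
        found = False
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid == i:  # keys are distinct, so reaching index i means key(mid) == target
                found = True
                break
            if key(mid) < target:
                lo = mid + 1
            else:
                hi = mid - 1
        if not found:
            return False  # the search for key(i) missed position i, so f is not monotone
    return True


print(binary_search_tester(lambda i: i // 3, 30, 0.1))  # True
```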

FOS: Computer and information sciences; Computer Science - Computational Complexity; 000 Computer science knowledge general works; Computer Science - Data Structures and Algorithms; Computer Science; Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

Fast Matrix Multiplication: Limitations of the Laser Method

2014

Until a few years ago, the fastest known matrix multiplication algorithm, due to Coppersmith and Winograd (1990), ran in time $O(n^{2.3755})$. Recently, a surge of activity by Stothers, Vassilevska-Williams, and Le Gall has led to an improved algorithm running in time $O(n^{2.3729})$. These algorithms are obtained by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd. We show that this exact approach cannot result in an algorithm with running time $O(n^{2.3725})$, and identify a wide class of variants of this approach which cannot result in an algorithm with running time $O(n^{2.3078})$; in particular, this approach cannot prove the conjecture that f…
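The exponent race described above began with Strassen's 1969 observation that a $2\times 2$ block product needs only 7 multiplications, which gives $\omega \le \log_2 7 \approx 2.807$. The Coppersmith and Winograd identity and the laser method are far more involved and are not shown here; this Python sketch only illustrates the recursive, sub-cubic idea, assuming square matrices whose size is a power of two. The function names are hypothetical.

```python
# Hypothetical helpers illustrating Strassen's recursion; not the laser method.
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def strassen(A, B):
    """Multiply n x n matrices (n a power of two) with 7 recursive half-size products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def quad(M):  # split into top-left, top-right, bottom-left, bottom-right blocks
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the four output blocks row by row.
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The recurrence $T(n) = 7\,T(n/2) + O(n^2)$ solves to $O(n^{\log_2 7})$; the bounds in the abstract come from much deeper analyses of other identities.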

FOS: Computer and information sciences; Computer Science - Computational Complexity; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

Quantum versus Classical Online Streaming Algorithms with Advice

2018

We consider online algorithms with respect to the competitive ratio. Here, we investigate quantum and classical one-way automata with non-constant memory size (streaming algorithms) as a model for online algorithms. We construct problems that quantum online streaming algorithms can solve better than classical ones when the memory size is logarithmic or sublogarithmic, even if the classical online algorithms get advice bits. Furthermore, we show that a quantum online algorithm with a constant number of qubits can be better than any deterministic online algorithm with a constant number of advice bits and unlimited computational power.

FOS: Computer and information sciences; Computer Science - Computational Complexity; Quantum Physics; Computer Science - Data Structures and Algorithms; FOS: Physical sciences; Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Quantum Physics (quant-ph)

Quantum versus Classical Online Streaming Algorithms with Logarithmic Size of Memory

2023

FOS: Computer and information sciences; Computer Science - Computational Complexity; Quantum Physics; Formal Languages and Automata Theory (cs.FL); General Mathematics; Computer Science - Data Structures and Algorithms; FOS: Physical sciences; Data Structures and Algorithms (cs.DS); Computer Science - Formal Languages and Automata Theory; Computational Complexity (cs.CC); Quantum Physics (quant-ph)

Computing the original eBWT faster, simpler, and with less memory

2021

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings. Since its introduction, however, the term has been used more generally to describe any BWT of a collection of strings, and the fundamental property of the original definition (i.e., independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our ne…
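A quadratic-time Python sketch of the original eBWT definition can serve as a correctness reference; it is not the paper's linear-time algorithm, and the function names are hypothetical. All rotations of all input strings are sorted by the $\omega$-order (comparing the infinite repetitions $u^\omega$ and $v^\omega$), and the last character of each rotation is output. By the Fine and Wilf periodicity lemma, comparing the first $|u|+|v|$ characters of $u^\omega$ and $v^\omega$ decides the order. Ties occur only between rotations that are powers of the same primitive word, which end in the same character, so the output does not depend on the input order.

```python
from functools import cmp_to_key


# Hypothetical helpers; a naive reference implementation of the definition.
def omega_cmp(u, v):
    """Three-way comparison of the infinite words u^ω and v^ω (u, v non-empty)."""
    # Fine and Wilf: if u^ω and v^ω agree on their first |u| + |v| characters, they are equal.
    length = len(u) + len(v)
    a = (u * (length // len(u) + 1))[:length]
    b = (v * (length // len(v) + 1))[:length]
    return (a > b) - (a < b)


def ebwt(strings):
    """Original eBWT: last characters of all rotations of all strings, in ω-order."""
    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)


print(ebwt(["banana"]))   # nnbaaa
print(ebwt(["b", "ab"]))  # bab
```

For a single string this reduces to sorting its rotations, i.e. the BWT without an end-of-string symbol mentioned in the abstract.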

FOS: Computer and information sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS)

Substring Complexity in Sublinear Space

2020

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as …
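The function $S_T$ can be made concrete with a naive Python sketch. The measure derived from it in Kociumaka et al.'s work is usually written $\delta = \max_{\ell \ge 1} S_T(\ell)/\ell$; since the abstract is cut off before naming it, treat that formula as our gloss. This version stores every substring explicitly, so it uses far more than sublinear space and only illustrates what is being computed. The function names are hypothetical.

```python
from fractions import Fraction


# Hypothetical helpers; naive (superlinear-space) computation for illustration only.
def substring_counts(T):
    """S_T(l) for l = 1..|T|: the number of distinct length-l substrings of T."""
    n = len(T)
    return [len({T[i:i + l] for i in range(n - l + 1)}) for l in range(1, n + 1)]


def delta(T):
    """max over l of S_T(l) / l, as an exact fraction (0 for the empty string)."""
    return max(
        (Fraction(c, l) for l, c in enumerate(substring_counts(T), start=1)),
        default=Fraction(0),
    )


print(substring_counts("abab"), delta("abab"))  # [2, 2, 2, 1] 2
```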

FOS: Computer and information sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS)

A Constructive Arboricity Approximation Scheme

2018

The arboricity $\Gamma$ of a graph is the minimum number of forests its edge set can be partitioned into. Previous approximation schemes were nonconstructive, i.e., they only approximated the arboricity as a value without computing a corresponding forest partition. This is because they operate on the related pseudoforest partitions or the dual problem of finding dense subgraphs. We propose an algorithm for converting a partition of $k$ pseudoforests into a partition of $k+1$ forests in $O(mk\log k + m \log n)$ time with a data structure by Brodal and Fagerberg that stores graphs of arboricity $k$. A slightly better bound can be given when perfect hashing is used. When applied to a pseudofor…
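To illustrate what a constructive answer looks like (an explicit partition into forests rather than just a number), here is a Python sketch of the simplest greedy strategy, keeping one union-find structure per forest. It is not the paper's pseudoforest-conversion algorithm and carries no approximation guarantee: on $K_4$ with its edges in lexicographic order it opens 3 forests, while $\Gamma(K_4) = 2$ matches the lower bound $\lceil m/(n-1) \rceil = \lceil 6/3 \rceil$. The function name is hypothetical, and the graph is assumed to be simple (no self-loops).

```python
# Hypothetical helper; a greedy baseline, not the paper's algorithm.
def greedy_forest_partition(n, edges):
    """Split the edges of a simple graph on vertices 0..n-1 into forests, greedily."""
    forests = []  # one (union-find parent array, edge list) pair per forest

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        for parent, forest_edges in forests:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:  # u and v lie in different trees, so (u, v) keeps this a forest
                parent[ru] = rv
                forest_edges.append((u, v))
                break
        else:  # (u, v) would close a cycle in every existing forest: open a new one
            parent = list(range(n))
            parent[u] = v
            forests.append((parent, [(u, v)]))
    return [forest_edges for _, forest_edges in forests]


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(greedy_forest_partition(4, K4))  # [[(0, 1), (0, 2), (0, 3)], [(1, 2), (1, 3)], [(2, 3)]]
```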

FOS: Computer and information sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); MathematicsofComputing_DISCRETEMATHEMATICS

Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark

2021

With the rapid growth of Next Generation Sequencing (NGS) technologies, large amounts of "omics" data are collected daily and need to be processed. Indexing and compressing large sequence datasets are among the most important tasks in this context. Here we propose algorithms for computing the Burrows Wheeler transform that rely on Big Data technologies, i.e., Apache Spark and Hadoop. Our algorithms are the first to distribute the index computation and not only the input dataset, allowing them to take full advantage of the available cloud resources.
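The authors' Spark and Hadoop algorithms are not reproduced here. The Python sketch below, with hypothetical names, only simulates one partition-then-sort idea on a single machine: suffixes are grouped by their first $k$ characters, each group could be sorted by a different worker, and concatenating the groups in key order yields the suffix array and hence the BWT. It assumes the text ends with a unique sentinel smaller than every other symbol, as in the common `$` convention.

```python
from collections import defaultdict


# Hypothetical helper; a single-machine simulation, not the paper's distributed algorithms.
def bucketed_bwt(T, k=2):
    """BWT of T computed from suffixes grouped by their first k characters."""
    # Assumption: T ends with a unique sentinel that is smaller than every other symbol.
    assert T and T.count(T[-1]) == 1 and T[-1] == min(T), "need a unique smallest end marker"

    # "Map" step: key every suffix start position by the suffix's first k characters.
    buckets = defaultdict(list)
    for i in range(len(T)):
        buckets[T[i:i + k]].append(i)

    out = []
    # Keys shorter than k end with the sentinel, so no key is a proper prefix of another,
    # and ordering keys as strings agrees with ordering the suffixes themselves.
    for prefix in sorted(buckets):
        # Each bucket is an independent sorting task (e.g. one per worker).
        local = sorted(buckets[prefix], key=lambda i: T[i:])
        # BWT character = symbol preceding the suffix; T[i - 1] wraps to the sentinel when i == 0.
        out.extend(T[i - 1] for i in local)
    return "".join(out)


print(bucketed_bwt("banana$"))  # annb$aa
```

Larger $k$ gives more, smaller buckets to spread across workers; the result does not depend on $k$.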

FOS: Computer and information sciences; Computer Science - Distributed Parallel and Cluster Computing; Computer Science - Data Structures and Algorithms; Data_FILES; Data Structures and Algorithms (cs.DS); Distributed Parallel and Cluster Computing (cs.DC)
