Search results for "Data Structures"

showing 10 items of 80 documents

Quantum versus Classical Online Streaming Algorithms with Logarithmic Size of Memory

2023

We consider online algorithms with respect to the competitive ratio. Here, we investigate quantum and classical one-way automata with non-constant memory size (streaming algorithms) as a model for online algorithms. We construct problems that quantum online streaming algorithms can solve better than classical ones when the memory size is logarithmic or sublogarithmic.

FOS: Computer and information sciences, Computer Science - Computational Complexity, Quantum Physics, Formal Languages and Automata Theory (cs.FL), General Mathematics, Computer Science - Data Structures and Algorithms, FOS: Physical sciences, Data Structures and Algorithms (cs.DS), Computer Science - Formal Languages and Automata Theory, Computational Complexity (cs.CC), Quantum Physics (quant-ph)
researchProduct

Computing the original eBWT faster, simpler, and with less memory

2021

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings. However, since its introduction, the term has been used more generally to describe any BWT of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our ne…

FOS: Computer and information sciences, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)
researchProduct

Substring Complexity in Sublinear Space

2020

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as …

FOS: Computer and information sciences, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)
researchProduct

A Constructive Arboricity Approximation Scheme

2018

The arboricity $\Gamma$ of a graph is the minimum number of forests its edge set can be partitioned into. Previous approximation schemes were nonconstructive, i.e., they only approximated the arboricity as a value without computing a corresponding forest partition. This is because they operate on the related pseudoforest partitions or the dual problem of finding dense subgraphs. We propose an algorithm for converting a partition of $k$ pseudoforests into a partition of $k+1$ forests in $O(mk\log k + m \log n)$ time with a data structure by Brodal and Fagerberg that stores graphs of arboricity $k$. A slightly better bound can be given when perfect hashing is used. When applied to a pseudofor…

FOS: Computer and information sciences, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), MathematicsofComputing_DISCRETEMATHEMATICS
researchProduct

Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark

2021

With the rapid growth of Next Generation Sequencing (NGS) technologies, large amounts of "omics" data are collected daily and need to be processed. Indexing and compressing large sequence datasets are some of the most important tasks in this context. Here we propose algorithms for the computation of the Burrows-Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop. Our algorithms are the first ones that distribute the index computation and not only the input dataset, making it possible to benefit fully from the available cloud resources.

FOS: Computer and information sciences, Computer Science - Distributed Parallel and Cluster Computing, Computer Science - Data Structures and Algorithms, Data_FILES, Data Structures and Algorithms (cs.DS), Distributed Parallel and Cluster Computing (cs.DC)
researchProduct
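
For readers unfamiliar with the transform named in the entry above, here is a minimal single-machine sketch of the Burrows-Wheeler transform built from sorted rotations. It is not the distributed Spark/Hadoop method of the listed paper; the function name `bwt` and the `$` end marker are assumptions chosen for illustration, and the quadratic-space construction is only suitable for short strings.

```python
def bwt(text: str, end: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumption: `end` is a single character that does not occur in `text`
    and sorts before every character of `text` (true for "$" with letters).
    """
    if end in text:
        raise ValueError("end marker must not occur in the input text")
    s = text + end
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

Practical tools avoid materialising all rotations and typically derive the transform from a suffix array instead.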

Multi-label Methods for Prediction with Sequential Data

2017

The number of methods available for classification of multi-label data has increased rapidly over recent years, yet relatively few links have been made with the related task of classification of sequential data. If label indices are considered as time indices, the problems can often be seen as equivalent. In this paper we detect and elaborate on connections between multi-label methods and Markovian models, and study the suitability of multi-label methods for prediction in sequential data. From this study we draw upon the most suitable techniques from the area and develop two novel competitive approaches which can be applied to either kind of data. We carry out an empirical evaluation inves…

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Markov models, Multi-label classification, Machine Learning (stat.ML), 02 engineering and technology, computer.software_genre, Markov model, Machine learning, Task (project management), Machine Learning (cs.LG), Statistics - Machine Learning, Artificial Intelligence, 020204 information systems, Computer Science - Data Structures and Algorithms, 0202 electrical engineering electronic engineering information engineering, Sequential data, Data Structures and Algorithms (cs.DS), Multi-label classification, ta113, business.industry, Problem transformation, Signal Processing, Sequence prediction, 020201 artificial intelligence & image processing, Sequential data, Computer Vision and Pattern Recognition, Data mining, Artificial intelligence, business, computer, Software
researchProduct

Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines

2020

Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now also includes Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a quite significant achievement. Here we study to what extent Machine Learning Techniques can contribute to obtaining such a speed-up via a systematic experimental comparison of known efficient implementations of Sorted Table Search procedures, with different Data Layouts, and their Learned counterparts developed here. We characterize the scenarios in which the latter can be profitably used with respect to the former, accounting …

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Machine Learning (stat.ML), E.1; I.2.0, 68T07 68P05 62J05 68P10, E.1, I.2.0, Machine Learning (cs.LG)
researchProduct

On the Complexity of Solving Subtraction Games

2018

We study algorithms for solving Subtraction games, which are sometimes referred to as one-heap Nim games. We describe a quantum algorithm which is applicable to any game on a DAG, and show that its query complexity for solving an arbitrary Subtraction game of $n$ stones is $O(n^{3/2}\log n)$. The best known deterministic algorithms for solving such games are based on the dynamic programming approach. We show that this approach is asymptotically optimal and that classical query complexity for solving a Subtraction game is generally $\Theta(n^2)$. This paper is perhaps the first explicit "quantum" contribution to algorithmic game theory.

FOS: Computer and information sciences, Computer Science::Computer Science and Game Theory, Quantum Physics, Computer Science - Computational Complexity, Computer Science - Computer Science and Game Theory, Computer Science - Data Structures and Algorithms, ComputingMilieux_PERSONALCOMPUTING, FOS: Physical sciences, Data Structures and Algorithms (cs.DS), Computational Complexity (cs.CC), Quantum Physics (quant-ph), Computer Science and Game Theory (cs.GT)
researchProduct
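
The dynamic programming approach mentioned in the entry above can be sketched as follows. This is an illustrative simplification rather than the paper's formulation: it assumes a fixed set of allowed subtractions and the normal-play convention (the player who cannot move loses), and the name `winning_positions` is hypothetical.

```python
from typing import Iterable, List


def winning_positions(n: int, moves: Iterable[int]) -> List[bool]:
    """Classify heap sizes 0..n of a one-heap subtraction game.

    Assumptions (not taken from the listed paper): `moves` is a fixed set of
    positive amounts a player may remove, and a player with no legal move
    loses. win[k] is True when the player to move wins with k stones left.
    """
    allowed = sorted(set(moves))
    win = [False] * (n + 1)  # 0 stones: no move available, so a loss
    for stones in range(1, n + 1):
        # A position is winning if some legal move reaches a losing position.
        win[stones] = any(
            m <= stones and not win[stones - m] for m in allowed
        )
    return win


if __name__ == "__main__":
    table = winning_positions(14, {1, 3, 4})
    # Heap sizes from which the player to move loses under optimal play.
    print([k for k, w in enumerate(table) if not w])  # [0, 2, 7, 9, 14]
```

The loop performs at most n times the number of allowed moves checks, which is quadratic in n when the move set has size proportional to n.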

Lightweight LCP construction for very large collections of strings

2016

The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows efficient computation of some combinatorial properties of a string that are useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings having any length. The computation is reali…

FOS: Computer and information sciences, Computer science, Computation, 0102 computer and information sciences, 02 engineering and technology, Parallel computing, 01 natural sciences, Generalized Suffix Array, Theoretical Computer Science, law.invention, law, Computational Theory and Mathematic, Computer Science - Data Structures and Algorithms, Extended Burrows-Wheeler Transform, Data_FILES, 0202 electrical engineering electronic engineering information engineering, Discrete Mathematics and Combinatorics, Data Structures and Algorithms (cs.DS), Discrete Mathematics and Combinatoric, Auxiliary memory, Longest Common Prefix Array; Extended Burrows-Wheeler Transform; Generalized Suffix Array, String (computer science), LCP array, Suffix array, Data structure, Computational Theory and Mathematics, 010201 computation theory & mathematics, Longest Common Prefix Array, 020201 artificial intelligence & image processing, Journal of Discrete Algorithms
researchProduct

Finding k-dissimilar paths with minimum collective length

2018

Shortest path computation is a fundamental problem in road networks. However, in many real-world scenarios, determining solely the shortest path is not enough. In this paper, we study the problem of finding k-Dissimilar Paths with Minimum Collective Length (kDPwML), which aims at computing a set of paths from a source s to a target t such that all paths are pairwise dissimilar by at least $\theta$ and the sum of the path lengths is minimal. We introduce an exact algorithm for the kDPwML problem, which iterates over all possible s-t paths while employing two pruning techniques to reduce the prohibitively expensive computational cost. To achieve scalability, we also define the much smaller set …

FOS: Computer and information sciences, Computer science, Databases (cs.DB), 0102 computer and information sciences, 02 engineering and technology, 01 natural sciences, Set (abstract data type), Exact algorithm, Computer Science - Databases, 010201 computation theory & mathematics, Iterated function, 020204 information systems, Computer Science - Data Structures and Algorithms, Shortest path problem, Scalability, Path (graph theory), 0202 electrical engineering electronic engineering information engineering, Data Structures and Algorithms (cs.DS), Pairwise comparison, Pruning (decision trees), Algorithm, Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
researchProduct