Search results for "Substring"
showing 10 items of 29 documents
On the longest common factor problem
2008
The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take…
Longest Common Subsequence from Fragments via Sparse Dynamic Programming
1998
Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as Computer Science, Computational Biology and Speech Recognition [7,11,15]. We provide a new Sparse Dynamic Programming technique that extends the Hunt-Szymanski [2,9,8] paradigm for the computation of the Longest Common Subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, resp.) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application t…
Adaptive learning of compressible strings
2020
Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…
Algorithms for Anti-Powers in Strings
2018
Abstract A string S [ 1 , n ] is a power (or tandem repeat) of order k and period n / k if it can be decomposed into k consecutive equal-length blocks of letters. Powers and periods are fundamental to string processing, and algorithms for their efficient computation have wide application and are heavily studied. Recently, Fici et al. (Proc. ICALP 2016) defined an anti-power of order k to be a string composed of k pairwise-distinct blocks of the same length ( n / k , called anti-period). Anti-powers are a natural converse to powers, and are objects of combinatorial interest in their own right. In this paper we initiate the algorithmic study of anti-powers. Given a string S, we describe an op…
A subquadratic algorithm for minimum palindromic factorization
2014
We give an $\mathcal{O}(n \log n)$-time, $\mathcal{O}(n)$-space algorithm for factoring a string into the minimum number of palindromic substrings. That is, given a string $S [1..n]$, in $\mathcal{O}(n \log n)$ time our algorithm returns the minimum number of palindromes $S_1,\ldots, S_\ell$ such that $S = S_1 \cdots S_\ell$. We also show that the time complexity is $\mathcal{O}(n)$ on average and $\Omega(n\log n)$ in the worst case. The last result is based on a characterization of the palindromic structure of Zimin words.
Online Computation of Abelian Runs
2015
Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. We give an algorithm that finds all the abelian runs of period $\mathcal{P}$ in a word of length $n$ in time $O(n\times |\mathcal{P}|)$ and space $O(\sigma+|\mathcal{P}|)$.
Fast computation of abelian runs
2016
Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(\sigma+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomi…
Generating a Gray code for prefix normal words in amortized polylogarithmic time per word
2020
A prefix normal word is a binary word with the property that no substring has more $1$s than the prefix of the same length. By proving that the set of prefix normal words is a bubble language, we can exhaustively list all prefix normal words of length $n$ as a combinatorial Gray code, where successive strings differ by at most two swaps or bit flips. This Gray code can be generated in $\Oh(\log^2 n)$ amortized time per word, while the best generation algorithm hitherto has $\Oh(n)$ running time per word. We also present a membership tester for prefix normal words, as well as a novel characterization of bubble languages.
ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS
2011
The Parikh vector p(s) of a string s is defined as the vector of multiplicities of the characters. Parikh vector q occurs in s if s has a substring t with p(t)=q. We present two novel algorithms for searching for a query q in a text s. One solves the decision problem over a binary text in constant time, using a linear size index of the text. The second algorithm, for a general finite alphabet, finds all occurrences of a given Parikh vector q and has sub-linear expected time complexity; we present two variants, which both use a linear size index of the text.
Binary jumbled string matching for highly run-length compressible texts
2012
The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…