Author: Péter Burcsi

ORCID: 0000-0003-3306-6500

showing 7 related works from this author

On Combinatorial Generation of Prefix Normal Words

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…
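
The definition and the bubble property quoted above can be checked by brute force on small words. The following is a minimal sketch, not the paper's generation algorithm; the function names `is_prefix_normal` and `bubble_step` are hypothetical and chosen only for illustration.

```python
def is_prefix_normal(w: str) -> bool:
    """Return True if no substring of w has more 1s than the prefix of the same length.

    Naive O(n^2) check over a '0'/'1' string (illustrative only).
    """
    n = len(w)
    ones = [0]  # ones[i] = number of 1s in w[:i]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            if ones[i + k] - ones[i] > ones[k]:
                return False
    return True


def bubble_step(w: str) -> str:
    """Exchange the first occurrence of 01 with 10 (w is returned unchanged if there is none)."""
    i = w.find("01")
    return w if i < 0 else w[:i] + "10" + w[i + 2:]


print(is_prefix_normal("1101"))  # True
print(is_prefix_normal("1011"))  # False: the substring "11" has more 1s than the prefix "10"
print(bubble_step("1001"))       # 1010
```

Applying `bubble_step` to a prefix normal word should again give a prefix normal word, which is the bubble-language property the abstract relies on.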

Keywords: Amortized analysis; Conjecture; prefix normal words; Binary number; combinatorial generation; formal languages; binary strings; jumbled pattern matching; bubble languages; efficient algorithms; Context (language use); Data_CODINGANDINFORMATIONTHEORY; Substring; Prefix; Combinatorics; Pattern matching; Algorithms; Word (computer architecture); Mathematics

ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS

2011

The Parikh vector p(s) of a string s is defined as the vector of multiplicities of the characters. Parikh vector q occurs in s if s has a substring t with p(t)=q. We present two novel algorithms for searching for a query q in a text s. One solves the decision problem over a binary text in constant time, using a linear size index of the text. The second algorithm, for a general finite alphabet, finds all occurrences of a given Parikh vector q and has sub-linear expected time complexity; we present two variants, which both use a linear size index of the text.
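
For the binary case, a constant-time decision query with a linear-size index can be illustrated with the well-known interval property: for each length k, the numbers of 1s over all substrings of length k form a contiguous range. The sketch below is an assumption-laden illustration, not the construction from the paper; it builds the index naively in quadratic time, and the names `build_index` and `occurs` are hypothetical.

```python
def build_index(s: str):
    """For each length k, store the min and max number of 1s over substrings of length k.

    The index holds 2(n+1) integers (linear size); this naive build takes O(n^2) time.
    """
    n = len(s)
    ones = [0]  # ones[i] = number of 1s in s[:i]
    for c in s:
        ones.append(ones[-1] + (c == "1"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [ones[i + k] - ones[i] for i in range(n - k + 1)]
        lo[k], hi[k] = min(counts), max(counts)
    return lo, hi


def occurs(index, zeros: int, ones: int) -> bool:
    """Decide in O(1) whether some substring has exactly `zeros` 0s and `ones` 1s."""
    lo, hi = index
    k = zeros + ones
    if k >= len(lo):  # query is longer than the text
        return False
    return lo[k] <= ones <= hi[k]


idx = build_index("1001011")
print(occurs(idx, 1, 2))  # True, e.g. the substring "011"
print(occurs(idx, 0, 3))  # False, there is no substring "111"
```

The query is correct because sliding a window by one position changes its number of 1s by at most one, so every count between the minimum and the maximum is attained.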

Keywords: FOS: Computer and information sciences; J.3; average case analysis; Binary number; permuted strings; Computer Science - Data Structures and Algorithms; Computer Science (miscellaneous); Parikh vectors; Data Structures and Algorithms (cs.DS); Pattern matching; Time complexity; Mathematics; String (computer science); string algorithms; Decision problem; Substring; F.2.2; Index (publishing); Constant (mathematics); Algorithm; Computer Science::Formal Languages and Automata Theory

Generating a Gray code for prefix normal words in amortized polylogarithmic time per word

2020

A prefix normal word is a binary word with the property that no substring has more $1$s than the prefix of the same length. By proving that the set of prefix normal words is a bubble language, we can exhaustively list all prefix normal words of length $n$ as a combinatorial Gray code, where successive strings differ by at most two swaps or bit flips. This Gray code can be generated in $O(\log^2 n)$ amortized time per word, while the best generation algorithm hitherto has $O(n)$ running time per word. We also present a membership tester for prefix normal words, as well as a novel characterization of bubble languages.

Keywords: FOS: Computer and information sciences; General Computer Science; Formal Languages and Automata Theory (cs.FL); Property (programming); combinatorial Gray code; Computer Science - Formal Languages and Automata Theory; Data_CODINGANDINFORMATIONTHEORY; 0102 computer and information sciences; 02 engineering and technology; Characterization (mathematics); 01 natural sciences; Theoretical Computer Science; Combinatorics; Set (abstract data type); Gray code; Computer Science - Data Structures and Algorithms; 0202 electrical engineering electronic engineering information engineering; Data Structures and Algorithms (cs.DS); Mathematics; Amortized analysis; Settore INF/01 - Informatica; prefix normal words; Substring; combinatorial generation; Prefix; jumbled pattern matching; 010201 computation theory & mathematics; 020201 artificial intelligence & image processing; binary languages; Word (computer architecture)

On prefix normal words and prefix normal forms

2016

A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors. We prove that the language of prefix normal words is not context-free and is strictly contai…
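
The link between a word's factors and a prefix normal word can be made concrete as follows. This sketch assumes the usual definition of the $1$-prefix normal form (the truncated abstract above does not spell it out): the word whose prefix of length k contains as many 1s as the richest factor of length k in the input. The names `max_ones` and `prefix_normal_form` are hypothetical.

```python
def max_ones(w: str) -> list:
    """f[k] = maximum number of 1s in any factor (substring) of w of length k."""
    n = len(w)
    ones = [0]  # ones[i] = number of 1s in w[:i]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    return [max(ones[i + k] - ones[i] for i in range(n - k + 1)) for k in range(n + 1)]


def prefix_normal_form(w: str) -> str:
    """Assumed definition: emit a 1 at position k exactly when f[k] exceeds f[k-1]."""
    f = max_ones(w)
    return "".join("1" if f[k] > f[k - 1] else "0" for k in range(1, len(w) + 1))


print(prefix_normal_form("0110"))  # 1100
print(prefix_normal_form("0101"))  # 1010
```

Under this definition two binary words of the same length have the same maximum number of 1s for every factor length exactly when their forms coincide, since the form is just an encoding of the table returned by `max_ones`.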

Keywords: FOS: Computer and information sciences; Prefix code; prefix normal words; pre-necklaces; Discrete Mathematics (cs.DM); General Computer Science; Formal Languages and Automata Theory (cs.FL); Binary number; Computer Science - Formal Languages and Automata Theory; Context (language use); binary languages; Lyndon words; 0102 computer and information sciences; 02 engineering and technology; Prefix grammar; prefix normal forms; Kraft's inequality; Characterization (mathematics); 01 natural sciences; enumeration; Theoretical Computer Science; FOS: Mathematics; 0202 electrical engineering electronic engineering information engineering; Mathematics - Combinatorics; Mathematics; Discrete mathematics; binary jumbled pattern matching; Settore INF/01 - Informatica; Computer Science (all); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Prefix; 010201 computation theory & mathematics; 020201 artificial intelligence & image processing; Combinatorics (math.CO); Computer Science::Formal Languages and Automata Theory; Word (group theory); Computer Science - Discrete Mathematics

On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching

2010

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires finding a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example, for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renowned Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, …
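
The worked example in this abstract can be reproduced with a simple fixed-length sliding window. This is a baseline sketch only, not one of the paper's algorithms; the function names `parikh` and `jumbled_matches` are hypothetical, and positions are reported 1-based and inclusive to match the "(1,4)" notation above.

```python
from collections import Counter


def parikh(s: str, alphabet: str) -> tuple:
    """Parikh vector of s: the multiplicity of each alphabet character, in alphabet order."""
    counts = Counter(s)
    return tuple(counts[a] for a in alphabet)


def jumbled_matches(s: str, q: tuple, alphabet: str) -> list:
    """All (start, end) positions, 1-based inclusive, whose substring has Parikh vector q."""
    m = sum(q)
    if m == 0 or m > len(s):
        return []
    target = dict(zip(alphabet, q))
    window = Counter(s[:m])  # character counts of the current window
    hits = []
    for start in range(len(s) - m + 1):
        if start > 0:  # slide the window one position to the right
            window[s[start - 1]] -= 1
            window[s[start + m - 1]] += 1
        if all(window[a] == target[a] for a in alphabet):
            hits.append((start + 1, start + m))
    return hits


print(parikh("abaccbabaaa", "abc"))                      # (6, 3, 2)
print(jumbled_matches("abaccbabaaa", (2, 1, 1), "abc"))  # [(1, 4)]
```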

Keywords: Discrete mathematics; Parikh vectors; jumbled pattern matching; scrabble; approximate pattern matching; 000; Anagram; String searching algorithm; Approximate string matching; Decision problem; algorithms; Data structure; Substring; String Matching; Wavelet Tree; Pattern matching; Mathematics

On Approximate Jumbled Pattern Matching in Strings

2011

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u <= q <= v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem …
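
The problem statement above can be illustrated with a brute-force baseline. This is only a sketch under stated assumptions, not one of the paper's algorithms: it takes "maximal" to mean that the occurrence cannot be extended by one character on either side while staying within the bounds, and the name `maximal_approx_occurrences` is hypothetical.

```python
def maximal_approx_occurrences(s: str, u: tuple, v: tuple, alphabet: str) -> list:
    """Maximal (start, end) positions, 1-based inclusive, with u <= p(s[start..end]) <= v.

    Brute force over all substrings using per-character prefix counts.
    """
    n = len(s)
    pre = {a: [0] * (n + 1) for a in alphabet}  # pre[a][i] = occurrences of a in s[:i]
    for i, c in enumerate(s):
        for a in alphabet:
            pre[a][i + 1] = pre[a][i] + (c == a)

    def within(i: int, j: int) -> bool:
        # True if the half-open slice s[i:j] satisfies the bounds componentwise.
        return all(lo <= pre[a][j] - pre[a][i] <= hi
                   for a, lo, hi in zip(alphabet, u, v))

    hits = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if not within(i, j):
                continue
            extends_left = i > 0 and within(i - 1, j)
            extends_right = j < n and within(i, j + 1)
            if not extends_left and not extends_right:
                hits.append((i + 1, j))
    return hits


print(maximal_approx_occurrences("abaccbabaaa", (1, 1, 1), (2, 1, 2), "abc"))
# [(1, 5), (3, 7)]
```

The sketch assumes every character of the text belongs to `alphabet`; characters outside it are simply not counted.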

Keywords: Parikh vectors; Average case analysis; Approximate search; String algorithms; Discrete mathematics; Weight function; Search engine indexing; Approximate string matching; Substring; Theoretical Computer Science; Combinatorics; Computational Theory and Mathematics; Pattern matching; Permuted strings; Average case; Theory of computation; Wavelet Tree; Preprocessor; Mathematics; Theory of Computing Systems

Normal, Abby Normal, Prefix Normal

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number $\textit{pnw}(n)$ of prefix normal words of length n, showing that $\textit{pnw}(n) =\Omega\left(2^{n - c\sqrt{n\ln n}}\right)$ for some c and $\textit{pnw}(n) = O \left(\frac{2^n (\ln n)^2}{n}\right)$. We introduce efficient algorithms for testing the prefix normal property and a “mechanical algorithm” for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes t…
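
For small n, the quantity $\textit{pnw}(n)$ can be computed by exhaustive enumeration, which gives a sanity check against the asymptotic bounds quoted above. This is a brute-force sketch taking exponential time, not one of the paper's algorithms; the function names are hypothetical.

```python
from itertools import product


def is_prefix_normal(bits) -> bool:
    """True if no substring of the 0/1 sequence has more 1s than the prefix of the same length."""
    n = len(bits)
    ones = [0]  # ones[i] = number of 1s among the first i bits
    for b in bits:
        ones.append(ones[-1] + b)
    return all(ones[i + k] - ones[i] <= ones[k]
               for k in range(1, n + 1)
               for i in range(n - k + 1))


def pnw(n: int) -> int:
    """Number of prefix normal words of length n, by testing all 2^n binary words."""
    return sum(is_prefix_normal(bits) for bits in product((0, 1), repeat=n))


print([pnw(n) for n in range(1, 5)])  # [2, 3, 5, 8]
```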

Keywords: binary jumbled pattern matching; Efficient algorithm; membership test; Binary number; Context (language use); Prefix Normal Word Algorithm; Data_CODINGANDINFORMATIONTHEORY; prefix normal words; Omega; Substring; enumeration; Combinatorics; Prefix; normal forms; binary languages; Word (group theory); Mathematics
