
AUTHOR

Gabriele Fici

showing 72 related works from this author

Binary jumbled string matching for highly run-length compressible texts

2012

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…

FOS: Computer and information sciencesString algorithmsStructure (category theory)Binary numberG.2.1Data_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologyString searching algorithm01 natural sciencesComputer Science - Information RetrievalTheoretical Computer ScienceCombinatoricsdata structuresSimple (abstract algebra)Computer Science - Data Structures and AlgorithmsString algorithms; jumbled pattern matching; prefix normal form; data structures0202 electrical engineering electronic engineering information engineeringParikh vectorData Structures and Algorithms (cs.DS)Run-length encodingMathematics68W32 68P05 68P20String (computer science)prefix normal formSubstringComputer Science Applicationsjumbled pattern matching010201 computation theory & mathematicsData structureSignal ProcessingRun-length encoding020201 artificial intelligence & image processingConstant (mathematics)Information Retrieval (cs.IR)Information SystemsInformation Processing Letters
researchProduct

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…

0301 basic medicineFOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheorySequence alignmentInformation System0102 computer and information sciencesCircular wordAbsent words01 natural sciencesUpper and lower boundsSequence comparisonTheoretical Computer ScienceCombinatorics03 medical and health sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Absent wordCircular wordsMathematicsSequenceSettore INF/01 - InformaticaProcess (computing)q-gramComputer Science Applications1707 Computer Vision and Pattern Recognitionq-gramsComposition (combinatorics)Computer Science Applications030104 developmental biologyComputational Theory and MathematicsForbidden words010201 computation theory & mathematicsFocus (optics)Forbidden wordWord (computer architecture)Information SystemsInteger (computer science)

Marked systems and circular splicing

2007

Splicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. In this paper we introduce a special class of finite circular splicing systems named marked systems. We prove that a marked system S generates a regular circular language if and only if S satisfies a special (decidable) property. As a consequence, we show that we can decide whether a regular circular language is generated by a marked system and we characterize the structure of these regular circular languages.

Discrete mathematics; Property (programming); Structure (category theory); Molecular computing; Circular word; Decidability; Regular language; If and only if; RNA splicing; Formal language; Splicing system; Generative grammar; Automata theory; Mathematics

On Combinatorial Generation of Prefix Normal Words

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…

Amortized analysis; Conjecture; Prefix Normal Word; Binary number; combinatorial generation; formal languages; prefix normal words; binary strings; jumbled pattern matching; bubble languages; efficient algorithms; Context (language use); Data_CODINGANDINFORMATIONTHEORY; Substring; Prefix; Combinatorics; Pattern matching; Algorithms; Word (computer architecture); Mathematics

A Classification of Trapezoidal Words

2011

Trapezoidal words are finite words having at most n+1 distinct factors of length n, for every n>=0. They encompass finite Sturmian words. We distinguish trapezoidal words into two disjoint subsets: open and closed trapezoidal words. A trapezoidal word is closed if its longest repeated prefix has exactly two occurrences in the word, the second one being a suffix of the word. Otherwise it is open. We show that open trapezoidal words are all primitive and that closed trapezoidal words are all Sturmian. We then show that trapezoidal palindromes are closed (and therefore Sturmian). This allows us to characterize the special factors of Sturmian palindromes. We end with several open problems.
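
The defining condition above (at most n+1 distinct factors of length n, for every n) can be checked directly on short words. The following is a minimal brute-force sketch; the function name `is_trapezoidal` is hypothetical and does not come from the paper.

```python
def is_trapezoidal(w: str) -> bool:
    """Check that w has at most n + 1 distinct factors of each length n >= 0."""
    for n in range(len(w) + 1):
        # Collect the distinct factors (substrings) of length n.
        factors = {w[i:i + n] for i in range(len(w) - n + 1)}
        if len(factors) > n + 1:
            return False
    return True
```

This enumerates all factors and is meant only to illustrate the definition, not to be efficient.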

FOS: Computer and information sciences; Formal Languages and Automata Theory (cs.FL); lcsh:Mathematics; trapezoidal words; Sturmian words; special factors; palindromes; Palindrome; Computer Science - Formal Languages and Automata Theory; Disjoint sets; lcsh:QA1-939; lcsh:QA75.5-76.95; Prefix; Combinatorics; F.4.3; FOS: Mathematics; Mathematics - Combinatorics; Combinatorics (math.CO); lcsh:Electronic computers. Computer science; Suffix; Word (group theory); Mathematics

A Characterization of Bispecial Sturmian Words

2012

A finite Sturmian word w over the alphabet {a,b} is left special (resp. right special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial Sturmian word is a Sturmian word that is both left and right special. We show as a main result that bispecial Sturmian words are exactly the maximal internal factors of Christoffel words, that are words coding the digital approximations of segments in the Euclidean plane. This result is an extension of the known relation between central words and primitive Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the set of Sturmian wo…

Combinatorics; Christoffel symbols; Approximations of π; Euclidean geometry; Sturmian word; Alphabet; Mathematics; Sturmian words; Christoffel words; special factors; minimal forbidden words; enumerative formula

Reverse-safe data structures for text indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…

050101 languages & linguisticsComputer sciencedata structure02 engineering and technologyprivacySet (abstract data type)combinatoric0202 electrical engineering electronic engineering information engineering0501 psychology and cognitive sciencesPattern matchingSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazionialgorithmSettore INF/01 - Informatica05 social sciencesSearch engine indexingINF/01 - INFORMATICAdata miningData structureMatrix multiplicationcombinatoricsExponent020201 artificial intelligence & image processingdata structure; algorithm; combinatorics; de Bruijn graph; data mining; privacyAlgorithmAdversary modelde Bruijn graphInteger (computer science)

Constructing Antidictionaries of Long Texts in Output-Sensitive Space

2021

A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ of minimal absent words of length at most $\ell$ of the collection $\{y_1, \ldots, y_k\}$. The set $\mathrm{M}^{\ell}_{\{y_1,\ldots,y_k\}}$ contains all the words $x$ such that $x$ is absent from all the words of the collection while there exist $i,j$ such that the maximal proper suffix of $x$ is a factor of $y_i$ and the maximal proper prefix of $x$ is a factor of $y_j$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to c…

0301 basic medicineAntidictionarySettore INF/01 - InformaticaOutput sensitive algorithm0102 computer and information sciencesSpace (mathematics)01 natural sciencesTheoretical Computer ScienceString algorithmPrefixSet (abstract data type)Combinatorics03 medical and health sciences030104 developmental biologyComputational Theory and Mathematics010201 computation theory & mathematicsData compressionOutput-sensitive algorithm[INFO]Computer Science [cs]SuffixAlphabetAbsent wordWord (group theory)MathematicsTheory of Computing Systems

Abelian-Square-Rich Words

2017

An abelian square is the concatenation of two words that are anagrams of one another. A word of length $n$ can contain at most $\Theta(n^2)$ distinct factors, and there exist words of length $n$ containing $\Theta(n^2)$ distinct abelian-square factors, that is, distinct factors that are abelian squares. This motivates us to study infinite words such that the number of distinct abelian-square factors of length $n$ grows quadratically with $n$. More precisely, we say that an infinite word $w$ is {\it abelian-square-rich} if, for every $n$, every factor of $w$ of length $n$ contains, on average, a number of distinct abelian-square factors that is quadratic in $n$; and {\it uniformly abelian-sq…
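
To make the definition of an abelian square concrete, here is a small brute-force sketch that collects the distinct abelian-square factors of a finite word. The helper names `is_abelian_square` and `distinct_abelian_square_factors` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def is_abelian_square(w: str) -> bool:
    """True if w is nonempty, has even length, and its two halves are anagrams."""
    n = len(w)
    if n == 0 or n % 2 != 0:
        return False
    return Counter(w[: n // 2]) == Counter(w[n // 2:])


def distinct_abelian_square_factors(w: str) -> set[str]:
    """Collect every distinct factor of w that is an abelian square (brute force)."""
    found = set()
    for i in range(len(w)):
        # Only even-length factors can be abelian squares.
        for j in range(i + 2, len(w) + 1, 2):
            if is_abelian_square(w[i:j]):
                found.add(w[i:j])
    return found
```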

FOS: Computer and information sciencesGeneral Computer ScienceDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Abelian squareComputer Science - Formal Languages and Automata Theory0102 computer and information sciences02 engineering and technology68R1501 natural sciencesSquare (algebra)Theoretical Computer ScienceCombinatorics0202 electrical engineering electronic engineering information engineeringFOS: MathematicsMathematics - CombinatoricsAbelian groupQuotientMathematicsDiscrete mathematicsComputer Science (all)Sturmian wordSturmian wordFunction (mathematics)Thue–Morse word010201 computation theory & mathematicsBounded functionThue-Morse wordExponentAbelian square; Sturmian word; Thue-Morse word; Theoretical Computer Science; Computer Science (all)020201 artificial intelligence & image processingCombinatorics (math.CO)Word (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics

Maximal Closed Substrings

2022

A string is closed if it has length 1 or has a nonempty border without internal occurrences. In this paper we introduce the definition of a maximal closed substring (MCS), which is an occurrence of a closed substring that cannot be extended to the left nor to the right into a longer closed substring. MCSs with exponent at least 2 are commonly called runs; those with exponent smaller than 2, instead, are particular cases of maximal gapped repeats. We show that a string of length $n$ contains $O(n^{1.5})$ MCSs. We also provide an output-sensitive algorithm that, given a string of length $n$ over a constant-size alphabet, locates all $m$ MCSs the string contains in $O(n \log n + m)$ time.
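
The definition of a closed string can be tested directly on small inputs. The sketch below is a naive quadratic check written only to illustrate the definition; the name `is_closed` is hypothetical, and this is not the output-sensitive algorithm of the paper.

```python
def is_closed(s: str) -> bool:
    """True if s has length 1 or a nonempty border with no internal occurrence."""
    n = len(s)
    if n == 1:
        return True
    # Try every proper nonempty border length k.
    for k in range(1, n):
        border = s[:k]
        if s[n - k:] != border:
            continue  # s[:k] is not a border of s
        # Internal occurrences start strictly after 0 and strictly before n - k.
        if all(s[i:i + k] != border for i in range(1, n - k)):
            return True
    return False
```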

Closed word; Maximal closed substring; Run

Enumeration and Structure of Trapezoidal Words

2013

Trapezoidal words are words having at most $n+1$ distinct factors of length $n$ for every $n\ge 0$. They therefore encompass finite Sturmian words. We give combinatorial characterizations of trapezoidal words and exhibit a formula for their enumeration. We then separate trapezoidal words into two disjoint classes: open and closed. A trapezoidal word is closed if it has a factor that occurs only as a prefix and as a suffix; otherwise it is open. We investigate open and closed trapezoidal words, in relation with their special factors. We prove that Sturmian palindromes are closed trapezoidal words and that a closed trapezoidal word is a Sturmian palindrome if and only if its longest repeated …

FOS: Computer and information sciencesFibonacci numberSpecial factorGeneral Computer ScienceFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryEnumerative formulaDisjoint sets68R15Theoretical Computer ScienceFOS: MathematicsPalindromeMathematics - CombinatoricsClosed wordsFibonacci wordMathematicsDiscrete mathematicsClosed wordSequenceta111Sturmian wordPrefixCombinatorics on wordsRich wordtrapezoidal wordF.4.3Combinatorics (math.CO)SuffixWord (group theory)Computer Science(all)

Presentations of constrained systems with unconstrained positions

2005

We give a polynomial-time construction of the set of sequences that satisfy a finite-memory constraint defined by a finite list of forbidden blocks, with a specified set of bit positions unconstrained. Such a construction can be used to build modulation/error-correction codes (ECC codes) like the ones defined by the Immink-Wijngaarden scheme in which certain bit positions are reserved for ECC parity. We give a linear-time construction of a finite-state presentation of a constrained system defined by a periodic list of forbidden blocks. These systems, called periodic-finite-type systems, were introduced by Moision and Siegel. Finally, we present a linear-time algorithm for con…

[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]finite-memory systemperiodic-finite-type (PFT) system[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technologyLibrary and Information Sciences01 natural sciencesModulation coding0202 electrical engineering electronic engineering information engineeringMathematicsDiscrete mathematicsChannel codefinite-state encodermodulation codeDAWG020206 networking & telecommunicationsDirected graphDirected acyclic graphforbidden blockComputer Science ApplicationsFinite sequence010201 computation theory & mathematicscodeError detection and correctionrun-length limited (RLL) codesInformation SystemsCoding (social sciences)maximum transition run (MTR)

ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS

2011

The Parikh vector p(s) of a string s is defined as the vector of multiplicities of the characters. Parikh vector q occurs in s if s has a substring t with p(t)=q. We present two novel algorithms for searching for a query q in a text s. One solves the decision problem over a binary text in constant time, using a linear size index of the text. The second algorithm, for a general finite alphabet, finds all occurrences of a given Parikh vector q and has sub-linear expected time complexity; we present two variants, which both use a linear size index of the text.
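
For the binary case, a linear-size index with constant-time queries can be sketched as follows. The sketch relies on the well-known interval property of binary strings: among the substrings of a fixed length k, the number of a's takes every value between its minimum and its maximum. The construction shown is the naive quadratic one, and the names `build_index` and `occurs` are hypothetical, so this illustrates the idea rather than the authors' algorithm.

```python
def build_index(s: str) -> tuple[list[int], list[int]]:
    """For each length k, store the min and max number of 'a' in any length-k substring."""
    n = len(s)
    prefix = [0]
    for c in s:
        prefix.append(prefix[-1] + (c == "a"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        lo[k], hi[k] = min(counts), max(counts)
    return lo, hi


def occurs(index: tuple[list[int], list[int]], x: int, y: int) -> bool:
    """Decide in O(1) whether some substring has exactly x a's and y b's."""
    lo, hi = index
    k = x + y
    if k >= len(lo):
        return False  # query is longer than the text
    return lo[k] <= x <= hi[k]
```

The index stores two integers per length, hence linear size; only the preprocessing here is slower than the constructions cited in these abstracts.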

FOS: Computer and information sciencesJ.3average case analysis.Binary numberaverage case analysispermuted stringpermuted stringsComputer Science - Data Structures and AlgorithmsComputer Science (miscellaneous)Parikh vectorData Structures and Algorithms (cs.DS)Pattern matchingTime complexityMathematicsString (computer science)Parikh vectorsstring algorithmDecision problemstring algorithmsSubstringParikh vectors; permuted strings; pattern matching; string algorithms; average case analysisF.2.2; J.3Index (publishing)pattern matchingF.2.2Constant (mathematics)AlgorithmComputer Science::Formal Languages and Automata Theory

A note on easy and efficient computation of full abelian periods of a word

2016

Constantinescu and Ilie (Bulletin of the EATCS 89, 167-170, 2006) introduced the idea of an Abelian period with head and tail of a finite word. An Abelian period is called full if both the head and the tail are empty. We present a simple and easy-to-implement $O(n\log\log n)$-time algorithm for computing all the full Abelian periods of a word of length $n$ over a constant-size alphabet. Experiments show that our algorithm significantly outperforms the $O(n)$ algorithm proposed by Kociumaka et al. (Proc. of STACS, 245-256, 2013) for the same problem.
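
As an illustration of the notion of a full Abelian period (empty head and tail), the sketch below lists every block length p dividing n such that all consecutive blocks of length p have the same Parikh vector. It is a straightforward check, not the $O(n\log\log n)$ algorithm of the paper, and the name `full_abelian_periods` is an assumption made for this example.

```python
from collections import Counter


def full_abelian_periods(w: str) -> list[int]:
    """Return all block lengths p dividing len(w) whose blocks are pairwise anagrams."""
    n = len(w)
    periods = []
    for p in range(1, n + 1):
        if n % p != 0:
            continue  # head and tail must be empty, so p must divide n
        first = Counter(w[:p])
        if all(Counter(w[i:i + p]) == first for i in range(p, n, p)):
            periods.append(p)
    return periods
```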

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Elementary abelian groupComputer Science - Formal Languages and Automata Theory0102 computer and information sciences02 engineering and technology[INFO] Computer Science [cs]01 natural sciencesRank of an abelian groupCombinatoricsSimple (abstract algebra)Computer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringDiscrete Mathematics and CombinatoricsData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]Abelian groupHidden subgroup problemDiscrete Mathematics and CombinatoricComputingMilieux_MISCELLANEOUSMathematicsCombinatorics on wordDiscrete mathematicsApplied Mathematics020206 networking & telecommunicationsAbelian periodText algorithmWeak repetitionFree abelian groupAbelian powerCombinatorics on wordsDesign of algorithm010201 computation theory & mathematicsWord (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics

Constructing Antidictionaries in Output-Sensitive Space

2021

A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because a…
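
The definition of a minimal absent word can be illustrated with a brute-force sketch for a single word $y$. It uses the fact that a word $x$ is a minimal absent word exactly when $x$ is absent while its longest proper prefix and longest proper suffix both occur. The function name `minimal_absent_words` and its parameters are hypothetical, and the sketch stores all factors explicitly, so it makes no attempt at the space bounds discussed in the paper.

```python
def minimal_absent_words(y: str, alphabet: str, max_len: int) -> set[str]:
    """Brute-force set of minimal absent words of y with length at most max_len."""
    # All factors of y, including the empty word.
    factors = {y[i:j] for i in range(len(y) + 1) for j in range(i, len(y) + 1)}
    result = set()
    for f in factors:
        if len(f) >= max_len:
            continue
        for c in alphabet:
            x = f + c  # the longest proper prefix of x is f, which occurs in y
            # x must be absent, and its longest proper suffix must occur in y.
            if x not in factors and x[1:] in factors:
                result.add(x)
    return result
```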

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniOutput sensitive algorithmsString algorithmsPhysicsAntidictionarieSettore INF/01 - InformaticaOutput sensitive algorithm0102 computer and information sciencesAbsent wordsSpace (mathematics)01 natural sciencesAntidictionariesCombinatorics010201 computation theory & mathematicsTheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITYData compressionComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Computer Science::Symbolic Computation[INFO]Computer Science [cs]Absent wordAlphabetWord (group theory)2019 Data Compression Conference (DCC)

Generating a Gray code for prefix normal words in amortized polylogarithmic time per word

2020

A prefix normal word is a binary word with the property that no substring has more $1$s than the prefix of the same length. By proving that the set of prefix normal words is a bubble language, we can exhaustively list all prefix normal words of length $n$ as a combinatorial Gray code, where successive strings differ by at most two swaps or bit flips. This Gray code can be generated in $O(\log^2 n)$ amortized time per word, while the best generation algorithm hitherto has $O(n)$ running time per word. We also present a membership tester for prefix normal words, as well as a novel characterization of bubble languages.
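
A naive membership test follows directly from the definition: for every length k, compare the number of 1s in the length-k prefix with the number of 1s in every other length-k substring. The sketch below takes quadratic time and is an illustration only, not the tester presented in the paper; the name `is_prefix_normal` is hypothetical.

```python
def is_prefix_normal(w: str) -> bool:
    """True if no substring of w has more 1s than the prefix of the same length."""
    n = len(w)
    # ones[i] = number of 1s in w[:i]
    ones = [0]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    for k in range(1, n + 1):
        prefix_ones = ones[k]
        for i in range(1, n - k + 1):
            if ones[i + k] - ones[i] > prefix_ones:
                return False
    return True
```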

FOS: Computer and information sciencesGeneral Computer ScienceFormal Languages and Automata Theory (cs.FL)Property (programming)combinatorial Gray codeComputer Science - Formal Languages and Automata TheoryData_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologyCharacterization (mathematics)01 natural sciencesTheoretical Computer ScienceCombinatoricsSet (abstract data type)Gray codeComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)MathematicsAmortized analysisSettore INF/01 - Informaticaprefix normal wordsSubstringcombinatorial generationPrefixjumbled pattern matching010201 computation theory & mathematics020201 artificial intelligence & image processingbinary languagesprefix normal words binary languages combinatorial Gray code combinatorial generation jumbled pattern matchingWord (computer architecture)Theoretical Computer Science

Online Computation of Abelian Runs

2015

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. We give an algorithm that finds all the abelian runs of period $\mathcal{P}$ in a word of length $n$ in time $O(n\times |\mathcal{P}|)$ and space $O(\sigma+|\mathcal{P}|)$.

FOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Abelian run[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Computer Science - Formal Languages and Automata Theory0102 computer and information sciences02 engineering and technology[INFO] Computer Science [cs]01 natural sciencesOnline computationTheoretical Computer ScienceCombinatoricsComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]Abelian groupComputingMilieux_MISCELLANEOUSMathematicsCombinatorics on wordDiscrete mathematicsComputer Science (all)020206 networking & telecommunicationsAbelian periodText algorithm16. Peace & justiceSubstringCombinatorics on words010201 computation theory & mathematicsWord (group theory)Computer Science::Formal Languages and Automata Theory

Topological properties of cellular automata on trees

2012

We prove that there do not exist positively expansive cellular automata defined on the full k-ary tree shift (for k>=2). Moreover, we investigate some topological properties of these automata and their relationships, namely permutivity, surjectivity, preinjectivity, right-closingness and openness.

[INFO.INFO-CC]Computer Science [cs]/Computational Complexity [cs.CC]FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)FOS: Physical sciencesComputer Science - Formal Languages and Automata Theory0102 computer and information sciences[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]Computational Complexity (cs.CC)Topology01 natural scienceslcsh:QA75.5-76.95[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]0101 mathematicsF.1.1;F.1.2;F.1.3MathematicsCellular Automata and Lattice Gases (nlin.CG)lcsh:Mathematics010102 general mathematicsCellular automaton tree shift expansivity permutivity right-closingness opennesslcsh:QA1-939Nonlinear Sciences::Cellular Automata and Lattice GasesCellular automatonAutomatonComputer Science - Computational Complexity010201 computation theory & mathematicsTree (set theory)lcsh:Electronic computers. Computer scienceF.1.2F.1.3ExpansiveNonlinear Sciences - Cellular Automata and Lattice GasesF.1.1Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics

Adaptive learning of compressible strings

2020

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…

FOS: Computer and information sciencesCentroid decompositionGeneral Computer ScienceString compressionAdaptive learningKolmogorov complexityContext (language use)Data_CODINGANDINFORMATIONTHEORYString reconstructionTheoretical Computer ScienceCombinatoricsString reconstruction; String learning; Adaptive learning; Kolmogorov complexity; String compression; Lempel-Ziv; Centroid decomposition; Suffix treeSuffix treeIntegerComputer Science - Data Structures and AlgorithmsOrder (group theory)Data Structures and Algorithms (cs.DS)Adaptive learning; Centroid decomposition; Kolmogorov complexity; Lempel-Ziv; String compression; String learning; String reconstruction; Suffix treeTime complexityComputer Science::DatabasesMathematicsLempel-ZivSettore INF/01 - InformaticaLinear spaceString (computer science)SubstringBounded functionString learningTheoretical Computer Science

Automata and differentiable words

2011

We exhibit the construction of a deterministic automaton that, given k > 0, recognizes the (regular) language of k-differentiable words. Our approach follows a scheme of Crochemore et al. based on minimal forbidden words. We extend this construction to the case of $C^\infty$-words, i.e., words differentiable arbitrarily many times. We thus obtain an infinite automaton for representing the set of $C^\infty$-words. We derive a classification of $C^\infty$-words induced by the structure of the automaton. Then, we introduce a new framework for dealing with $C^\infty$-words, based on a three-letter alphabet. This allows us to define a compacted version of the automaton, that we use to prove that ev…

Discrete mathematicsKolakoski wordGeneral Computer ScienceC∞-wordsPowerset constructionTimed automatonPushdown automatonBüchi automatonComputer Science - Formal Languages and Automata TheoryComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)68R15AutomataTheoretical Computer ScienceCombinatoricsForbidden wordsDeterministic automatonProbabilistic automatonTwo-way deterministic finite automatonNondeterministic finite automatonC∞ -wordForbidden wordComputer Science::Formal Languages and Automata TheoryComputer Science(all)Computer Science - Discrete MathematicsMathematicsTheoretical Computer Science

Searching for Jumbled Patterns in Strings

2009

Parikh vectors; Permuted strings; Pattern matching; String algorithms; Average case analysis

Special factors and the combinatorics of suffix and factor automata

2011

The suffix automaton (resp. factor automaton) of a finite word w is the minimal deterministic automaton recognizing the set of suffixes (resp. factors) of w. We study the relationships between the structure of the suffix and factor automata and classical combinatorial parameters related to the special factors of w. We derive formulae for the number of states of these automata. We also characterize the languages $L_{SA}$ and $L_{FA}$ of words having respectively suffix automaton and factor automaton with the minimal possible number of states.

Special factorGeneral Computer ScienceSpecial factorsFactor automatonBüchi automatonω-automatonTheoretical Computer ScienceCombinatoricsDeterministic automatonTwo-way deterministic finite automatonNondeterministic finite automatonComputer Science::Data Structures and AlgorithmsCombinatorics on wordStandard Sturmian wordsMathematicsDiscrete mathematicsCombinatorics on wordsDAWGPushdown automatonComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Nonlinear Sciences::Cellular Automata and Lattice GasesSuffix automatonProbabilistic automatonSuffix automatonComputer Science::Formal Languages and Automata TheoryComputer Science(all)Theoretical Computer Science

Algorithms for Anti-Powers in Strings

2018

A string $S[1,n]$ is a power (or tandem repeat) of order $k$ and period $n/k$ if it can be decomposed into $k$ consecutive equal-length blocks of letters. Powers and periods are fundamental to string processing, and algorithms for their efficient computation have wide application and are heavily studied. Recently, Fici et al. (Proc. ICALP 2016) defined an anti-power of order $k$ to be a string composed of $k$ pairwise-distinct blocks of the same length ($n/k$, called anti-period). Anti-powers are a natural converse to powers, and are objects of combinatorial interest in their own right. In this paper we initiate the algorithmic study of anti-powers. Given a string S, we describe an op…
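
The definition of an anti-power translates into a short check: split the string into k blocks of equal length and verify that the blocks are pairwise distinct. The sketch below only tests whether a whole string is an anti-power of a given order; the name `is_anti_power` is hypothetical, and the code is not one of the algorithms from the paper.

```python
def is_anti_power(s: str, k: int) -> bool:
    """True if s splits into k equal-length blocks that are pairwise distinct."""
    n = len(s)
    if k <= 0 or n == 0 or n % k != 0:
        return False
    m = n // k  # block length, i.e. the anti-period
    blocks = [s[i:i + m] for i in range(0, n, m)]
    # Pairwise distinct blocks means the set has exactly k elements.
    return len(set(blocks)) == k
```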

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)ComputationComputer Science - Formal Languages and Automata Theory0102 computer and information sciencesString processingInformation System01 natural sciencesUpper and lower boundsAnti-powersTheoretical Computer ScienceLemma (logic)ConverseComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)0101 mathematicsMathematicsCombinatorics on wordSignal processingCombinatorics on wordsComputer Science Applications1707 Computer Vision and Pattern RecognitionAnti-power16. Peace & justice113 Computer and information sciencesSubstringComputer Science Applications010101 applied mathematicsAlgorithmCombinatorics on words010201 computation theory & mathematicsSignal ProcessingAlgorithmAlgorithmsInformation SystemsComputer Science - Discrete Mathematics

On Prefix Normal Words

2011

We present a new class of binary words: the prefix normal words. They are defined by the property that for any given length $k$, no factor of length $k$ has more $a$'s than the prefix of the same length. These words arise in the context of indexing for jumbled pattern matching (a.k.a. permutation matching or Parikh vector matching), where the aim is to decide whether a string has a factor with a given multiplicity of characters, i.e., with a given Parikh vector. Using prefix normal words, we give the first non-trivial characterization of binary words having the same set of Parikh vectors of factors. We prove that the language of prefix normal words is not context-free and is strictly contai…

permutation matchingcontext-free languagesSearch engine indexingpre-necklacesBinary numberParikh vectorsComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Lyndon wordsnon- standard pattern matchingLyndon wordsCombinatoricsPrefixjumbled pattern matchingPattern matchingParikh vectors; pre-necklaces; Lyndon words; context-free languages; jumbled pattern matching; permutation matching; non- standard pattern matching; indexingComputer Science::Formal Languages and Automata TheoryParikh vectors pre-necklaces Lyndon words context-free languages jumbled pattern matching permutation matching non-standard pattern matching indexingMathematicsindexing
researchProduct

Minimal Absent Words in Rooted and Unrooted Trees

2019

We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet \(\varSigma \) of cardinality \(\sigma \). We show that the set \(\text {MAW}(T)\) of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality \(O(n\sigma )\) (resp. \(O(n^{2}\sigma )\)), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time \(O(n+|\text {MAW}(T)|)\) (resp. \(O(n^{2}+|\text {MAW}(T)|)\)), assuming an integer alphabet of size polynomial in n.

Polynomial (hyperelastic model)050101 languages & linguistics05 social sciencesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)02 engineering and technologyCombinatoricsTree (descriptive set theory)CardinalityInteger0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processing0501 psychology and cognitive sciencesAlphabetMinimal Absent Words Rooted trees Unrooted Trees AlgorithmsNonlinear Sciences::Pattern Formation and SolitonsComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

A characterization of regular circular languages generated by marked splicing systems

2009

Splicing systems are generative devices of formal languages, introduced by Head in 1987 to model biological phenomena on linear and circular DNA molecules. A splicing system is defined by giving an initial set I and a set R of rules. Some unanswered questions are related to the computational power of circular splicing systems. In particular, a still open question is to find a characterization of circular languages generated by finite circular splicing systems (i.e., circular splicing systems with both I and R finite sets). In this paper we introduce a special class of the latter systems named marked systems. We prove that a marked system S generates a regular circular language if an…

Pure mathematicsGeneral Computer ScienceMolecular computing Splicing systems Circular words Formal languages Automata theoryMolecular computingQuantitative Biology::GenomicsDecidabilityTheoretical Computer ScienceSet (abstract data type)Formal languagesRegular languageFormal languageRNA splicingAutomata theorySplicing systemsCircular wordsFinite setAlgorithmWord (computer architecture)Automata theoryMathematicsComputer Science(all)
researchProduct

Cyclic Complexity of Words

2014

We introduce and study a complexity function on words $c_x(n),$ called \emph{cyclic complexity}, which counts the number of conjugacy classes of factors of length $n$ of an infinite word $x.$ We extend the well-known Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words of different slopes. We prove that if $x$ is a Sturmian word and $y$ is a word having the same cyclic complexity as $x,$ then up to renaming letters, $x$ and $y$ have the same set of factors. In particular, $y$ is also Sturmian of slope equ…

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata Theory0102 computer and information sciences68R15Characterization (mathematics)[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]01 natural sciencesTheoretical Computer ScienceCombinatoricsConjugacy class[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL][MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsDiscrete Mathematics and CombinatoricsMathematics - Combinatorics0101 mathematics[MATH]Mathematics [math]Discrete Mathematics and CombinatoricMathematicsDiscrete mathematicsFactor complexity010102 general mathematicsSturmian wordSturmian wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Sturmian wordsCyclic complexity factor complexity Sturmian words minimal forbidden factorInfimum and supremumToeplitz matrixComputational Theory and Mathematics010201 computation theory & mathematicsCyclic complexityBounded functionComplexity functionCombinatorics (math.CO)Word (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

On the Lie complexity of Sturmian words

2022

Bell and Shallit recently introduced the Lie complexity of an infinite word $s$ as the function counting for each length the number of conjugacy classes of words whose elements are all factors of $s$. They proved, using algebraic techniques, that the Lie complexity is bounded above by the first difference of the factor complexity plus one; hence, it is uniformly bounded for words with linear factor complexity, and, in particular, it is at most 2 for Sturmian words, which are precisely the words with factor complexity $n+1$ for every $n$. In this note, we provide an elementary combinatorial proof of the result of Bell and Shallit and give an exact formula for the Lie complexity of any Sturmi…

FOS: Computer and information sciencesGeneral Computer ScienceSettore INF/01 - InformaticaDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Sturmian wordComputer Science - Formal Languages and Automata TheoryComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)G.2.168R15Lie complexityTheoretical Computer ScienceLie complexity Sturmian wordFOS: MathematicsMathematics - CombinatoricsCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Combinatorics of Finite Words and Suffix Automata

2009

The suffix automaton of a finite word is the minimal deterministic automaton accepting the language of its suffixes. The states of the suffix automaton are the classes of an equivalence relation defined on the set of factors. We explore the relationship between the combinatorial properties of a finite word and the structural properties of its suffix automaton. We give formulas for expressing the total number of states and the total number of edges of the suffix automaton in terms of special factors of the word.

Discrete mathematicsComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)special factorNonlinear Sciences::Cellular Automata and Lattice GasesCombinatorics on WordAutomatonCombinatoricsCombinatorics on wordsDeterministic automatonSuffix automatonEquivalence relationQuantum finite automataSuffix automatonSuffixComputer Science::Data Structures and AlgorithmsComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Mathematics
researchProduct

On the regularity of circular splicing languages : A survey and new developments

2009

Circular splicing has been introduced to model a specific recombinant behaviour of circular DNA, continuing the investigation initiated with linear splicing. In this paper we focus on the relationship between regular circular languages and languages generated by finite circular splicing systems. We survey the known results towards a characterization of the intersection between these two classes and provide new contributions on the open problem of finding this characterization. First, we exhibit a non-regular circular language generated by a circular simple system thus disproving a known result in this area. Then we give new results related to a restrictive class of circular splicing systems…

Discrete mathematicsComputer scienceOpen problemINF/01 - INFORMATICAGraph theoryCircular wordMolecular computingComputer Science ApplicationsGraph theoryAutomata theory Circular words Formal languages Graph theory Molecular computing Splicing systemsIntersectionFormal languageTheory of computationGraph (abstract data type)CographFormal languageSplicing systemComplement (set theory)Automata theory
researchProduct

Open and Closed Prefixes of Sturmian Words

2013

A word is closed if it contains a proper factor that occurs both as a prefix and as a suffix but does not have internal occurrences, otherwise it is open. We deal with the sequence of open and closed prefixes of Sturmian words and prove that this sequence characterizes every finite or infinite Sturmian word up to isomorphisms of the alphabet. We then characterize the combinatorial structure of the sequence of open and closed prefixes of standard Sturmian words. We prove that every standard Sturmian word, after swapping its first letter, can be written as an infinite product of squares of reversed standard words.
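
The open/closed test can be sketched using the standard observation that a word of length at least 2 is closed exactly when its longest border is non-empty and has no internal occurrence. The code is an illustration only; the function names are hypothetical, and the convention that the empty word and single letters are closed is an assumption stated here rather than in the abstract.

```python
def longest_border_length(w: str) -> int:
    """Length of the longest proper prefix of w that is also a suffix (KMP failure function)."""
    n = len(w)
    if n == 0:
        return 0
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k and w[i] != w[k]:
            k = fail[k - 1]
        if w[i] == w[k]:
            k += 1
        fail[i] = k
    return fail[-1]


def is_closed(w: str) -> bool:
    """True if w is closed, False if it is open."""
    n = len(w)
    if n <= 1:
        return True  # assumed convention: empty word and single letters are closed
    b = longest_border_length(w)
    if b == 0:
        return False
    # The first occurrence of the border after position 0 must be the suffix itself.
    return w.find(w[:b], 1) == n - b


def open_closed_prefix_sequence(w: str) -> str:
    """Mark each non-empty prefix of w as 'c' (closed) or 'o' (open)."""
    return "".join("c" if is_closed(w[:i]) else "o" for i in range(1, len(w) + 1))
```

For example, `abaab` is closed (its border `ab` occurs only as a prefix and as a suffix), whereas `abaa` is open because its border `a` also occurs internally.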

FOS: Computer and information sciencesSequenceFibonacci numberDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Sturmian wordStructure (category theory)Sturmian wordInfinite productComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Computer Science - Formal Languages and Automata Theory68R15CombinatoricsPrefixComputer Science::Discrete MathematicsCombinatorics on words Sturmian wordFOS: MathematicsMathematics - CombinatoricsClosed wordsCombinatorics (math.CO)SuffixWord (group theory)Computer Science::Formal Languages and Automata TheoryMathematicsComputer Science - Discrete Mathematics
researchProduct

Abelian Powers and Repetitions in Sturmian Words

2016

Richomme, Saari and Zamboni (J. Lond. Math. Soc. 83: 79-95, 2011) proved that at every position of a Sturmian word starts an abelian power of exponent $k$ for every $k > 0$. We improve on this result by studying the maximum exponents of abelian powers and abelian repetitions (an abelian repetition is an analogue of a fractional power) in Sturmian words. We give a formula for computing the maximum exponent of an abelian power of abelian period $m$ starting at a given position in any Sturmian word of rotation angle $\alpha$. As an analogue of the critical exponent, we introduce the abelian critical exponent $A(s_\alpha)$ of a Sturmian word $s_\alpha$ of angle $\alpha$ as the quantity $A(s_\a…

FOS: Computer and information sciencesFibonacci numberGeneral Computer ScienceDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Computer Science - Formal Languages and Automata Theory0102 computer and information sciences01 natural sciencesTheoretical Computer ScienceCombinatoricsFOS: MathematicsMathematics - Combinatorics[INFO]Computer Science [cs]Number Theory (math.NT)0101 mathematicsAbelian groupContinued fractionFibonacci wordComputingMilieux_MISCELLANEOUSQuotientMathematicsMathematics - Number Theoryta111010102 general mathematicsComputer Science (all)Sturmian wordSturmian wordAbelian period; Abelian power; Critical exponent; Lagrange constant; Sturmian word; Theoretical Computer Science; Computer Science (all)Abelian periodLagrange constantCritical exponentAbelian power010201 computation theory & mathematicsBounded functionExponentCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

On the Structure of Bispecial Sturmian Words

2013

A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet up to one. Finite binary balanced words are called Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, that are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that \emph{palindromic} bispecial Sturmian words are precisely the maximal internal factors of \emph{primitive} Christoffel words. We extend this result by showing that bispecial Sturmian wo…
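
The balance condition for a binary word can be verified by brute force: for a two-letter alphabet it suffices to compare the counts of one letter across all factors of each length. This is a quadratic-time sketch for illustration; the function name `is_balanced` is hypothetical.

```python
def is_balanced(w: str, letter: str = "a") -> bool:
    """True if any two equal-length factors of the binary word w differ
    by at most one in their number of occurrences of `letter`."""
    n = len(w)
    # prefix[i] = number of occurrences of `letter` in w[:i]
    prefix = [0]
    for ch in w:
        prefix.append(prefix[-1] + (ch == letter))
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True
```

For example, `aabaab` is balanced, while `aabb` is not, because its factors `aa` and `bb` differ by two in the number of `a`'s.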

FOS: Computer and information sciencesGeneral Computer ScienceSpecial factorDiscrete Mathematics (cs.DM)Computer Networks and CommunicationsApproximations of πFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryEnumerative formula68R15Characterization (mathematics)Minimal forbidden wordTheoretical Computer ScienceCombinatoricsComputer Science::Discrete MathematicsEuclidean geometryPhysics::Atomic PhysicsMathematicsChristoffel symbolsApplied MathematicsPalindromeSturmian wordSturmian wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Combinatorics on wordsComputational Theory and MathematicsWord (group theory)Computer Science::Formal Languages and Automata TheoryChristoffel wordComputer Science - Discrete Mathematics
researchProduct

On prefix normal words and prefix normal forms

2016

A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors. We prove that the language of prefix normal words is not context-free and is strictly contai…

FOS: Computer and information sciencesPrefix codePrefix normal wordPre-necklaceDiscrete Mathematics (cs.DM)General Computer ScienceFormal Languages and Automata Theory (cs.FL)Binary numberComputer Science - Formal Languages and Automata TheoryContext (language use)Binary languageLyndon words0102 computer and information sciences02 engineering and technologyPrefix grammarprefix normal formsKraft's inequalityCharacterization (mathematics)Lyndon word01 natural sciencesPrefix normal formenumerationTheoretical Computer ScienceFOS: Mathematics0202 electrical engineering electronic engineering information engineeringMathematics - CombinatoricsMathematicsDiscrete mathematicsprefix normal words prefix normal forms binary languages binary jumbled pattern matching pre-necklaces Lyndon words enumerationbinary jumbled pattern matchingSettore INF/01 - InformaticaComputer Science (all)pre-necklacesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)prefix normal wordsPrefix010201 computation theory & mathematics020201 artificial intelligence & image processingCombinatorics (math.CO)binary languagesComputer Science::Formal Languages and Automata TheoryWord (group theory)Computer Science - Discrete MathematicsTheoretical Computer Science
researchProduct

On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching

2010

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires finding a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example, for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renowned Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, …
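
The example in the abstract can be reproduced with a plain sliding window over the text. This is a baseline sketch, not the method of the paper; the function names `parikh` and `jumbled_matches` are hypothetical, and positions are reported as 1-based (start, end) pairs to match the abstract.

```python
from collections import Counter


def parikh(s: str, alphabet: str) -> tuple:
    """Parikh vector of s with respect to the given ordered alphabet."""
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)


def jumbled_matches(s: str, q: dict) -> list:
    """All 1-based (start, end) positions of substrings of s with Parikh vector q.

    q maps each letter to its required multiplicity.
    """
    m = sum(q.values())  # every match has this length
    if m == 0 or m > len(s):
        return []
    target = Counter({c: k for c, k in q.items() if k > 0})
    window = Counter(s[:m])
    matches = []
    for start in range(len(s) - m + 1):
        if start > 0:
            # Slide the window one position to the right.
            out = s[start - 1]
            window[out] -= 1
            if window[out] == 0:
                del window[out]
            window[s[start + m - 1]] += 1
        if window == target:
            matches.append((start + 1, start + m))
    return matches
```

With `s = "abaccbabaaa"`, `parikh(s, "abc")` gives `(6, 3, 2)` and `jumbled_matches(s, {"a": 2, "b": 1, "c": 1})` gives `[(1, 4)]`, in agreement with the abstract.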

Discrete mathematicsParikh vectors jumbled pattern matching scrabble approximate pattern matching000AnagramParikh vectorsString searching algorithmApproximate string matchingDecision problemalgorithmsData structureJumbled Pattern MatchingSubstringscrabbleapproximate pattern matchingString MatchingWavelet TreePattern matchingMathematics
researchProduct

On Sets of Words of Rank Two

2019

Given a (finite or infinite) subset X of the free monoid A∗ over a finite alphabet A, the rank of X is the minimal cardinality of a set F such that X⊆ F∗. A submonoid M generated by k elements of A∗ is k-maximal if there does not exist another submonoid generated by at most k words containing M. We call a set X⊆ A∗ primitive if it is the basis of a |X|-maximal submonoid. This extends the notion of primitive word: indeed, w is a primitive set if and only if w is a primitive word. By definition, for any set X, there exists a primitive set Y such that X⊆ Y∗. The set Y is therefore called a primitive root of X. As a main result, we prove that if a set has rank 2, then it has a unique primitive …

Hidden repetitionPrimitive setExistential quantificationBinary rootk-maximal monoidPseudo-repetitionBasis (universal algebra)CombinatoricsSet (abstract data type)RepetitionCardinalityFree monoidRank (graph theory)Primitive root modulo nComputer Science::Formal Languages and Automata TheoryWord (group theory)Mathematics
researchProduct

Factorizations of the Fibonacci Infinite Word

2015

The aim of this note is to survey the factorizations of the Fibonacci infinite word that make use of the Fibonacci words and other related words, and to show that all these factorizations can be easily derived in sequence starting from elementary properties of the Fibonacci numbers.

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Crochemore factorizationComputer Science - Formal Languages and Automata Theory68R15Fibonacci wordLempel-Ziv factorizationLyndon factorizationFOS: MathematicsDiscrete Mathematics and CombinatoricsMathematics - CombinatoricsZeckendorf representationCrochemore factorization; Fibonacci word; Lempel-Ziv factorization; Lyndon factorization; Zeckendorf representation; Discrete Mathematics and CombinatoricsCombinatorics (math.CO)Computer Science - Discrete Mathematics
researchProduct

On the least number of palindromes contained in an infinite word

2013

We investigate the least number of palindromic factors in an infinite word. We first consider general alphabets, and give answers to this problem for periodic and non-periodic words, closed or not under reversal of factors. We then investigate the same problem when the alphabet has size two.

FOS: Computer and information sciencesGeneral Computer ScienceDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata Theory0102 computer and information sciences68R1501 natural sciencesTheoretical Computer ScienceCombinatorics[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsMathematics - CombinatoricsPalindromes0101 mathematicsComputingMilieux_MISCELLANEOUSMathematicsCombinatorics on wordDiscrete mathematics010102 general mathematicsPalindromeCombinatorics on words010201 computation theory & mathematicsCombinatorics (math.CO)AlphabetWord (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Fast computation of abelian runs

2016

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(\sigma+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomi…

FOS: Computer and information sciencesGeneral Computer ScienceComputationAbelian run[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Elementary abelian group0102 computer and information sciences02 engineering and technology01 natural sciencesRank of an abelian groupTheoretical Computer ScienceCombinatoricsComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]Online algorithmAbelian groupComputingMilieux_MISCELLANEOUSMathematicsCombinatorics on wordDiscrete mathematicsComputer Science (all)Abelian periodText algorithm16. Peace & justiceSubstringRandomized algorithmCombinatorics on words010201 computation theory & mathematics020201 artificial intelligence & image processingComputer Science::Formal Languages and Automata Theory
researchProduct

Properties of a Class of Toeplitz Words

2021

We study the properties of the uncountable set of Stewart words. These are Toeplitz words specified by infinite sequences of Toeplitz patterns of the form $\alpha\beta\gamma$, where $\alpha,\beta,\gamma$ is any permutation of the symbols 0,1,?. We determine the critical exponent of the Stewart words, prove that they avoid the pattern $xxyyxx$, find all factors that are palindromes, and determine their subword complexity. An interesting aspect of our work is that we use automata-theoretic methods and a decision procedure for automata to carry out the proofs.

FOS: Computer and information sciencesDecision procedureSubword complexityDiscrete Mathematics (cs.DM)Combinatorics on wordsSettore INF/01 - InformaticaGeneral Computer ScienceFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryToeplitz wordTheoretical Computer ScienceComputer Science::Discrete MathematicsPattern avoidanceFOS: MathematicsAutomatic sequenceMathematics - CombinatoricsCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

On Approximate Jumbled Pattern Matching in Strings

2011

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u <= q <= v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem …

Parikh vectors: Average case analysiApproximate searchString algorithmsDiscrete mathematicsWeight functionanalysisSearch engine indexingParikh vectorsAverage case analysisApproximate string matchingSubstringString algorithmTheoretical Computer ScienceCombinatoricsComputational Theory and MathematicsString algorithms Pattern matching Parikh vectors Average case analysis Approximate search Permuted stringsPermuted stringsAverage caseTheory of computationWavelet TreePreprocessorPattern matchingPattern matchingMathematicsTheory of Computing Systems
researchProduct

Some Remarks on Differentiable Sequences and Recursivity

2010

We investigate the recursive structure of differentiable sequences over the alphabet {1, 2}. We derive a recursive formula for the (n + 1)-th symbol of a differentiable sequence, which yields a new recursive formula for the Kolakoski sequence. Finally, we show that the sequence of absolute differences of consecutive symbols of a differentiable sequence u is a morphic image of the run-length encoding of u.
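
For orientation, the Kolakoski sequence is the sequence over {1, 2} starting with 1 that coincides with its own run-length encoding. The sketch below uses the usual self-generating construction, not the recursive formula derived in the paper; the function names are hypothetical.

```python
from itertools import groupby


def kolakoski(n: int) -> list:
    """First n symbols of the Kolakoski sequence 1, 2, 2, 1, 1, 2, ..."""
    seq = [1, 2, 2]
    i = 2  # index of the symbol giving the length of the next run
    while len(seq) < n:
        next_symbol = 1 if seq[-1] == 2 else 2
        seq.extend([next_symbol] * seq[i])
        i += 1
    return seq[:n]


def run_lengths(seq: list) -> list:
    """Run-length encoding of seq, keeping only the lengths of the runs."""
    return [len(list(group)) for _, group in groupby(seq)]
```

On a finite prefix, the run lengths (ignoring the last, possibly cut-off run) reproduce the beginning of the sequence itself, which is the self-describing property that differentiable sequences generalize.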

Kolakoski word[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]recursivitydifferentiable wordscombinatorics on words68R15[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]Kolakoski sequence recursivity
researchProduct

Word assembly through minimal forbidden words

2006

We give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorics techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of a stronger hypothesis.

General Computer ScienceFragment assemblyFactor automaton[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technology01 natural sciencesMinimal forbidden wordTheoretical Computer ScienceCombinatorics0202 electrical engineering electronic engineering information engineeringFinite setComputingMilieux_MISCELLANEOUSCombinatorics on wordMathematicsShortest superstringCombinatorics on wordsRepetition index16. Peace & justice010201 computation theory & mathematics020201 artificial intelligence & image processingAlphabetConstant (mathematics)Word (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science(all)
researchProduct

A subquadratic algorithm for minimum palindromic factorization

2014

We give an $\mathcal{O}(n \log n)$-time, $\mathcal{O}(n)$-space algorithm for factoring a string into the minimum number of palindromic substrings. That is, given a string $S [1..n]$, in $\mathcal{O}(n \log n)$ time our algorithm returns the minimum number of palindromes $S_1,\ldots, S_\ell$ such that $S = S_1 \cdots S_\ell$. We also show that the time complexity is $\mathcal{O}(n)$ on average and $\Omega(n\log n)$ in the worst case. The last result is based on a characterization of the palindromic structure of Zimin words.
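
To make the problem statement concrete, here is a straightforward quadratic-time dynamic program that computes the same quantity. It is explicitly not the $\mathcal{O}(n \log n)$ algorithm of the paper, only a reference baseline; the function name is hypothetical.

```python
def min_palindromic_factorization(s: str) -> int:
    """Minimum number of palindromes whose concatenation equals s (0 for the empty string)."""
    n = len(s)
    # is_pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            if s[i] == s[j] and (j - i < 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True
    # best[j] = minimum number of palindromic factors for the prefix s[:j].
    best = [0] + [n + 1] * n
    for j in range(1, n + 1):
        for i in range(j):
            if is_pal[i][j - 1] and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
    return best[n]
```

For example, `abaab` needs two palindromes (`a` followed by `baab`), while `abc` needs three.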

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)PalindromeCharacterization (mathematics)Binary logarithmOmegaSubstringTheoretical Computer ScienceString algorithmComputational Theory and MathematicsFactorizationComputer Science - Data Structures and AlgorithmsC++ string handlingPalindromeDiscrete Mathematics and CombinatoricsData Structures and Algorithms (cs.DS)FactorizationTime complexityAlgorithmMathematicsComputer Science - Discrete Mathematics
researchProduct

Anti-powers in infinite words

2018

In combinatorics of words, a concatenation of $k$ consecutive equal blocks is called a power of order $k$. In this paper we take a different point of view and define an anti-power of order $k$ as a concatenation of $k$ consecutive pairwise distinct blocks of the same length. As a main result, we show that every infinite word contains powers of any order or anti-powers of any order. That is, the existence of powers or anti-powers is an unavoidable regularity. Indeed, we prove a stronger result, which relates the density of anti-powers to the existence of a factor that occurs with arbitrary exponent. As a consequence, we show that in every aperiodic uniformly recurrent word, anti-powers of ev…

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)ConcatenationComputer Science - Formal Languages and Automata Theory68R150102 computer and information sciences01 natural sciencesTheoretical Computer ScienceCombinatoricsUnavoidable regularityPosition (vector)Infinite wordAvoidability[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsMathematics - CombinatoricsDiscrete Mathematics and CombinatoricsOrder (group theory)Point (geometry)0101 mathematicsDiscrete Mathematics and CombinatoricMathematicsDiscrete mathematics000 Computer science knowledge general worksAnti-power010101 applied mathematicsComputational Theory and Mathematics010201 computation theory & mathematicsAperiodic graphComputer ScienceExponentPairwise comparisonCombinatorics (math.CO)SoftwareWord (group theory)Computer Science - Discrete Mathematics
researchProduct

Universal Lyndon Words

2014

A word w over an alphabet Σ is a Lyndon word if there exists an order defined on Σ for which w is lexicographically smaller than all of its conjugates (other than itself). We introduce and study universal Lyndon words, which are words over an n-letter alphabet that have length n! and such that all the conjugates are Lyndon words. We show that universal Lyndon words exist for every n and exhibit combinatorial and structural properties of these words. We then define particular prefix codes, which we call Hamiltonian lex-codes, and show that every Hamiltonian lex-code is in bijection with the set of the shortest unrepeated prefixes of the conjugates of a universal Lyndon word. This allows us t…
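
Both definitions can be checked by brute force on small alphabets. The sketch below tries every order of the alphabet, so it is only practical for very few letters; the function names are hypothetical and the code is an illustration, not a construction from the paper.

```python
from itertools import permutations
from math import factorial


def is_lyndon(w: str, order: str) -> bool:
    """True if w is strictly smaller than all its other conjugates,
    where `order` lists the letters from smallest to largest."""
    rank = {c: i for i, c in enumerate(order)}
    key = [rank[c] for c in w]
    return len(key) > 0 and all(key < key[i:] + key[:i] for i in range(1, len(key)))


def is_universal_lyndon(w: str, alphabet: str) -> bool:
    """True if w has length n! over the n-letter alphabet and every conjugate
    of w is a Lyndon word for some order of the alphabet."""
    letters = sorted(set(alphabet))
    if set(w) - set(letters) or len(w) != factorial(len(letters)):
        return False
    for i in range(len(w)):
        conjugate = w[i:] + w[:i]
        if not any(is_lyndon(conjugate, p) for p in permutations(letters)):
            return False
    return True
```

For example, `ab` is a universal Lyndon word over {a, b}, and `abcbac` is one over {a, b, c}: each of its six conjugates is the smallest rotation for a different order of the three letters.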

Discrete mathematicsExistential quantificationLyndon word Universal cycle Universal Lyndon wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Lyndon word Universal cycle Universal Lyndon word Lex-codeLexicographical orderLyndon wordUniversal Lyndon wordLyndon wordsPrefixCombinatoricsMathematics::Group TheoryCombinatorics on wordsComputer Science::Discrete MathematicsUniversal cycleBijectionAlphabetMathematics::Representation TheoryComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

Abelian antipowers in infinite words

2019

An abelian antipower of order k (or simply an abelian k-antipower) is a concatenation of k consecutive words of the same length having pairwise distinct Parikh vectors. This definition generalizes to the abelian setting the notion of a k-antipower, as introduced in Fici et al. (2018) [7], that is a concatenation of k pairwise distinct words of the same length. We aim to study whether a word contains abelian k-antipowers for arbitrarily large k. S. Holub proved that all paperfolding words contain abelian powers of every order (Holub, 2013 [8]). We show that they also contain abelian antipowers of every order.
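
The definition can be checked by comparing the Parikh vectors of the blocks instead of the blocks themselves. A minimal illustrative sketch, with a hypothetical function name:

```python
from collections import Counter


def is_abelian_anti_power(s: str, k: int) -> bool:
    """True if s splits into k equal-length blocks with pairwise distinct Parikh vectors."""
    n = len(s)
    if k <= 0 or n == 0 or n % k != 0:
        return False
    m = n // k
    # Represent each Parikh vector as a sorted tuple of (letter, count) pairs.
    vectors = {tuple(sorted(Counter(s[i:i + m]).items())) for i in range(0, n, m)}
    return len(vectors) == k
```

For example, `abba` splits into the distinct blocks `ab` and `ba`, so it is a 2-antipower, but both blocks have the same Parikh vector, so it is not an abelian 2-antipower; `aabb` is one.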

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSierpiǹski wordSettore INF/01 - InformaticaApplied MathematicsConcatenationAbelian complexityCombinatoricsArbitrarily largeOrder (group theory)Pairwise comparisonk-antipowerAbelian groupPaperfolding wordComputer Science::Formal Languages and Automata TheoryWord (group theory)Abelian antipowerMathematicsAdvances in Applied Mathematics
researchProduct

Reverse-Safe Text Indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matc…

Data structuresComputer scienceSuffix treesuffix tree0102 computer and information sciences02 engineering and technologytext indexing01 natural sciencesTheoretical Computer Sciencelaw.inventionSet (abstract data type)law020204 information systems0202 electrical engineering electronic engineering information engineeringPattern matchingdata privacySettore INF/01 - InformaticaSearch engine indexingdata privacy; Data structures; pattern matching; suffix tree; text indexingData structureMatrix multiplicationpattern matching010201 computation theory & mathematicsData structureAlgorithmAdversary modelInteger (computer science)ACM Journal of Experimental Algorithmics
researchProduct

Abelian combinatorics on words: A survey

2022

We survey known results and open problems in abelian combinatorics on words. Abelian combinatorics on words is the extension to the commutative setting of the classical theory of combinatorics on words. The extension is based on \emph{abelian equivalence}, which is the equivalence relation defined in the set of words by having the same Parikh vector, that is, the same number of occurrences of each letter of the alphabet. In the past few years, there was a lot of research on abelian analogues of classical definitions and properties in combinatorics on words. This survey aims to gather these results.
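Abelian equivalence as described above amounts to comparing letter counts. A small sketch follows; the helper names and the explicit `alphabet` argument (which fixes the order of the vector's components) are assumptions made for illustration.

```python
from collections import Counter


def parikh_vector(word, alphabet):
    """Number of occurrences of each letter of `alphabet` in `word`, in alphabet order."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)


def abelian_equivalent(u, v):
    """Two words are abelian equivalent when they have the same Parikh vector,
    i.e. when one is an anagram of the other."""
    return Counter(u) == Counter(v)
```

With the alphabet `abc`, the word `abacab` has Parikh vector `(3, 2, 1)`.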

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)General Computer ScienceFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryAbelian combinatorics on word68R15Discrete mathematicsTheoretical Computer ScienceFOS: MathematicsMathematics - CombinatoricsCombinatorics (math.CO)Computer Science - Discrete MathematicsCombinatorics on wordComputer Science Review
researchProduct

Computing abelian periods in words

2011


Abelian period Abelian repetition weak repetition design of algorithms text algorithms combinatorics on words[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]ComputingMilieux_MISCELLANEOUS
researchProduct

Linear-time sequence comparison using minimal absent words & applications

2016

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…

0301 basic medicineLatin AmericansComputer Science (all)Library science0102 computer and information sciencesCircular wordAlgorithms on string01 natural sciencesAlignmentfree comparisonSequence comparisonTheoretical Computer Science03 medical and health sciences030104 developmental biology010201 computation theory & mathematicsInformaticsPolitical scienceAbsent wordForbidden word
researchProduct

Abelian Repetitions in Sturmian Words

2012

We investigate abelian repetitions in Sturmian words. We exploit a bijection between factors of Sturmian words and subintervals of the unitary segment that allows us to study the periods of abelian repetitions by using classical results of elementary Number Theory. We prove that in any Sturmian word the superior limit of the ratio between the maximal exponent of an abelian repetition of period $m$ and $m$ is a number $\geq\sqrt{5}$, and the equality holds for the Fibonacci infinite word. We further prove that the longest prefix of the Fibonacci infinite word that is an abelian repetition of period $F_j$, $j>1$, has length $F_j(F_{j+1}+F_{j-1} +1)-2$ if $j$ is even or $F_j(F_{j+1}+F_{j-1}…

FOS: Computer and information sciencesFibonacci numberDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryG.2.168R15FOS: MathematicsCombinatorics on words Sturmian wordMathematics - CombinatoricsAbelian groupFibonacci wordMathematicsDiscrete mathematicsMathematics::CombinatoricsSturmian wordCombinatorics on wordsNumber theoryF.2.2; F.4.3; G.2.1F.4.3ExponentCombinatorics (math.CO)F.2.2Word (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

2015

We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem was introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implement…

FOS: Computer and information sciences0102 computer and information sciences02 engineering and technologyInformation System01 natural sciencesString (physics)Theoretical Computer ScienceCombinatoricsHigh Energy Physics::TheoryAnalysis of algorithmGreedy algorithmComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)Greedy algorithmFinite setAnalysis of algorithmsMathematicsSuperstring theoryShortest Common SuperstringComputer Science Applications1707 Computer Vision and Pattern RecognitionComputer Science ApplicationsReversalShortest Path Faster Algorithm010201 computation theory & mathematicsCompression ratioSignal Processing020201 artificial intelligence & image processingK shortest path routingInformation Systems
researchProduct

Minimal forbidden factors of circular words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…
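As a point of reference, here is a brute-force sketch for a single linear (not circular) word. It assumes the usual definition, which the abstract does not spell out: a word is a minimal forbidden factor of w if it does not occur in w but all of its proper factors do. The function names and the explicit `alphabet` parameter are illustrative; this is not the automaton-based construction of the paper.

```python
def factors(w):
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def minimal_forbidden_factors(w, alphabet):
    """Brute-force set of minimal forbidden factors of the linear word w."""
    fac = factors(w)
    # A single letter is minimal forbidden exactly when it does not occur in w.
    mff = {a for a in alphabet if a not in fac}
    # A longer candidate a+v+b qualifies when a+v and v+b occur in w but a+v+b does not.
    for av in fac:
        if not av:
            continue
        v = av[1:]
        for b in alphabet:
            if v + b in fac and av + b not in fac:
                mff.add(av + b)
    return mff
```

For `aab` over the alphabet `ab` this yields `ba`, `bb` and `aaa`.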

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniGeneral Computer ScienceDiscrete Mathematics (cs.DM)Finite automatonSettore INF/01 - InformaticaFormal Languages and Automata Theory (cs.FL)Factor automatonComputer Science - Formal Languages and Automata TheoryComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Circular wordFibonacci wordMinimal forbidden factorTheoretical Computer ScienceComputer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

On the Number of Closed Factors in a Word

2015

A closed word (a.k.a. periodic-like word or complete first return) is a word whose longest border does not have internal occurrences, or, equivalently, whose longest repeated prefix is not right special. We investigate the structure of closed factors of words. We show that a word of length $n$ contains at least $n+1$ distinct closed factors, and characterize those words having exactly $n+1$ closed factors. Furthermore, we show that a word of length $n$ can contain $\Theta(n^{2})$ many distinct closed factors.
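The first characterization above (the longest border has no internal occurrences) translates directly into code. This is a quadratic-time sketch for experimentation only; the function names are assumptions, and so is the convention that the empty word and single letters count as closed, which is what makes the $n+1$ bound include the empty factor.

```python
def longest_border(w):
    """Longest proper prefix of w that is also a suffix of w (possibly empty)."""
    for length in range(len(w) - 1, 0, -1):
        if w[:length] == w[-length:]:
            return w[:length]
    return ""


def is_closed(w):
    """True if the longest border of w has no internal occurrences."""
    if len(w) <= 1:
        return True  # assumed convention: empty word and letters are closed
    b = longest_border(w)
    if not b:
        return False  # the empty border occurs internally, so w is open
    # The first occurrence of b after position 0 must be the suffix itself.
    return w.find(b, 1) == len(w) - len(b)


def closed_factors(w):
    """Set of distinct closed factors of w, the empty word included."""
    all_factors = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    return {f for f in all_factors if is_closed(f)}
```

For `aba` the closed factors are the empty word, `a`, `b` and `aba`, which matches the lower bound $n+1 = 4$.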

FOS: Computer and information sciencesClosed wordCombinatorics on wordsComplete returnFormal Languages and Automata Theory (cs.FL)Computer scienceComputer Science (all)Structure (category theory)Computer Science - Formal Languages and Automata TheoryCombinatorics on words Closed word Complete return Rich word Bitonic word68R15Theoretical Computer ScienceCombinatoricsPrefixCombinatorics on wordsRich wordBitonic wordFOS: MathematicsMathematics - CombinatoricsCombinatorics (math.CO)ArithmeticWord (computer architecture)Combinatorics on word
researchProduct

The sequence of open and closed prefixes of a Sturmian word

2017

A finite word is closed if it contains a factor that occurs both as a prefix and as a suffix but does not have internal occurrences, otherwise it is open. We are interested in the {\it oc-sequence} of a word, which is the binary sequence whose $n$-th element is $0$ if the prefix of length $n$ of the word is open, or $1$ if it is closed. We exhibit results showing that this sequence is deeply related to the combinatorial and periodic structure of a word. In the case of Sturmian words, we show that these are uniquely determined (up to renaming letters) by their oc-sequence. Moreover, we prove that the class of finite Sturmian words is a maximal element with this property in the class of binar…
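A small sketch of how the oc-sequence can be computed from the definition above. The function names are illustrative, single letters are assumed to be closed, and the implementation is a naive one rather than anything from the paper.

```python
def is_closed(w):
    """Closed: some border occurs only as a prefix and as a suffix of w."""
    if len(w) <= 1:
        return True  # assumed convention: empty word and letters are closed
    # Only the longest border can be free of internal occurrences,
    # because any shorter border reappears inside the longer one.
    for length in range(len(w) - 1, 0, -1):
        if w[:length] == w[-length:]:
            return w.find(w[:length], 1) == len(w) - length
    return False  # no nonempty border: the word is open


def oc_sequence(w):
    """n-th entry is 1 if the prefix of length n is closed, 0 if it is open."""
    return [1 if is_closed(w[:n]) else 0 for n in range(1, len(w) + 1)]
```

On the Fibonacci-word prefix `abaababa` this gives `[1, 0, 1, 0, 1, 1, 0, 0]`, and exchanging the two letters leaves the sequence unchanged, as expected for a property that holds up to renaming letters.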

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Sturmian word closed wordComputer Science - Formal Languages and Automata Theory0102 computer and information sciences68R1501 natural sciencesPseudorandom binary sequenceCombinatorics[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsMathematics - Combinatorics0101 mathematicsMathematicsSequenceClosed wordSettore INF/01 - InformaticaApplied Mathematics010102 general mathematicsSturmian wordSturmian wordPrefix010201 computation theory & mathematicsCombinatorics (math.CO)SuffixElement (category theory)Word (computer architecture)Maximal elementComputer Science - Discrete Mathematics
researchProduct

Minimal Forbidden Factors of Circular Words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language M, computes a DFA of the language having M as set of minimal forbidden factors. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We gener…

L-automatonDiscrete mathematicsFactorialFibonacci numberSettore INF/01 - InformaticaComputer Science (all)Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)0102 computer and information sciences02 engineering and technologyCircular wordMinimal forbidden factor01 natural sciencesTheoretical Computer ScienceSet (abstract data type)010201 computation theory & mathematicsIf and only ifTrie0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Mathematics
researchProduct

Normal, Abby Normal, Prefix Normal

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number \(\textit{pnw}(n)\) of prefix normal words of length n, showing that \(\textit{pnw}(n) =\Omega\left(2^{n - c\sqrt{n\ln n}}\right)\) for some c and \(\textit{pnw}(n) = O \left(\frac{2^n (\ln n)^2}{n}\right)\). We introduce efficient algorithms for testing the prefix normal property and a “mechanical algorithm” for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes t…
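The prefix normal property can be tested directly from the definition. The sketch below is a plain quadratic check with a brute-force counter for small n; it is not one of the efficient algorithms of the paper, and the function names are assumptions.

```python
from itertools import product


def is_prefix_normal(w):
    """True if no substring of the binary word w has more 1s than the prefix of the same length."""
    n = len(w)
    ones = [0]  # ones[i] = number of 1s in w[:i]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            if ones[start + length] - ones[start] > ones[length]:
                return False
    return True


def count_prefix_normal(n):
    """pnw(n) by exhaustive enumeration; only practical for small n."""
    return sum(is_prefix_normal("".join(bits)) for bits in product("01", repeat=n))
```

For example, `110100` is prefix normal, while `1011` is not, because the substring `11` has two 1s but the prefix `10` has only one.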

binary jumbled pattern matchingEfficient algorithmmembership testBinary numberContext (language use)Prefix Normal Word AlgorithmData_CODINGANDINFORMATIONTHEORYprefix normal wordsOmegaSubstringenumerationCombinatoricsPrefixprefix normal words; binary jumbled pattern matching; normal forms; enumeration; membership test; binary languagesEnumerationnormal formsbinary languagesWord (group theory)Mathematics
researchProduct

Primitive sets of words

2020

Given a (finite or infinite) subset $X$ of the free monoid $A^*$ over a finite alphabet $A$, the rank of $X$ is the minimal cardinality of a set $F$ such that $X \subseteq F^*$. We say that a submonoid $M$ generated by $k$ elements of $A^*$ is {\em $k$-maximal} if there does not exist another submonoid generated by at most $k$ words containing $M$. We call a set $X \subseteq A^*$ {\em primitive} if it is the basis of a $|X|$-maximal submonoid. This definition encompasses the notion of primitive word -- in fact, $\{w\}$ is a primitive set if and only if $w$ is a primitive word. By definition, for any set $X$, there exists a primitive set $Y$ such that $X \subseteq Y^*$. We therefore call $Y$…

FOS: Computer and information sciencesPrimitive setDiscrete Mathematics (cs.DM)General Computer ScienceFormal Languages and Automata Theory (cs.FL)Pseudo-repetitionComputer Science - Formal Languages and Automata Theory0102 computer and information sciences02 engineering and technology01 natural sciencesTheoretical Computer ScienceCombinatoricsCardinalityFree monoidBi-rootFOS: Mathematics0202 electrical engineering electronic engineering information engineeringMathematics - CombinatoricsRank (graph theory)Primitive root modulo nMathematicsHidden repetitionSettore INF/01 - InformaticaIntersection (set theory)k-maximal monoidFunction (mathematics)Basis (universal algebra)010201 computation theory & mathematics020201 artificial intelligence & image processingCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryWord (group theory)Computer Science - Discrete Mathematics
researchProduct

Bacteria classification using minimal absent words

2017

Bacteria classification has been deeply investigated with different tools for many purposes, such as early diagnosis, metagenomics, and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacteria species based on a dissimilarity measure of purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that recently found applications in bioinformatics. We can therefore incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the com…

0301 basic medicinesupervised classificationRelation (database)Computer science0102 computer and information sciences01 natural sciencesMeasure (mathematics)03 medical and health sciencesProbabilistic neural networkcombinatorics on wordsprobabilistic neural networkminimal absent wordlcsh:R5-920Settore INF/01 - Informaticabusiness.industryBacterial taxonomyPattern recognitionbacteria classificationGeneral MedicineCombinatorics on words030104 developmental biology010201 computation theory & mathematicsMetagenomicsClassification methodsArtificial intelligencebusinesslcsh:Medicine (General)AIMS Medical Science
researchProduct

Algorithms for Computing Abelian Periods of Words

2012

Constantinescu and Ilie (Bulletin EATCS 89, 167--170, 2006) introduced the notion of an \emph{Abelian period} of a word. A word of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^{2})$ distinct Abelian periods. The Brute-Force algorithm computes all the Abelian periods of a word in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line algorithm based on a \emph{select} function having the same worst-case theoretical complexity as the Brute-Force one, but outperforming it in practice. We then present on-line algorithms that also make it possible to compute all the Abelian periods of all the prefixes of $w$.

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Abelian repetitionElementary abelian groupRank of an abelian groupCombinatoricsComputer Science - Data Structures and AlgorithmsFOS: MathematicsDiscrete Mathematics and CombinatoricsMathematics - CombinatoricsData Structures and Algorithms (cs.DS)Abelian groupOnline algorithmMathematicsArithmetic of abelian varietiesDiscrete mathematicsCombinatorics on wordsApplied MathematicsAbelian periodText algorithmWeak repetitionPrefixCombinatorics on wordsDesign of algorithmCombinatorics (math.CO)AlgorithmWord (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Words with the Maximum Number of Abelian Squares

2015

An abelian square is the concatenation of two words that are anagrams of one another. A word of length n can contain \(\varTheta (n^2)\) distinct factors that are abelian squares. We study infinite words such that the number of abelian square factors of length n grows quadratically with n.
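To experiment with the definition, the following sketch enumerates the distinct abelian square factors of a finite word by brute force. The names are illustrative, and the empty word is assumed not to count as an abelian square.

```python
from collections import Counter


def is_abelian_square(u):
    """True if u is nonempty, of even length, and its two halves are anagrams."""
    if len(u) == 0 or len(u) % 2 != 0:
        return False
    half = len(u) // 2
    return Counter(u[:half]) == Counter(u[half:])


def distinct_abelian_squares(w):
    """Set of distinct factors of w that are abelian squares."""
    return {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 2, len(w) + 1, 2)  # even-length factors only
        if is_abelian_square(w[i:j])
    }
```

For `abba` the abelian square factors are `bb` and `abba` itself.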

Quadratic growthComputer Science (all)ConcatenationComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Computer Science (all); Theoretical Computer ScienceSquare (algebra)Theoretical Computer ScienceCombinatoricsAnagramsIrrational numberGolden ratioAbelian groupComputer Science::Formal Languages and Automata TheoryWord (group theory)Mathematics
researchProduct

Vertical representation of C∞-words

2015

We present a new framework for dealing with C∞-words, based on their left and right frontiers. This allows us to give a compact representation of them, and to describe the set of C∞-words through an infinite directed acyclic graph G. This graph is defined by a map acting on the frontiers of C∞-words. We show that this map can be defined recursively and with no explicit reference to C∞-words. We then show that some important conjectures on C∞-words follow from analogous statements on the structure of the graph G.

Left and rightDiscrete mathematicsGeneral Computer ScienceComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)16. Peace & justiceDirected acyclic graphTheoretical Computer ScienceCombinatoricsDirected setRecursive functionsGraph (abstract data type)Null graphComputer Science::Formal Languages and Automata TheoryMathematicsofComputing_DISCRETEMATHEMATICSMathematicsTheoretical Computer Science
researchProduct

On Combinatorial Generation of Prefix Normal Words

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…
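The bubble-language property stated above can be checked exhaustively for short words. The sketch below does only that; it is not the generation algorithm of the paper, and all function names are assumptions.

```python
from itertools import product


def is_prefix_normal(w):
    """True if no substring of w has more 1s than the prefix of the same length."""
    n = len(w)
    ones = [0]  # ones[i] = number of 1s in w[:i]
    for c in w:
        ones.append(ones[-1] + (c == "1"))
    return all(
        ones[start + length] - ones[start] <= ones[length]
        for length in range(1, n + 1)
        for start in range(n - length + 1)
    )


def bubble_swap(w):
    """Exchange the first occurrence of 01 by 10; words without 01 are returned unchanged."""
    return w.replace("01", "10", 1)


def bubble_property_holds(n):
    """Check that swapping maps every prefix normal word of length n to a prefix normal word."""
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if is_prefix_normal(w) and not is_prefix_normal(bubble_swap(w)):
            return False
    return True
```

For example, `bubble_swap("1001")` gives `1010`, and both words are prefix normal.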

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Data_CODINGANDINFORMATIONTHEORYComputer Science - Discrete Mathematics
researchProduct

Fragment assembly through minimal forbidden words

2004

researchProduct

Preface

2017

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - InformaticaComputer Science (all)String Processinginformation retrievalTheoretical Computer Science
researchProduct

On The Maximum Number of Abelian Squares in a Word

2014

Strings (aka sequences or words) form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in the process of heredity transmission in living cells (through DNA sequences) or the protein synthesis (as a sequence of amino acids), and in many more different contexts.

Settore INF/01 - InformaticaCombinatorics on Words abelian square
researchProduct

On The Least Number of Palindromes in an Infinite Word

2012

PalindromesCombinatorics on word
researchProduct

Substring Complexity in Sublinear Space

2020

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as …
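For orientation, the function $S_T$ mentioned above can be tabulated naively as follows. This sketch uses quadratic time and space, so it illustrates the quantity being measured rather than the sublinear-space techniques of the paper; the function name is an assumption.

```python
def substring_complexity(t):
    """Return [S_T(1), ..., S_T(n)], where S_T(k) is the number of
    distinct substrings of length k occurring in t."""
    n = len(t)
    return [len({t[i:i + k] for i in range(n - k + 1)}) for k in range(1, n + 1)]
```

For `abab` the values are `[2, 2, 2, 1]`: two distinct substrings for each of the lengths 1 to 3, and the word itself for length 4.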

FOS: Computer and information sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)
researchProduct

Normal, Abby Normal, Prefix Normal

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number $pnw(n)$ of prefix normal words of length $n$, showing that $pnw(n) =\Omega\left(2^{n - c\sqrt{n\ln n}}\right)$ for some $c$ and $pnw(n) = O \left(\frac{2^n (\ln n)^2}{n}\right)$. We introduce efficient algorithms for testing the prefix normal property and a "mechanical algorithm" for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes to stay normal but Bob wants t…

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Data Structures and AlgorithmsFOS: MathematicsMathematics - CombinatoricsData Structures and Algorithms (cs.DS)Computer Science - Formal Languages and Automata TheoryCombinatorics (math.CO)Data_CODINGANDINFORMATIONTHEORYComputer Science - Discrete Mathematics
researchProduct

Vertical Representation of C∞-words

2015

We present a new framework for dealing with C∞-words, based on their left and right frontiers. This allows us to give a compact representation of them, and to describe the set of C∞-words through an infinite directed acyclic graph G. This graph is defined by a map acting on the frontiers of C∞-words. We show that this map can be defined recursively and with no explicit reference to C∞-words. We then show that some important conjectures on C∞-words follow from analogous statements on the structure of the graph G.

Kolakoski wordC∞-wordsComputer Science (all)[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]directed acyclic graphComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)directed setrecursive function[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]Computer Science::Formal Languages and Automata Theory[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]C∞-wordTheoretical Computer Science
researchProduct

Splicing Systems from Past to Future: Old and New Challenges

2014

A splicing system is a formal model of the recombinant behaviour of sets of double-stranded DNA molecules when acted on by restriction enzymes and ligase. In this survey we concentrate on a specific behaviour of a type of splicing system, introduced by Păun and subsequently developed by many researchers in both the linear and circular cases of the splicing definition. In particular, we present recent results on this topic and how they stimulate new challenging investigations.

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]Formal Languages and Automata Theory (cs.FL)Splicing Systems Formal Languages.ACM: F.: Theory of Computation/F.4: MATHEMATICAL LOGIC AND FORMAL LANGUAGES/F.4.3: Formal LanguagesACM: F.: Theory of Computation/F.4: MATHEMATICAL LOGIC AND FORMAL LANGUAGES/F.4.2: Grammars and Other Rewriting SystemsComputer Science - Formal Languages and Automata TheorySplicing Systems Formal languages Regular languages DNA computingComputingMilieux_MISCELLANEOUS[INFO.INFO-FL] Computer Science [cs]/Formal Languages and Automata Theory [cs.FL]Computer Science - Discrete Mathematics
researchProduct