
AUTHOR

Antonio Restivo

showing 127 related works from this author

Text Compression Using Antidictionaries

1999

We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.

Theoretical computer science; Finite-state machine; Computer science; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; 010102 general mathematics; forbidden word; Data_CODINGANDINFORMATIONTHEORY; 0102 computer and information sciences; Information theory; 01 natural sciences; finite automaton; Parallel compression; pattern matching; 010201 computation theory & mathematics; Entropy (information theory); Pattern matching; 0101 mathematics; Time complexity; Algorithm; data compression; information theory; Data compression
researchProduct
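
The idea behind antidictionary compression can be illustrated on a small binary example. The sketch below is a naive illustration, not the linear-time automaton-based algorithm of the paper: it assumes a small hand-picked antidictionary and a text that really avoids every word in it, and the function names (`forced_bit`, `compress`, `decompress`) are hypothetical.

```python
def forced_bit(prefix, antidict):
    """Return the only bit that may follow `prefix`, or None if both may."""
    for bit, other in (("0", "1"), ("1", "0")):
        # If appending `bit` would complete a forbidden word,
        # the text has to continue with `other`.
        if any((prefix + bit).endswith(w) for w in antidict):
            return other
    return None


def compress(text, antidict):
    """Drop every bit that the antidictionary makes predictable."""
    kept = []
    for i, bit in enumerate(text):
        predicted = forced_bit(text[:i], antidict)
        if predicted is None:
            kept.append(bit)          # unpredictable bit: must be stored
        elif predicted != bit:
            raise ValueError("text contains a forbidden word")
    # The original length is needed by the decoder to know when to stop.
    return "".join(kept), len(text)


def decompress(code, length, antidict):
    """Rebuild the text, re-inserting the predictable bits."""
    bits = iter(code)
    text = ""
    while len(text) < length:
        predicted = forced_bit(text, antidict)
        text += predicted if predicted is not None else next(bits)
    return text
```

With the antidictionary `{"11"}`, every bit that follows a `1` is predictable, so `compress("0101001", {"11"})` stores only five of the seven bits.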

Burrows–Wheeler transform and Sturmian words

2003

Burrows–Wheeler transform; Signal Processing; Formal language; Sturmian word; Arithmetic; Word (computer architecture); Computer Science Applications; Information Systems; Theoretical Computer Science; Mathematics; Information Processing Letters

Automata and forbidden words

1998

Let L(M) be the (factorial) language avoiding a given anti-factorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word ν, the automaton turns out to be the factor automaton of ν (the minimal automaton accepting the set of factors of ν). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a nontrivial upper bound on the number of minimal forbidden words of a word.

TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Büchi automaton; 0102 computer and information sciences; 02 engineering and technology; ω-automaton; 01 natural sciences; Theoretical Computer Science; Combinatorics; Deterministic automaton; 0202 electrical engineering electronic engineering information engineering; Two-way deterministic finite automaton; Nondeterministic finite automaton; Mathematics; Powerset construction; Levenshtein automaton; 020206 networking & telecommunications; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Nonlinear Sciences::Cellular Automata and Lattice Gases; Computer Science Applications; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; 010201 computation theory & mathematics; Signal Processing; Probabilistic automaton; Computer Science::Programming Languages; Computer Science::Formal Languages and Automata Theory; Information Systems
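
The notion of minimal forbidden word used in this abstract can be made concrete with a brute-force sketch. It relies on the standard definition (a word that is not a factor of v although all of its proper factors are) and is quadratic or worse, unlike the trie and automaton constructions of the paper; the function names are hypothetical.

```python
def factors(v):
    """All factors (substrings) of v, including the empty word."""
    return {v[i:j] for i in range(len(v) + 1) for j in range(i, len(v) + 1)}


def minimal_forbidden_words(v, alphabet):
    """Words w = x + a that are not factors of v while x and w[1:] are."""
    known = factors(v)
    result = set()
    for x in known:                 # x is the longest proper prefix of w
        for a in alphabet:
            w = x + a
            # w must be absent, and its longest proper suffix present.
            if w not in known and w[1:] in known:
                result.add(w)
    return result
```

For v = `aab` over the alphabet {a, b}, this returns the three words `ba`, `bb` and `aaa`.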

Reducing Local Alphabet Size in Recognizable Picture Languages

2021

A recognizable picture language is defined as the projection of a local picture language defined by a set of two-by-two tiles, i.e. by a strictly-locally-testable (SLT) language of order 2. The family of recognizable picture languages is also defined, using larger k by k tiles, \(k>2\), by the projection of the corresponding SLT language. A basic measure of the descriptive complexity of a picture language is given by the size of the SLT alphabet using two-by-two tiles, more precisely by the so-called alphabetic ratio of sizes: SLT-alphabet/picture-alphabet. We study how the alphabetic ratio changes moving from two to larger tile sizes, and we obtain the following result: any recognizable pi…

Set (abstract data type); Discrete mathematics; Projection (mathematics); Property (programming); Order (ring theory); Alphabet; Descriptive complexity theory; Picture language; Measure (mathematics); Mathematics

Monadic second-order logic over pictures and recognizability by tiling systems

1994

We show that a set of pictures (rectangular arrays of symbols) is recognized by a finite tiling system if and only if it is definable in existential monadic second-order logic. As a consequence, finite tiling systems constitute a notion of recognizability over two-dimensional inputs which at the same time generalizes finite-state recognizability over strings and matches a natural logic. The proof is based on the Ehrenfeucht-Fraïssé technique for first-order logic and an implementation of “threshold counting” within tiling systems.

Predicate logic; Discrete mathematics; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; Computer Science::Logic in Computer Science; Substructural logic; Second-order logic; Multimodal logic; Dynamic logic (modal logic); Intermediate logic; Higher-order logic; Computer Science::Formal Languages and Automata Theory; Monadic predicate calculus; Mathematics

Forbidden Factors and Fragment Assembly

2001

In this paper, methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word w from a given set I of substrings (fragments) of w. We introduce a hypothesis involving the set of fragments I and the maximal length m(w) of the minimal forbidden factors of w. This hypothesis allows us to reconstruct the word w uniquely from the set I in linear time. We also prove that, if w is a word randomly generated by a memoryless source with identical symbol probabilities, m(w) is logarithmic with respect to the size of w. This result shows th…

Combinatorics; Set (abstract data type); Fragment (logic); Logarithm; Deterministic automaton; Symbol (programming); General Mathematics; Time complexity; Software; Word (computer architecture); Substring; Computer Science Applications; Mathematics; RAIRO - Theoretical Informatics and Applications

Languages with mismatches

2007

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on words; Approximate string matching; General Computer Science; Repetition (rhetorical device); String (computer science); Search engine indexing; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Approximate string matching; Data structure; Theoretical Computer Science; Combinatorics; Set (abstract data type); Formal languages; Combinatorics on words Formal languages Approximate string matching Indexing; Indexing; Word (group theory); Mathematics; Integer (computer science); Computer Science(all); Theoretical Computer Science

An automata-theoretic approach to the study of the intersection of two submonoids of a free monoid

2008

We investigate the intersection of two finitely generated submonoids of the free monoid on a finite alphabet. To this purpose, we consider automata that recognize such submonoids and we study the product automata recognizing their intersection. By using automata methods we obtain a new proof of a result of Karhumäki on the characterization of the intersection of two submonoids of rank two, in the case of prefix (or suffix) generators. In a more general setting, for an arbitrary number of generators, we prove that if H and K are two finitely generated submonoids generated by prefix sets such that the product automaton associated to H ∩ K has a given special property then �(H ∩ K) ≤ �(H)�(K…

Discrete mathematics; Generator (category theory); General Mathematics; Characterization (mathematics); Computer Science Applications; Combinatorics; Prefix; Mathematics Subject Classification; Intersection; Free monoid; Product (mathematics); Rank (graph theory); Computer Science::Formal Languages and Automata Theory; Software; Automata Theory Free Monoids; Mathematics

A New Combinatorial Approach to Sequence Comparison

2008

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…

Multiset; Theoretical computer science; Burrows–Wheeler transform; Settore INF/01 - Informatica; Computer science; Burrows-Wheeler transform; Sequence comparison; String (computer science); Context (language use); Extension (predicate logic); Comparison; Information theory; Genome; Burrows-Wheeler transform; Comparison; Theoretical Computer Science; Transformation (function); Categorization; Computational Theory and Mathematics; Phylogenetics; Sequence comparison; Theory of computation; Burrows-Wheeler Transform; Sequence Comparison; Algorithm; Mathematics; Data compression
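
A transformation of this kind, taking a multiset of words to a single output string, can be sketched as follows. This is a quadratic illustration under stated assumptions, not the authors' implementation: the conjugates of all input words are sorted by comparing their infinite repetitions (it is enough to compare the first |u| + |v| symbols, by the Fine and Wilf periodicity theorem), the input words are assumed primitive, and the names `omega_cmp` and `ebwt` are hypothetical.

```python
from functools import cmp_to_key


def omega_cmp(u, v):
    """Compare the infinite words uuu... and vvv... on |u| + |v| symbols."""
    n = len(u) + len(v)
    uu = (u * (n // len(u) + 1))[:n]
    vv = (v * (n // len(v) + 1))[:n]
    return (uu > vv) - (uu < vv)


def ebwt(words):
    """Last letters of all conjugates of all words, in sorted order."""
    conjugates = [w[i:] + w[:i] for w in words for i in range(len(w))]
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)
```

On a single word the output coincides with the usual Burrows-Wheeler transform computed from sorted rotations, for example `ebwt(["banana"])` gives `nnbaaa`. On several words, letters coming from similar inputs tend to be interleaved in the output, which is the effect the comparison method measures.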

The Shuffle Product: New Research Directions

2015

In this paper we survey some recent research concerning the shuffle operation that arises both in Formal Languages and in Combinatorics on Words.

Star-free language; Computer science; Programming language; Computer Science (all); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Intermixed language; Theoretical Computer Science; Combinatorics on words; TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY; Product (mathematics); Formal language; Shuffle square; Shuffle
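
For readers unfamiliar with the operation, the shuffle of two words is the set of all their interleavings. A minimal recursive sketch (exponential in general; the function name is hypothetical):

```python
def shuffle(u, v):
    """Set of all interleavings of u and v that keep each word's letter order."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The first letter of the result comes either from u or from v.
    return ({u[0] + w for w in shuffle(u[1:], v)}
            | {v[0] + w for w in shuffle(u, v[1:])})
```

For example, `shuffle("ab", "c")` is `{"abc", "acb", "cab"}`. The shuffle of two languages is the union of the shuffles of their words.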

Dictionary-symbolwise flexible parsing

2012

Linear-time optimal parsing algorithms are rare in the dictionary-based branch of data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999) that works when the dictionary is prefix closed and the encoding of dictionary pointers has a constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm that is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypothesis. In the case of LZ78-like algorithms with variable costs and any symbolwise compressor (linear, as usual), we show how to implement our parsing algorithm in linear time. In the case of LZ77-like dictionaries and any symbol…

Theoretical computer science; Computer science; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Data_CODINGANDINFORMATIONTHEORY; 0102 computer and information sciences; 02 engineering and technology; 01 natural sciences; Directed acyclic graph; Theoretical Computer Science; Constant (computer programming); 020204 information systems; Encoding (memory); Optimal parsing; 0202 electrical engineering electronic engineering information engineering; Discrete Mathematics and Combinatorics; Stringology; Symbolwise text compression; Time complexity; Lossless compression; Parsing; Settore INF/01 - Informatica; Dictionary-based compression; Optimal Parsing Lossless Data Compression DAG; Directed acyclic graph; Prefix; Computational Theory and Mathematics; Text compression; 010201 computation theory & mathematics; Algorithm; Bottom-up parsing; Data compression; Journal of Discrete Algorithms
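
Optimal parsing is commonly phrased as a shortest-path problem on a directed acyclic graph whose nodes are text positions. The sketch below shows that generic formulation only; it is not the Flexible Parsing algorithm of the paper and does not run in linear time. The static dictionary, the cost model and the function name `optimal_parse` are assumptions made for illustration.

```python
def optimal_parse(text, phrase_cost, symbol_cost):
    """Cheapest way to cover `text` with dictionary phrases and single symbols.

    phrase_cost maps each dictionary phrase to the cost of its pointer;
    symbol_cost is the (constant) cost of encoding one symbol directly.
    """
    n = len(text)
    best = [float("inf")] * (n + 1)   # best[i]: cheapest cost of text[:i]
    step = [None] * (n + 1)           # step[j]: (previous position, label)
    best[0] = 0.0
    for i in range(n):
        # One symbolwise edge plus one edge per phrase matching at position i.
        edges = [(i + 1, symbol_cost, ("symbol", text[i]))]
        edges += [(i + len(p), c, ("phrase", p))
                  for p, c in phrase_cost.items()
                  if p and text.startswith(p, i)]
        for j, cost, label in edges:
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                step[j] = (i, label)
    # Walk back from the end of the text to recover the parse.
    parse, j = [], n
    while j > 0:
        j, label = step[j]
        parse.append(label)
    parse.reverse()
    return best[n], parse
```

Because every edge goes forward in the text, processing positions from left to right is enough, and no priority queue is needed.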

Enumeration of L-convex polyominoes by rows and columns

2005

In this paper, we consider the class of L-convex polyominoes, i.e. the convex polyominoes in which any two cells can be connected by a path of cells in the polyomino that switches direction between the vertical and the horizontal at most once. Using the ECO method, we prove that the number $f_n$ of L-convex polyominoes with perimeter $2(n + 2)$ satisfies the rational recurrence relation $f_n = 4f_{n-1} - 2f_{n-2}$, with $f_0 = 1$, $f_1 = 2$, $f_2 = 7$. Moreover, we give a combinatorial interpretation of this statement. In the last section, we present some open problems.

Discrete mathematics; Recurrence relation; ECO method; General Computer Science; Polyomino; Generating function; Regular polygon; Row and column spaces; Theoretical Computer Science; Interpretation (model theory); Generating functions; Combinatorics; Section (fiber bundle); Path (graph theory); Convex polyominoes; Computer Science(all); Mathematics; Theoretical Computer Science
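
The recurrence quoted in the abstract is easy to iterate. In this small sketch the recurrence is applied from n = 3 onward, an assumption suggested by the three initial values being given separately (4·2 − 2·1 = 6, not 7, so it cannot hold at n = 2); the function name is hypothetical.

```python
def l_convex_counts(n_max):
    """Values f_0 .. f_{n_max}, where f_n counts perimeter 2(n + 2)."""
    f = [1, 2, 7]                       # initial values stated in the abstract
    for n in range(3, n_max + 1):
        f.append(4 * f[n - 1] - 2 * f[n - 2])
    return f[: n_max + 1]
```

The first values obtained this way are 1, 2, 7, 24, 82, 280.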

A reconstruction algorithm for L-convex polyominoes

2006

We give an algorithm that uniquely reconstructs an L-convex polyomino from the size of some special paths, called bordered L-paths.

Combinatorics; Convexity; Mathematics::Combinatorics; General Computer Science; Polyomino; Polyominoes; Regular polygon; Reconstruction algorithm; Reconstruction; Computer Science(all); Theoretical Computer Science; Mathematics; Theoretical Computer Science

Suffixes, Conjugates and Lyndon Words

2013

In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\plex$ and $\pom$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\pom$ order between two primitive wo…

Multiset; Reduction (recursion theory); BWT; Lyndon factorization; Suffix Array; String (computer science); Suffix array; Lyndon words Lyndon factorization BWT Suffix array EBWT Circular words Conjugacy; Lexicographical order; Suffix Array; Combinatorics; BWT; Lyndon factorization; Order (group theory); Symbol (formal); Word (group theory); Mathematics
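
Since Lyndon words and the Lyndon factorization appear among the tools of this work, a short reminder of how the factorization is computed may help. The sketch below is the classical linear-time algorithm of Duval, given as general background rather than as a result of the paper; the function name is hypothetical.

```python
def lyndon_factorization(s):
    """Factor s into a non-increasing sequence of Lyndon words (Duval)."""
    n, i, result = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it is a power of a Lyndon word
        # followed by a prefix of that word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit the complete copies of the Lyndon word of length j - k.
        while i <= k:
            result.append(s[i:i + j - k])
            i += j - k
    return result
```

For example, `banana` factors as `b`, `an`, `an`, `a`.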

A note on renewal systems

1992

A renewal system is a symbolic dynamical system generated by free concatenations of a finite set of words. In this paper we prove that, given two systems which are both renewal and Markov systems, it is decidable whether they are topologically conjugate. The proof makes use of the methods and the techniques of formal language theory.

Discrete mathematics; Algebra; General Computer Science; Formal language; Markov systems; Dynamical system (definition); Topological conjugacy; Finite set; Computer Science::Formal Languages and Automata Theory; Decidability; Mathematics; Theoretical Computer Science; Computer Science(all); Theoretical Computer Science

A NEW COMPLEXITY FUNCTION FOR WORDS BASED ON PERIODICITY

2013

Motivated by the extension of the critical factorization theorem to infinite words, we study the (local) periodicity function, i.e. the function that, for any position in a word, gives the size of the shortest square centered in that position. We prove that this function characterizes any binary word up to exchange of letters. We then introduce a new complexity function for words (the periodicity complexity) that, for any position in the word, gives the average value of the periodicity function up to that position. The new complexity function is independent from the other commonly used complexity measures as, for instance, the factor complexity. Indeed, whereas any infinite word with bound…

Average-case complexity; Discrete mathematics; Fibonacci number; Settore INF/01 - Informatica; General Mathematics; complexity function; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Function (mathematics); periodicity; critical factorization theorem; Combinatorics; Complexity index; Combinatorics on words; Bounded function; Complexity function; Computer Science::Formal Languages and Automata Theory; Word (computer architecture); Combinatorics on word; Mathematics; International Journal of Algebra and Computation

From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization

2007

We introduce a combinatorial optimization framework that naturally induces a class of optimal word permutations with respect to a suitably defined cost function taking into account various measures of relatedness between words. The Burrows and Wheeler transform (bwt) (cf. [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994]), and its analog for labelled trees (cf. [P. Ferragina, F. Luccio, G. Manzini, S. Muthukrishnan, Structuring labeled trees for optimal succinctness, and beyond, in: Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2005, pp. 198–207]), are special cases i…

Lossless compression; Boosting (machine learning); General Computer Science; Computer science; Computation; Data_CODINGANDINFORMATIONTHEORY; Lyndon word; Optimal word permutation; Theoretical Computer Science; Combinatorics; Permutation; Suffix tree; Combinatorial optimization; Burrows–Wheeler transform; Time complexity; Computer Science(all)

Varieties of Codes and Kraft Inequality

2007

Decipherability conditions for codes are investigated by using the approach of Guzman, who introduced in [7] the notion of variety of codes and established a connection between classes of codes and varieties of monoids. The class of Uniquely Decipherable (UD) codes is a special case of variety of codes, corresponding to the variety of all monoids. It is well known that the Kraft inequality is a necessary condition for UD codes, but it is not sufficient, in the sense that there exist codes that are not UD and that satisfy the Kraft inequality. The main result of the present paper states that, given a variety V of codes, if all the elements of V satisfy the Kraft inequality, then V is the var…

Discrete mathematics; Class (set theory); Computational Theory and Mathematics; Theory of computation; High Energy Physics::Experiment; Astrophysics::Cosmology and Extragalactic Astrophysics; Kraft's inequality; Variety (universal algebra); Special case; Connection (algebraic framework); Mathematics::Representation Theory; Theoretical Computer Science; Mathematics; Theory of Computing Systems
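
The remark that the Kraft inequality is necessary but not sufficient for unique decipherability can be checked on a small example. The sketch below uses a hand-picked binary code (an illustration, not taken from the paper) and hypothetical helper names.

```python
from fractions import Fraction


def kraft_sum(code, alphabet_size=2):
    """Sum of alphabet_size ** (-len(w)) over the codewords, as an exact fraction."""
    return sum(Fraction(1, alphabet_size ** len(w)) for w in code)


def count_factorizations(message, code):
    """Number of ways to write `message` as a concatenation of codewords."""
    ways = [0] * (len(message) + 1)
    ways[0] = 1                        # the empty prefix has one factorization
    for i in range(1, len(message) + 1):
        for w in code:
            # Does the prefix message[:i] end with the codeword w?
            if w and message.endswith(w, 0, i):
                ways[i] += ways[i - len(w)]
    return ways[-1]
```

The code {0, 01, 10} has Kraft sum 1/2 + 1/4 + 1/4 = 1, so it satisfies the inequality, yet `010` can be read both as 0·10 and as 01·0, so the code is not uniquely decipherable.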

The expressive power of the shuffle product

2010

There is an increasing interest in the shuffle product on formal languages, mainly because it is a standard tool for modeling process algebras. It still remains a mysterious operation on regular languages. Antonio Restivo proposed as a challenge to characterize the smallest class of languages containing the singletons and closed under Boolean operations, product and shuffle. This problem is still widely open, but we present some partial results on it. We also study some other smaller classes, including the smallest class containing the languages composed of a single word of length 2 which is closed under Boolean operations and shuffle by a letter (resp. shuffle by a l…

Class (set theory); Computer science; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; 0102 computer and information sciences; 02 engineering and technology; Star (graph theory); 01 natural sciences; Expressive power; Theoretical Computer Science; Regular language; Formal language; 0202 electrical engineering electronic engineering information engineering; Arithmetic; Algebraic number; ComputingMilieux_MISCELLANEOUS; Discrete mathematics; Computer Science Applications; shuffle operator; Computational Theory and Mathematics; 010201 computation theory & mathematics; Product (mathematics); Formal language; 020201 artificial intelligence & image processing; Boolean operations in computer-aided design; Word (computer architecture); Information Systems

Minimal forbidden words and factor automata

1998

Let L(M) be the (factorial) language avoiding a given antifactorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word v, the automaton turns out to be the factor automaton of v (the minimal automaton accepting the set of factors of v). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a non-trivial upper bound on the number of minimal forbidden words of a word.

TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; failure function; factor code; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; Büchi automaton; ComputerApplications_COMPUTERSINOTHERSYSTEMS; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; 0102 computer and information sciences; avoiding a word; ω-automaton; 01 natural sciences; factorial language; Reversible cellular automaton; Combinatorics; Deterministic automaton; anti-factorial language; Nondeterministic finite automaton; 0101 mathematics; Mathematics; factor automaton; Powerset construction; Levenshtein automaton; 010102 general mathematics; forbidden word; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); 16. Peace & justice; Nonlinear Sciences::Cellular Automata and Lattice Gases; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; 010201 computation theory & mathematics; Probabilistic automaton; Physics::Accelerator Physics; Computer Science::Programming Languages; High Energy Physics::Experiment; Computer Science::Formal Languages and Automata Theory
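
An automaton accepting the words that avoid a finite antifactorial set M can be sketched along the lines of a trie with failure transitions. The version below is a simplified illustration: it recomputes the longest suffix naively instead of using a failure function, it assumes M is antifactorial (no word of M is a proper factor of another), and the names `build_avoidance_automaton` and `avoids` are hypothetical.

```python
def build_avoidance_automaton(antidict, alphabet):
    """Transition table whose states are the prefixes of the words of antidict."""
    prefixes = {w[:i] for w in antidict for i in range(len(w) + 1)}

    def longest_state_suffix(s):
        # Longest suffix of s that is a trie node; the empty word always is.
        for i in range(len(s) + 1):
            if s[i:] in prefixes:
                return s[i:]

    delta = {}
    for p in prefixes:
        if p in antidict:
            continue                    # complete forbidden words are sink states
        for a in alphabet:
            delta[(p, a)] = longest_state_suffix(p + a)
    return delta


def avoids(word, delta, antidict):
    """True if `word` contains no factor from antidict."""
    state = ""
    for a in word:
        state = delta[(state, a)]
        if state in antidict:
            return False
    return True
```

With M = {ba, bb, aaa}, the minimal forbidden words of `aab` over {a, b}, the accepted words are exactly the factors of `aab`, in line with the statement about the factor automaton.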

Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

Combinatorics; Combinatorics on words; Pattern recognition (psychology); Search engine indexing; Automata theory; Hamming distance; String searching algorithm; Approximate string matching; Time complexity; Mathematics

On the Shuffle of Star-Free Languages

2012

Motivated by the general problem to characterize families of languages closed under shuffle, we investigate some conditions under which the shuffle of two star-free languages is star-free. Some of the special cases here approached give rise to new problems in combinatorics on words.

Discrete mathematics; Algebra and Number Theory; Settore INF/01 - Informatica; pure submonoid; General problem; Abstract family of languages; Regular language; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Star (graph theory); star-free language; Cone (formal languages); shuffle of language; Pumping lemma for regular languages; Theoretical Computer Science; Combinatorics on words; Computational Theory and Mathematics; Regular language; combinatorics on words; Information Systems; Mathematics; Fundamenta Informaticae

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

2018

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…

0301 basic medicine; Settore INF/01 - Informatica; Computer science; Data_CODINGANDINFORMATIONTHEORY; 0102 computer and information sciences; Block sorting; Data structure; Lexicographical order; 01 natural sciences; Upper and lower bounds; 03 medical and health sciences; Combinatorics on words; 030104 developmental biology; 010201 computation theory & mathematics; Arithmetic; Compressed Data Structures Block Sorting Combinatorics on Words Algorithms; Data compression

Nondeterministic Moore automata and Brzozowski's minimization algorithm

2012

Moore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power as the classical deterministic Moore automaton. We also consider the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. We propose an algorithm that is a variant of Brzozowski’s minimization algorithm in the sense that it is essentially structured as reverse operation and subset construction performed twice. Moreover, we explore more general classes of NMA and analyze the applicability of the algorithm. For some of such classes the algorith…

Discrete mathematics; TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; General Computer Science; Brzozowski’s minimization algorithm; Settore INF/01 - Informatica; Powerset construction; Automata minimization; Büchi automaton; Nonlinear Sciences::Cellular Automata and Lattice Gases; Theoretical Computer Science; Nondeterministic algorithm; Deterministic finite automaton; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; DFA minimization; Deterministic automaton; Two-way deterministic finite automaton; Nondeterministic finite automaton; Brzozowski's minimization algorithm; Computer Science::Formal Languages and Automata Theory; Computer Science(all); Mathematics; Nondeterministic Moore automata
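
The "reverse and determinize, twice" structure mentioned in the abstract is that of the classical Brzozowski algorithm for ordinary finite automata (acceptors). The sketch below covers that classical case only, not the Moore-automaton variant proposed in the paper; the data layout (a dictionary from `(state, letter)` to a set of states) and the function names are assumptions.

```python
def reverse(delta, initials, finals):
    """Flip every transition and swap initial and final states."""
    rdelta = {}
    for (p, a), targets in delta.items():
        for q in targets:
            rdelta.setdefault((q, a), set()).add(p)
    return rdelta, set(finals), set(initials)


def determinize(alphabet, delta, initials, finals):
    """Subset construction restricted to the reachable subsets."""
    start = frozenset(initials)
    states, todo, ddelta = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            ddelta[(subset, a)] = {target}    # singleton, so reverse() still applies
            if target not in states:
                states.add(target)
                todo.append(target)
    dfinals = {s for s in states if s & set(finals)}
    return ddelta, {start}, dfinals, states


def brzozowski(alphabet, delta, initials, finals):
    """Minimal complete DFA obtained by reversing and determinizing twice."""
    d, i, f = reverse(delta, initials, finals)
    d, i, f, _ = determinize(alphabet, d, i, f)
    d, i, f = reverse(d, i, f)
    return determinize(alphabet, d, i, f)


def accepts(word, delta, initials, finals):
    """Run a deterministic automaton produced by determinize()."""
    (state,) = initials
    for a in word:
        (state,) = delta[(state, a)]
    return state in finals
```

Applied to a redundant three-state automaton for "words over {a, b} ending in a", the result has two states, which is the minimum for that language.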

Balancing and clustering of words in the Burrows–Wheeler transform

2011

Compression algorithms based on the Burrows–Wheeler transform (BWT) take advantage of the fact that the output word of the BWT shows local similarity and thus turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words. The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of the input word that ensures optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is).…

Discrete mathematics; General Computer Science; Burrows–Wheeler transform; Combinatorics on words; Palindrome; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Binary alphabet; Theoretical Computer Science; Combinatorics on words; Data compression; Entropy (information theory); Combinatorics on words; Burrows–Wheeler transform; Data compression; Arithmetic; Cluster analysis; Empirical evidence; Burrows–Wheeler transform; Computer Science::Formal Languages and Automata Theory; Mathematics; Data compression; Computer Science(all)
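
The link between balance and clustering can be observed on two tiny binary words. In this sketch the BWT is computed from sorted rotations without an end marker, clustering is measured by the number of runs of equal letters, and balance is checked on the factors of the finite word itself; these choices and the function names are assumptions, and the paper's precise definitions may differ (for instance, by working with circular words).

```python
from itertools import groupby


def bwt(word):
    """Last column of the sorted rotations of `word`."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(r[-1] for r in rotations)


def runs(word):
    """Number of maximal blocks of equal letters."""
    return sum(1 for _ in groupby(word))


def is_balanced(word, letter="a"):
    """True if factors of equal length differ by at most one occurrence of `letter`."""
    for n in range(1, len(word) + 1):
        counts = {word[i:i + n].count(letter) for i in range(len(word) - n + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True
```

The balanced word `abaab` is transformed into `bbaaa` (two runs), while the unbalanced word `aabb` is transformed into `baba` (four runs).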

Distance measures for biological sequences: Some recent approaches

2008

Sequence comparison has become an essential tool in modern molecular biology. In fact, in biomolecular sequences high similarity usually implies significant functional or structural similarity. Traditional approaches use techniques that are based on sequence alignment able to measure character level differences. However, the recent developments of whole genome sequencing technology give rise to the need for similarity measures able to capture the rearrangements involving large segments contained in the sequences. This paper is devoted to illustrating different methods recently introduced for the alignment-free comparison of biological sequences. The goal of the paper is both to highlight t…

Whole genome sequencing; Computer science; Applied Mathematics; Sequence alignment; Machine learning; Bioinformatics; Measure (mathematics); Genome; Distance measures; Similitude; Theoretical Computer Science; Artificial Intelligence; Similarity (psychology); Metric (mathematics); Artificial intelligence; Software; International Journal of Approximate Reasoning

On extremal cases of Hopcroft’s algorithm

2010

In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different executions that can lead to different sequences of refinements of the set of the states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to finding families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata as…

Discrete mathematics; Finite-state machine; General Computer Science; Unary operation; Word trees; Standard trees; Automaton; Theoretical Computer Science; Combinatorics; Deterministic finite automaton; DFA minimization; Deterministic automaton; Hopcroft’s minimization algorithm; Tree automaton; Deterministic finite state automata; Time complexity; Algorithm; Computer Science::Formal Languages and Automata Theory; Mathematics; Computer Science(all); Theoretical Computer Science

Reconstruction of L-convex Polyominoes.

2003

We introduce the family of L-convex polyominoes, a subset of convex polyominoes whose elements satisfy a special convexity property. We develop an algorithm that reconstructs an L-convex polyomino from the set of its maximal L-polyominoes.

Discrete mathematics; Mathematics::Combinatorics; Property (philosophy); Polyomino; Applied Mathematics; Regular polygon; Polyominoes; Computer Science::Computational Geometry; Convexity; Combinatorics; Set (abstract data type); Computer Science::Discrete Mathematics; Discrete Mathematics and Combinatorics; Computer Science::Formal Languages and Automata Theory; Mathematics

Combinatorial aspects of L-convex polyominoes

2007

We consider the class of L-convex polyominoes, i.e. those polyominoes in which any two cells can be connected with an "L"-shaped path in one of its four cyclic orientations. The paper proves bijectively that the number $f_n$ of L-convex polyominoes with perimeter $2(n+2)$ satisfies the linear recurrence relation $f_{n+2} = 4f_{n+1} - 2f_n$, by first establishing a recurrence of the same form for the cardinality of the "2-compositions" of a natural number n, a simple generalization of the ordinary compositions of n. Then, such 2-compositions are studied and bijectively related to certain words of a regular language over four letters which is in turn bijectively related to L-convex polyominoes. In …

Discrete mathematicsClass (set theory)Mathematics::CombinatoricsPolyominoEnumerationOpen problemGenerating functionRegular polygonPolyominoesNatural numberComputer Science::Computational GeometryFormal SeriesCombinatoricsCardinalityRegular languageDiscrete Mathematics and CombinatoricsTomographyAlgorithmsbinary tomographyMathematicsEnumeration; Formal Series; PolyominoesEuropean Journal of Combinatorics
researchProduct
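
The definition quoted in the entry above (any two cells joined by an "L"-shaped path) can be checked directly on small examples. The following is a minimal brute-force sketch, not the authors' method: it assumes a polyomino is given as a set of `(row, column)` cells, and the name `is_l_convex` is hypothetical.

```python
from itertools import combinations

def is_l_convex(cells):
    """Return True if every pair of cells is joined by an L-shaped path inside `cells`."""
    cells = set(cells)

    def span(a, b):
        # Inclusive integer range between a and b, in either order.
        return range(min(a, b), max(a, b) + 1)

    def l_path(r1, c1, r2, c2):
        # Walk along row r1 to column c2, then along column c2 to row r2.
        return (all((r1, c) in cells for c in span(c1, c2))
                and all((r, c2) in cells for r in span(r1, r2)))

    for (r1, c1), (r2, c2) in combinations(cells, 2):
        # Two possible corners: (r1, c2) or (r2, c1).
        if not (l_path(r1, c1, r2, c2) or l_path(r2, c2, r1, c1)):
            return False
    return True

l_tromino = {(0, 0), (1, 0), (1, 1)}
s_tetromino = {(0, 0), (0, 1), (1, 1), (1, 2)}
print(is_l_convex(l_tromino))    # True
print(is_l_convex(s_tetromino))  # False
```

The S-shaped tetromino fails because its two end cells can only be joined by a path with two turns.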

Cancellation, pumping and permutation in formal languages

1984

Formal grammarTheoretical computer scienceChomsky hierarchyFormal languageContext-free languageAbstract family of languagesPumping lemma for context-free languagesArithmeticCone (formal languages)Pumping lemma for regular languagesMathematics
researchProduct

A graph theoretic approach to automata minimality

2012

The paper presents a graph-theoretic approach to test the minimality of a deterministic automaton. In particular, we focus on problems concerning the dependence of the minimality of an automaton on the choice of the set F of final states or on the cardinality of the set F. We introduce different minimality conditions of an automaton and show that such conditions can be characterized in graph-theoretic terms.

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESSettore INF/01 - InformaticaGeneral Computer Sciencegraph theoryContinuous automatonTimed automatonPushdown automatonBüchi automatonautomata minimalityNonlinear Sciences::Cellular Automata and Lattice GasesTheoretical Computer ScienceAutomatonCombinatoricsCardinalityDeterministic automatonTwo-way deterministic finite automatonComputer Science::Formal Languages and Automata TheoryMathematicsTheoretical Computer Science
researchProduct

A Periodicity Theorem on Words and Applications

1995

We prove a periodicity theorem on words that has strong analogies with the Critical Factorization theorem and we show three applications of it.

Discrete mathematicssymbols.namesakeWeierstrass factorization theoremsymbolsBinary alphabetMathematics
researchProduct

A Tomographical Characterization of L-convex Polyominoes

2005

Our main purpose is to characterize the class of L-convex polyominoes introduced in [3] by means of their horizontal and vertical projections. The achieved results allow an answer to one of the most relevant questions in tomography i.e. the uniqueness of discrete sets, with respect to their horizontal and vertical projections. In this paper, by giving a characterization of L-convex polyominoes, we investigate the connection between uniqueness property and unimodality of vectors of horizontal and vertical projections. In the last section we consider the continuum environment; we extend the definition of L-convex set, and we obtain some results analogous to those for the discrete case.

Pure mathematicsInteger VectorHorizontal and verticalPolyominoDiscrete TomographyConvex setDiscrete geometryUnimodalityConnection (mathematics)Vertical ProjectionContinuum CounterpartMonotone PathUniquenessDiscrete tomographyMathematics
researchProduct

Unambiguous recognizable two-dimensional languages

2006

We consider the family UREC of unambiguous recognizable two-dimensional languages. We prove that there are recognizable languages that are inherently ambiguous, that is, the family UREC is a proper subclass of the family REC. The result is obtained by showing a necessary condition for unambiguous recognizable languages. Furthermore, the family UREC coincides with the class of picture languages defined by unambiguous 2OTA and it strictly contains its deterministic counterpart. Some closure and non-closure properties of UREC are presented. Finally we show that it is undecidable whether a given tiling system is unambiguous.

DeterminismSettore INF/01 - InformaticaDeterministic context-free languageGeneral MathematicsTwo-dimensional languagesAutomata and formal languages; Determinism; Two-dimensional languages; UnambiguityComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Class (philosophy)Computer Science ApplicationsUndecidable problemAutomata and Formal Languages. ; Unambiguity ; Determinism. .; Two-dimensional languagesCombinatoricsClosure (mathematics)Computer Science::Programming LanguagesAutomata and formal languagesDeterminism.ArithmeticComputer Science::Formal Languages and Automata TheorySoftwareUnambiguityMathematics
researchProduct

Extremal minimality conditions on automata

2012

In this paper we investigate the minimality problem of DFAs by varying the set of final states. In other words, we are interested in how the choice of the final states can affect the minimality of the automata. The state-pair graph is a useful tool to investigate such a problem. The choice of a set of final states for the automaton A defines a coloring of the closed components of the state-pair graph and the minimality of A corresponds to a property of these colored components. Particular attention is devoted to the analysis of some extremal cases such as, for example, the automata that are minimal for any choice of the subset of final states F from the state set Q of the automato…

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESNested wordSettore INF/01 - InformaticaGeneral Computer Sciencestate-pair graph of automataminimality automataTimed automatonω-automatonNonlinear Sciences::Cellular Automata and Lattice GasesTheoretical Computer ScienceMobile automatonCombinatoricsTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDFA minimizationContinuous spatial automatonAutomata theoryQuantum finite automataComputer Science::Formal Languages and Automata TheoryComputer Science(all)MathematicsTheoretical Computer Science
researchProduct

Burrows-Wheeler transform and palindromic richness

2009

The investigation of the extremal case of the Burrows–Wheeler transform leads to the study of the words w over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1<a_2<\cdots<a_k$, such that bwt(w) is of the form $a_k^{n_k}a_{k-1}^{n_{k-1}}\cdots a_2^{n_2}a_1^{n_1}$, for some non-negative integers $n_1,n_2,\ldots,n_k$. A characterization of these words in the case |A|=2 has been given in [Sabrina Mantaci, Antonio Restivo, Marinella Sciortino, Burrows-Wheeler transform and Sturmian words, Information Processing Letters 86 (2003) 241–246], where it is proved that they correspond to the powers of conjugates of standard words. The case |A|=3 has been settled in [Jamie Simpson, Simon J. Puglisi, Words with simple Burrows-Wheeler transforms, Electronic …

Combinatorics on wordsGeneral Computer ScienceBurrows–Wheeler transformSettore INF/01 - InformaticaRich wordsPalindromeBurrows-Wheeler transformTheoretical Computer ScienceCombinatoricsRich wordBurrows-Wheeler transform; Palindromes; Rich words; Combinatorics on wordsPalindromePalindromesSpecies richnessAlphabetArithmeticBurrows–Wheeler transformComputer Science(all)MathematicsCombinatorics on word
researchProduct
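
The extremal form described in the entry above (all occurrences of the largest letter first, then the next largest, and so on) is easy to test on small words. The sketch below assumes the combinatorial definition of the transform, namely the last column of the sorted conjugates with no end-of-string marker; the helper name `is_fully_clustered` is hypothetical.

```python
def bwt(word):
    """Last column of the lexicographically sorted rotations (conjugates) of `word`."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)

def is_fully_clustered(word):
    """True if bwt(word) lists its letters in non-increasing order."""
    out = bwt(word)
    return all(x >= y for x, y in zip(out, out[1:]))

print(bwt("abaab"))                # bbaaa
print(is_fully_clustered("abaab")) # True
print(is_fully_clustered("aabb"))  # False, since bwt("aabb") == "baba"
```

Sorting all rotations takes quadratic space and is meant only for experimentation with short words.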

Permutation properties and the fibonacci semigroup

1989

CombinatoricsAlgebra and Number TheoryFibonacci numberSemigroupPartial permutationFibonacci polynomialsBicyclic semigroupGeneralized permutation matrixPisano periodCyclic permutationMathematicsSemigroup Forum
researchProduct

Overlap free words on two symbols

1985

CombinatoricsInternal factorMathematics
researchProduct

BALANCE PROPERTIES AND DISTRIBUTION OF SQUARES IN CIRCULAR WORDS

2010

We study balance properties of circular words over alphabets of size greater than two. We give some new characterizations of balanced words connected to the Kawasaki-Ising model and to the notion of derivative of a word. Moreover we consider two different generalizations of the notion of balance, and we find some relations between them. Some of our results can be generalized to non periodic infinite words as well.

combinatorics on words; circular words; balanced wordsCombinatoricsCombinatorics on wordsSettore INF/01 - InformaticaComputer Science (miscellaneous)Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Computer Science::Formal Languages and Automata TheoryMathematicsInternational Journal of Foundations of Computer Science
researchProduct

Hopcroft's algorithm and tree-like automata

2011

Minimizing a deterministic finite automaton (DFA) is a very important problem in the theory of automata and formal languages. Hopcroft's algorithm represents the fastest known solution to such a problem. In this paper we analyze the behavior of this algorithm on a family of binary automata, called tree-like automata, associated to binary labeled trees constructed by words. We prove that all the executions of the algorithm on tree-like automata associated to trees, constructed by standard words, have running time with the same asymptotic growth rate. In particular, we provide a lower and upper bound for the running time of the algorithm expressed in terms of combinatorial properties of the trees…

Discrete mathematicsNested wordSettore INF/01 - InformaticaGeneral MathematicsAutomata minimizationω-automatonHopcroft's algorithmComputer Science ApplicationsCombinatoricsDeterministic finite automatonDFA minimizationDeterministic automatonContinuous spatial automatonQuantum finite automataAutomata theoryword treesAlgorithmComputer Science::Formal Languages and Automata TheorySoftwareMathematics
researchProduct

Coding Partitions: Regularity, Maximality and Global Ambiguity

2007

The canonical coding partition of a set of words is the finest partition such that the words contained in at least two factorizations of the same sequence belong to the same class. When the set is not uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize the ambiguities in the factorizations of finite sequences. We first prove that the canonical coding partition of a regular set contains a finite number of regular classes. We give an algorithm for computing this partition. We then investigate maximality conditions in a coding partition and we prove, in the regular case, the equivalence between two different notions of maximality. As an ap…

CombinatoricsDiscrete mathematicsFormal languagesinformation ratemedia_common.quotation_subjectPartition (number theory)AmbiguityPartition of a setFinite automataFinite setCoding (social sciences)media_commonMathematics
researchProduct

Some Investigations on Similarity Measures Based on Absent Words

2019

In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common s…

sequence comparisonAlgebra and Number TheorySettore INF/01 - Informaticabusiness.industryComputer sciencePattern recognitionsimilarity measuresMinimal absent wordsTheoretical Computer ScienceComputational Theory and MathematicsSimilarity (network science)Artificial intelligencebusinessInformation SystemsFundamenta Informaticae
researchProduct

TWO-DIMENSIONAL FINITE STATE RECOGNIZABILITY

1996

The purpose of this paper is to investigate a new notion of finite state recognizability for two-dimensional (picture) languages. This notion takes as its starting point the characterization of one-dimensional recognizable languages in terms of local languages and projections. Such a notion can be extended in a natural way to the two-dimensional case. We first introduce a notion of local picture language and then we define a recognizable picture language as a projection of a local picture language. The family of recognizable picture languages is denoted by REC. We study some combinatorial and language-theoretic properties of family REC. In particular we prove some closure properties with re…

Algebra and Number TheoryString (computer science)Abstract family of languagesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Ontology languagePicture languageCone (formal languages)Theoretical Computer ScienceUndecidable problemAlgebraComputational Theory and MathematicsClosure (mathematics)Regular languageComputer Science::Programming LanguagesComputer Science::Formal Languages and Automata TheoryInformation SystemsMathematicsFundamenta Informaticae
researchProduct

Codes and automata

2006

Prefix codeTheoretical computer scienceFinite-state machineRegular languageComputer scienceDeterministic automatonAutomaton
researchProduct

Hopcroft’s Algorithm and Cyclic Automata

2008

Minimization of deterministic finite automata is a widely studied problem of the Theory of Automata and Formal Languages. It consists in finding the unique (up to isomorphism) minimal deterministic automaton recognizing a set of words. The first approaches to this topic can be traced back to the 1950’s with the works of Huffman and Moore (cf. [12,15]). Over the years several methods to solve this problem have been proposed but the most efficient algorithm in the worst case was given by Hopcroft in [11]. Such an algorithm computes in O(n log n) time the minimal automaton equivalent to a given automaton with n states. Hopcroft’s algorithm has been widely studied, described and implemented by …

Discrete mathematicsNested wordSettore INF/01 - InformaticaComputer scienceTimed automatonSturmian wordsω-automatonNonlinear Sciences::Cellular Automata and Lattice GasesHopcroft's algorithmCombinatoricsDFA minimizationDeterministic automatonAutomata theoryQuantum finite automataNondeterministic finite automatonAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct
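
The partition-refinement scheme behind Hopcroft's algorithm, mentioned in the entry above, can be sketched as follows. This is a simplified illustration under stated assumptions: the DFA is complete, `delta` is a dictionary keyed by `(state, letter)`, and preimages are recomputed naively, so the sketch does not achieve the O(n log n) bound. The function name `hopcroft_partition` is hypothetical.

```python
def hopcroft_partition(states, alphabet, delta, finals):
    """Partition `states` into classes of equivalent states of a complete DFA."""
    finals = frozenset(finals)
    nonfinals = frozenset(states) - finals
    partition = {block for block in (finals, nonfinals) if block}
    waiting = set(partition)  # splitters still to be processed

    while waiting:
        splitter = waiting.pop()
        for letter in alphabet:
            # States that enter the splitter when reading `letter`.
            preimage = {q for q in states if delta[q, letter] in splitter}
            for block in list(partition):
                inside = block & preimage
                outside = block - preimage
                if not inside or not outside:
                    continue  # (splitter, letter) does not split this block
                partition.remove(block)
                partition.update((inside, outside))
                if block in waiting:
                    waiting.remove(block)
                    waiting.update((inside, outside))
                else:
                    # Hopcroft's key idea: enqueue only the smaller half.
                    waiting.add(min(inside, outside, key=len))
    return partition

# Hypothetical example: words over {a, b} ending in 'a'; states 0 and 2 are equivalent.
delta = {(q, "a"): 1 for q in range(3)}
delta.update({(q, "b"): 2 for q in range(3)})
print(hopcroft_partition(range(3), "ab", delta, {1}))
```

The order in which splitters are popped from `waiting` is one of the degrees of freedom of the algorithm; the resulting partition does not depend on it.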

On languages factorizing the free monoid

1996

A language X⊂A* is called factorizing if there exists a language Y⊂A* such that XY = A* and the product is unambiguous. (This work was partially supported by ESPRIT-EBRA project ASMICS contract 6317 and project 40% MURST “Algoritmi, Modelli di Calcolo e Strutture Informative”.) First we give a combinatorial characterization of factorizing languages. Further we prove that it is decidable whether a regular language X is factorizing and we construct an automaton recognizing the corresponding language Y. For finite languages we show that it suffices to consider words of bounded length. A complete characterization of factorizing languages with three words and explicit regular expression for the co…

CombinatoricsDiscrete mathematicsRegular languageGeneral MathematicsFree monoidBounded functionProduct (mathematics)Existential quantificationRegular expressionCharacterization (mathematics)DecidabilityMathematics
researchProduct

Patterns in words and languages

2004

A word p, over the alphabet of variables E, is a pattern of a word w over A if there exists a non-erasing morphism h from E∗ to A∗ such that h(p)=w. If we take E=A, given two words u,v∈A∗, we write u⩽v if u is a pattern of v. The restriction of ⩽ to aA∗, where A is the binary alphabet {a,b}, is a partial order relation. We introduce, given a word v, the set P(v) of all words u such that u⩽v. P(v), with the relation ⩽, is a poset and it is called the pattern poset of v. The first part of the paper is devoted to investigating the relationships between the structure of the poset P(v) and the combinatorial properties of the word v. In the last section, for a given language L, we consider …

PatternApplied MathematicsPartial order on wordStructure (category theory)Set (abstract data type)CombinatoricsFormal languagesSection (category theory)MorphismRegular languagePartial order on wordsDiscrete Mathematics and CombinatoricsOrder (group theory)Partially ordered setWord (group theory)MathematicsDiscrete Applied Mathematics
researchProduct

Forbidden words in symbolic dynamics

2000

We introduce an equivalence relation ≃ between functions from N to N. By describing a symbolic dynamical system in terms of forbidden words, we prove that the ≃-equivalence class of the function that counts the minimal forbidden words of a system is a topological invariant of the system. We show that the new invariant is independent from previous ones, but it is not characteristic. In the case of sofic systems, we prove that the ≃-equivalence of the corresponding functions is a decidable question. As a more special application, we show, by using the new invariant, that two systems associated to Sturmian words having “different slope” are not conjugate.

Discrete mathematicsApplied Mathematicsautomata and formal languages010102 general mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Symbolic dynamics[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciencesFunction (mathematics)16. Peace & justice01 natural sciencesDecidabilitysymbolic dynamics010201 computation theory & mathematicsEquivalence relationcombinatoric on words0101 mathematicsInvariant (mathematics)Dynamical system (definition)Equivalence (measure theory)Computer Science::Formal Languages and Automata TheoryWord (group theory)ComputingMilieux_MISCELLANEOUSMathematics
researchProduct

Some Remarks on Automata Minimality

2011

It is well known that the minimization problem of deterministic finite automata (DFAs) is related to the indistinguishability notion of states (cf. [HMU00]). Indeed, a well known technique to minimize a DFA essentially consists in finding pairs of states that are equivalent (or indistinguishable), namely pairs of states (p,q) such that it is impossible to assert the difference between p and q only by starting in each of the two states and asking whether or not a given input string leads to a final state. Since, in testing state equivalence, the notion of initial state is irrelevant, some of the main techniques for the minimization of automata, such as Moore’s algorithm [Moo56] and Ho…

Set (abstract data type)Discrete mathematicsDeterministic finite automatonSettore INF/01 - InformaticaRegular languageCayley graphString (computer science)state-pair graph uniformly minimal automataState (functional analysis)Equivalence (measure theory)Computer Science::Formal Languages and Automata TheoryAutomatonMathematics
researchProduct

Words and forbidden factors

2002

Given a finite or infinite word v, we consider the set M(v) of minimal forbidden factors of v. We show that the set M(v) is of fundamental importance in determining the structure of the word v. In the case of a finite word w we consider two parameters that are related to the size of M(w): the first counts the minimal forbidden factors of w and the second gives the length of the longest minimal forbidden factor of w. We derive sharp upper and lower bounds for both parameters. We also prove that the second parameter is related to the minimal period of the word w. We are further interested in the algorithmic point of view. Indeed, we design linear-time algorithms for the following two p…

CombinatoricsGeneral Computer ScienceGeneral problemFree monoidFormal languageSturmian wordWord problem (mathematics)AutomorphismTime complexityUpper and lower boundsMathematicsTheoretical Computer ScienceComputer Science(all)Theoretical Computer Science
researchProduct
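
The set M(w) and the two parameters discussed in the entry above can be computed by brute force for short words. The sketch below is not the linear-time method the abstract refers to; it assumes the usual definition (a word is a minimal forbidden factor of w if it does not occur in w while all of its proper factors do), and the function names are hypothetical.

```python
def factors(word):
    """All factors (contiguous substrings) of `word`, including the empty word."""
    n = len(word)
    return {word[i:j] for i in range(n + 1) for j in range(i, n + 1)}

def minimal_forbidden_words(word, alphabet):
    """Words that are not factors of `word` although all their proper factors are."""
    fact = factors(word)
    result = set()
    for prefix in fact:
        for letter in alphabet:
            candidate = prefix + letter
            # candidate[:-1] is a factor by construction, so only the longest
            # proper suffix remains to be checked.
            if candidate not in fact and candidate[1:] in fact:
                result.add(candidate)
    return result

m = minimal_forbidden_words("aab", "ab")
print(sorted(m))             # ['aaa', 'ba', 'bb']
print(len(m))                # first parameter: number of minimal forbidden factors
print(max(len(u) for u in m))  # second parameter: length of the longest one
```

Checking only the longest proper prefix and suffix is enough because the set of factors of a word is closed under taking factors.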

RECOGNIZABLE PICTURE LANGUAGES

1992

The purpose of this paper is to propose a new notion of recognizability for picture (two-dimensional) languages extending the characterization of one-dimensional recognizable languages in terms of local languages and alphabetic mappings. We first introduce the family of local picture languages (denoted by LOC) and, in particular, prove the undecidability of the emptiness problem. Then we define the new family of recognizable picture languages (denoted by REC). We study some combinatorial and language theoretic properties of REC such as ambiguity, closure properties or undecidability results. Finally we compare the family REC with the classical families of languages recognized by four-way a…

Finite-state machinebusiness.industrymedia_common.quotation_subjectClosure (topology)Abstract family of languagesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)AmbiguityOntology languageCone (formal languages)DecidabilityPhilosophy of languageTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESArtificial IntelligenceComputer Science::Programming LanguagesComputer Vision and Pattern RecognitionArtificial intelligencebusinessComputer Science::Formal Languages and Automata TheorySoftwareMathematicsmedia_commonInternational Journal of Pattern Recognition and Artificial Intelligence
researchProduct

An extension of the Burrows-Wheeler Transform

2007

We describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows one to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.

Discrete mathematicsMultisetSimilarity (geometry)General Computer ScienceBurrows–Wheeler transformGeneralizationAlignment-free distance measure; Burrows-Wheeler transform; Sequence comparisonContext (language use)Similarity measureBurrows-Wheeler transformSequence comparisonTheoretical Computer ScienceConjugacy classBijectionAlignment-free distance measureBurrows–Wheeler transformComputer Science::Formal Languages and Automata TheoryComputer Science(all)Mathematics
researchProduct
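
A brute-force sketch of the extended transform from the entry above may help fix ideas. The abstract does not spell out how conjugates of different lengths are ordered; the code assumes they are compared through their infinite periodic powers, truncated to a length at which any difference must already have appeared. It returns only the string of last letters and omits whatever extra bookkeeping is needed for inversion. The name `ebwt` follows the abstract; everything else is hypothetical.

```python
def ebwt(words):
    """Extended BWT of a multiset of non-empty words, by sorting all conjugates."""
    words = list(words)
    # Two conjugates of lengths at most L either have equal infinite powers
    # or differ within their first 2 * L letters.
    horizon = 2 * max(len(w) for w in words)

    def omega_key(conjugate):
        repeats = horizon // len(conjugate) + 1
        return (conjugate * repeats)[:horizon]

    conjugates = [w[i:] + w[:i] for w in words for i in range(len(w))]
    conjugates.sort(key=omega_key)
    return "".join(c[-1] for c in conjugates)

print(ebwt(["abaab"]))    # bbaaa, the ordinary transform of a single word
print(ebwt(["ab", "b"]))  # bab
```

On a single word the result coincides with the ordinary transform, since all conjugates then have the same length.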

On the lattice of prefix codes

2002

The natural correspondence between prefix codes and trees is explored, generalizing the results obtained in Giammarresi et al. (Theoret. Comput. Sci. 205 (1998) 1459) for the lattice of finite trees under division and the lattice of finite maximal prefix codes. Joins and meets of prefix codes are studied in this light in connection with such concepts as finiteness, maximality and varieties of rational languages. Decidability results are obtained for several problems involving rational prefix codes, including the solution to the primeness problem.

Block codeDiscrete mathematicsPrefix codeGeneral Computer ScienceRational languagesJoinsKraft's inequalityDecidabilityTheoretical Computer SciencePrefixCombinatoricsLattice (order)Computer Science::Formal Languages and Automata TheoryMathematicsComputer Science(all)Theoretical Computer Science
researchProduct

Suffix array and Lyndon factorization of a text

2014

The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be separately processed, allowing different implementation scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we prop…

Sorting suffixes; BWT; Suffix array; Lyndon word; Lyndon factorizationCompressed suffix arraySettore INF/01 - InformaticaSorting suffixesGeneralized suffix treeSuffix arrayOrder (ring theory)Construct (python library)Lyndon wordSorting suffixeTheoretical Computer Sciencelaw.inventionBWTLyndon factorizationComputational Theory and MathematicsFactorizationlawSuffix arrayFactor (programming language)Internal memoryDiscrete Mathematics and CombinatoricsArithmeticcomputerMathematicscomputer.programming_languageJournal of Discrete Algorithms
researchProduct
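
For readers unfamiliar with the Lyndon factorization used in the entry above, here is a short sketch of Duval's classical linear-time algorithm. It is shown only to illustrate the factorization itself and is not the suffix-array construction proposed in the paper; the function name is hypothetical.

```python
def lyndon_factorization(text):
    """Factor `text` into a non-increasing sequence of Lyndon words (Duval's algorithm)."""
    n = len(text)
    i = 0
    result = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it remains a prefix of a power of a Lyndon word.
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            result.append(text[i:i + j - k])
            i += j - k
    return result

print(lyndon_factorization("banana"))  # ['b', 'an', 'an', 'a']
```

Each factor is strictly smaller than all of its proper suffixes, and the factors appear in non-increasing lexicographic order.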

Some decision results for recognizable sets in arbitrary monoids

1978

Pure mathematicsFinite-state machineFree monoidMathematics
researchProduct

Automata with Extremal Minimality Conditions

2010

It is well known that the minimality of a deterministic finite automaton (DFA) depends on the set of final states. In this paper we study the minimality of a strongly connected DFA by varying the set of final states. We consider, in particular, some extremal cases. A strongly connected DFA is called uniformly minimal if it is minimal, for any choice of the set of final states. It is called never-minimal if it is not minimal, for any choice of the set of final states. We show that there exists an infinite family of uniformly minimal automata and that there exists an infinite family of never-minimal automata. Some properties of these automata are investigated and, in particular, we consider t…

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESPowerset constructionBüchi automatonω-automatonNonlinear Sciences::Cellular Automata and Lattice GasesCombinatoricsTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDFA minimizationDeterministic automatonQuantum finite automataTwo-way deterministic finite automatonNondeterministic finite automatonComputer Science::Formal Languages and Automata TheoryAutomata MinimizationMathematics
researchProduct
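
The notion of a uniformly minimal automaton from the entry above can be explored by exhaustive search on small examples. The sketch below assumes a complete, strongly connected DFA given by a dictionary `delta` keyed by `(state, letter)`, and it assumes that "any choice of the set of final states" ranges over non-empty proper subsets, since the empty set and the full state set cannot distinguish any states. All names are hypothetical.

```python
from itertools import combinations

def count_classes(states, alphabet, delta, finals):
    """Number of classes of the state equivalence, by Moore-style refinement."""
    label = {q: int(q in finals) for q in states}
    while True:
        # Signature of a state: its own class followed by the classes of its successors.
        signature = {q: (label[q],) + tuple(label[delta[q, a]] for a in alphabet)
                     for q in states}
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        if len(ids) == len(set(label.values())):
            return len(ids)  # no class was split: the partition is stable
        label = refined

def is_uniformly_minimal(states, alphabet, delta):
    """True if the DFA is minimal for every non-empty proper set of final states."""
    states = list(states)
    for size in range(1, len(states)):
        for finals in combinations(states, size):
            if count_classes(states, alphabet, delta, set(finals)) < len(states):
                return False
    return True

def cycle(n):
    # Hypothetical test family: a unary automaton whose states form a single cycle.
    return {(q, "a"): (q + 1) % n for q in range(n)}

print(is_uniformly_minimal(range(3), "a", cycle(3)))  # True
print(is_uniformly_minimal(range(4), "a", cycle(4)))  # False: F = {0, 2} merges states
```

The search is exponential in the number of states, so it is suitable only for experimenting with very small automata.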

Binary Patterns in Infinite Binary Words

2002

In this paper we study the set P(w) of binary patterns that can occur in one infinite binary word w, comparing it with the set F(w) of factors of the word. Since the set P(w) can be considered as an extension of the set F(w), we first investigate how large such an extension is, by introducing the parameter ?(w) that corresponds to the cardinality of the difference set P(w) \ F(w). Some nontrivial results about this parameter are obtained in the case of the Thue-Morse and the Fibonacci words. Since, in most cases, the parameter ?(w) is infinite, we introduce the pattern complexity of w, which corresponds to the complexity of the language P(w). As a main result, we prove that there exist infini…

Set (abstract data type)Discrete mathematicsFibonacci numberDifference setCardinalityBinary numberBinary systemExtension (predicate logic)ArithmeticWord (group theory)Mathematics
researchProduct

On fixed points of the Burrows-Wheeler transform

2017

The Burrows-Wheeler Transform is a well known transformation widely used in Data Compression: important competitive compression software, such as Bzip (cf. [1]) and Szip (cf. [2]) and some indexing software, like the FM-index (cf. [3]), are deeply based on the Burrows-Wheeler Transform. The main advantage of using BWT for data compression consists in its feature of "clustering" together equal characters. In this paper we show the existence of fixed points of BWT, i.e., words on which BWT has no effect. We show a characterization of the permutations associated to BWT of fixed points and we give the explicit form of fixed points on a binary ordered alphabet {a, b} having at most four b's and th…

Discrete mathematicsAlgebra and Number TheoryBurrows–Wheeler transformSettore INF/01 - InformaticaPermutationPermutations0102 computer and information sciences02 engineering and technologyInformation SystemFixed point01 natural sciencesTheoretical Computer ScienceComputational Theory and Mathematics010201 computation theory & mathematicsFixed PointFixed Points0202 electrical engineering electronic engineering information engineeringBurrows-Wheeler Transform; Fixed Points; Permutations; Theoretical Computer Science; Algebra and Number Theory; Information Systems; Computational Theory and Mathematics020201 artificial intelligence & image processingBurrows-Wheeler TransformInformation SystemsMathematics
researchProduct

Burrows-Wheeler transform and Run-Length Enconding

2017

In this paper we study the clustering effect of the Burrows-Wheeler Transform (BWT) from a combinatorial viewpoint. In particular, given a word w we define the BWT-clustering ratio of w as the ratio between the number of clusters produced by BWT and the number of the clusters of w. The number of clusters of a word is measured by its Run-Length Encoding. We show that the BWT-clustering ratio ranges in ]0, 2]. Moreover, given a rational number \(r\,\in \,]0,2]\), it is possible to find infinitely many words having BWT-clustering ratio equal to r. Finally, we show how the words can be classified according to their BWT-clustering ratio. The behavior of such a parameter is studied for very well-…

Discrete mathematicsRational numberBurrows–Wheeler transformComputer scienceComputer Science (all)0102 computer and information sciences02 engineering and technologyBurrows-Wheeler transform01 natural sciencesBurrows-Wheeler transform; Clustering effect; Run-length encoding; Theoretical Computer Science; Computer Science (all)Theoretical Computer ScienceClustering effect010201 computation theory & mathematicsRun-length encoding0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingCluster analysisWord (computer architecture)Run-length encoding
researchProduct
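
The BWT-clustering ratio defined in the entry above can be computed directly for short words. The sketch assumes that the number of clusters is the number of maximal runs of equal letters read linearly, and that the transform is the marker-free version on sorted conjugates; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import groupby

def bwt(word):
    """Last column of the lexicographically sorted rotations of `word`."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)

def run_count(word):
    """Number of maximal runs of equal letters, i.e. the length of the run-length encoding."""
    return sum(1 for _ in groupby(word))

def bwt_clustering_ratio(word):
    """Runs of bwt(word) divided by runs of word, as an exact rational number."""
    return Fraction(run_count(bwt(word)), run_count(word))

print(bwt_clustering_ratio("abaab"))  # 1/2: four runs shrink to two
print(bwt_clustering_ratio("aabb"))   # 2: two runs grow to four
```

Using `Fraction` keeps the ratio exact, which is convenient when searching for words with a prescribed rational ratio.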

Varieties of Codes and Kraft Inequality

2005

Decipherability conditions for codes are investigated by using the approach of Guzman, who introduced in [7] the notion of variety of codes and established a connection between classes of codes and varieties of monoids. The class of Uniquely Decipherable (UD) codes is a special case of variety of codes, corresponding to the variety of all monoids. It is well known that the Kraft inequality is a necessary condition for UD codes, but it is not sufficient, in the sense that there exist codes that are not UD and that satisfy the Kraft inequality. The main result of the present paper states that, given a variety $\mathcal{V}$ of codes, if all the elements of $\mathcal{V}$ satisfy the Kraft inequ…

Discrete mathematicsClass (set theory)Unique factorization domainCode wordAstrophysics::Cosmology and Extragalactic AstrophysicsKraft's inequalityCombinatoricsFormal languageHigh Energy Physics::ExperimentSpecial caseVariety (universal algebra)Connection (algebraic framework)Mathematics::Representation TheoryMathematics
researchProduct
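
The remark in the entry above, that the Kraft inequality is necessary but not sufficient for unique decipherability, can be illustrated with a small example. The sketch below pairs an exact Kraft sum with the classical Sardinas-Patterson test for finite codes; it is a generic illustration and not the variety-based approach of the paper. Function names and the example codes are assumptions.

```python
from fractions import Fraction

def kraft_sum(code, alphabet_size):
    """Exact value of the sum of alphabet_size ** (-len(w)) over the codewords w."""
    return sum(Fraction(1, alphabet_size ** len(w)) for w in code)

def is_uniquely_decipherable(code):
    """Sardinas-Patterson test for a finite set of non-empty codewords."""
    code = set(code)

    def residuals(left, right):
        # Non-empty words t such that u + t == v for some u in left, v in right.
        return {v[len(u):] for u in left for v in right
                if len(v) > len(u) and v.startswith(u)}

    current = residuals(code, code)  # dangling suffixes between two codewords
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is a codeword: some message is ambiguous
        if frozenset(current) in seen:
            return True   # the sequence of suffix sets cycles without ever failing
        seen.add(frozenset(current))
        current = residuals(code, current) | residuals(current, code)
    return True

ambiguous = {"0", "01", "10"}   # "010" reads as 0.10 and as 01.0
prefix_code = {"0", "10", "11"}
print(kraft_sum(ambiguous, 2), is_uniquely_decipherable(ambiguous))      # 1 False
print(kraft_sum(prefix_code, 2), is_uniquely_decipherable(prefix_code))  # 1 True
```

Both example codes have Kraft sum exactly 1, yet only the prefix code is uniquely decipherable.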

On Extremal Cases of Hopcroft’s Algorithm

2009

In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different sequences of refinements of the set of the states that lead to the final partition. We find an infinite family of binary automata for which such a process is unique. Some recent papers (cf. [3,7,1]) have been devoted to finding families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated to circular words. However, automata minimization can also be achieved in linear time when the alphabet has only one letter (cf. [14]), so in …

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESSettore INF/01 - InformaticaUnary operationBinary numberHopcroft's algorithmNonlinear Sciences::Cellular Automata and Lattice GasesAutomatonCombinatoricsSet (abstract data type)TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDeterministic finite automatonDFA minimizationMinificationAlgorithmTime complexityComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

On Fine and Wilf's theorem for bidimensional words

2003

Generalizations of Fine and Wilf's Periodicity Theorem are obtained for the case of bidimensional words using geometric arguments. The domains considered constitute a large class of convex subsets of $\mathbb{R}^2$ which include most parallelograms. A complete discussion is provided for the parallelogram case.

CombinatoricsLarge classDiscrete mathematicsGeneral Computer ScienceGeneralizationRegular polygonParallelogramWord (group theory)MathematicsTheoretical Computer ScienceComputer Science(all)Theoretical Computer Science
researchProduct

CODING PARTITIONS OF REGULAR SETS

2009

A coding partition of a set of words partitions this set into classes such that whenever a sequence, of minimal length, has two distinct factorizations, the words of these factorizations belong to the same class. The canonical coding partition is the finest coding partition that partitions the set of words in at most one unambiguous class and other classes that localize the ambiguities in the factorizations of finite sequences. We prove that the canonical coding partition of a regular set contains a finite number of regular classes and we give an algorithm for computing this partition. From this we derive a canonical decomposition of a regular monoid into a free product of finitely many re…

MonoidGeneral Mathematicsregular monoid0102 computer and information sciences02 engineering and technologyregular language01 natural sciences[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]CombinatoricsRegular language0202 electrical engineering electronic engineering information engineeringPartition (number theory)Finite setComputingMilieux_MISCELLANEOUSMathematicsDiscrete mathematics020206 networking & telecommunicationsPartition of a set16. Peace & justiceFree product010201 computation theory & mathematicscodeuniquely decipherable codecoding partitionRegular setsCoding (social sciences)International Journal of Algebra and Computation
researchProduct

On Sets of Words of Rank Two

2019

Given a (finite or infinite) subset X of the free monoid A∗ over a finite alphabet A, the rank of X is the minimal cardinality of a set F such that X ⊆ F∗. A submonoid M generated by k elements of A∗ is k-maximal if there does not exist another submonoid generated by at most k words containing M. We call a set X ⊆ A∗ primitive if it is the basis of a |X|-maximal submonoid. This extends the notion of primitive word: indeed, {w} is a primitive set if and only if w is a primitive word. By definition, for any set X, there exists a primitive set Y such that X ⊆ Y∗. The set Y is therefore called a primitive root of X. As a main result, we prove that if a set has rank 2, then it has a unique primitive …

Hidden repetitionPrimitive setExistential quantificationBinary rootk-maximal monoidPseudo-repetitionBasis (universal algebra)CombinatoricsSet (abstract data type)RepetitionCardinalityFree monoidRank (graph theory)Primitive root modulo nComputer Science::Formal Languages and Automata TheoryWord (group theory)Mathematics
researchProduct

A note on Sturmian words

2012

We describe an algorithm which, given a factor of a Sturmian word, computes the next factor of the same length in the lexicographic order in linear time. It is based on a combinatorial property of Sturmian words which is related to the Burrows-Wheeler transformation.

Discrete mathematicsProperty (philosophy)General Computer ScienceSettore INF/01 - Informatica010102 general mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Sturmian word0102 computer and information sciencesSturmian wordsLexicographical order01 natural sciencesTheoretical Computer ScienceCombinatoricsTransformation (function)010201 computation theory & mathematicsFactor (programming language)combinatorics0101 mathematicscomputerTime complexitycomputer.programming_languageMathematics
researchProduct

On the product of balanced sequences

2011

The product w = u ⊗ v of two sequences u and v is a naturally defined sequence on the alphabet of pairs of symbols. Here, we study when the product w of two balanced sequences u, v is balanced too. In the case u and v are binary sequences, we prove, as a main result, that, if such a product w is balanced and deg(w) = 4, then w is an ultimately periodic sequence of a very special form. The case of arbitrary alphabets is approached in the last section. The partial results obtained and the problems proposed show the interest of the notion of product in the study of balanced sequences.

SequenceGeneral MathematicsSturmian wordPeriodic sequenceBinary numberbalanceSturmian wordsInfinite sequences; Sturmian words; balanceComputer Science ApplicationsCombinatoricsInfinite sequencesSection (category theory)Product (mathematics)Infinite sequenceproductAlphabetSoftwareMathematics
researchProduct

Word assembly through minimal forbidden words

2006

Abstract We give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorial techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of a stronger hypothesis.

General Computer ScienceFragment assemblyFactor automaton[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technology01 natural sciencesMinimal forbidden wordTheoretical Computer ScienceCombinatorics0202 electrical engineering electronic engineering information engineeringFinite setComputingMilieux_MISCELLANEOUSCombinatorics on wordMathematicsShortest superstringCombinatorics on wordsRepetition index16. Peace & justice010201 computation theory & mathematics020201 artificial intelligence & image processingAlphabetConstant (mathematics)Word (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science(all)
researchProduct

SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET

2014

In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications, especially in the field of Bioinformatic…

Lyndon words; Burrows-Wheeler transform; Extended Burrows-Wheeler transform; Circular words; Conjugates; Suffixes; SortingSuffixesMultisetTheoretical computer sciencePoint (typography)Burrows–Wheeler transformSettore INF/01 - InformaticaSortingcircular wordExtension (predicate logic)Lyndon wordsBurrows-Wheeler transformLyndon wordField (computer science)ConjugatesconjugateComputer Science (miscellaneous)sortOrder (group theory)suffixeArithmeticextended Burrows-Wheeler transformCircular wordssortingMathematics
researchProduct

Minimal forbidden words and symbolic dynamics

1996

We introduce a new complexity measure of a factorial formal language L: the growth rate of the set of minimal forbidden words. We prove some combinatorial properties of minimal forbidden words. As a main result we prove that the growth rate of the set of minimal forbidden words for L is a topological invariant of the dynamical system defined by L.

Discrete mathematicsFactorial010102 general mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Symbolic dynamicsComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciencesInvariant (physics)16. Peace & justice01 natural sciencesCombinatorics010201 computation theory & mathematicsTheoryofComputation_LOGICSANDMEANINGSOFPROGRAMSInformation complexityFormal language0101 mathematicsComputer Science::Formal Languages and Automata TheoryComputingMilieux_MISCELLANEOUSMathematicsofComputing_DISCRETEMATHEMATICSMathematics
researchProduct

On aperiodic trace languages

2005

Trace (semiology)Computer scienceAperiodic graphPrincipal idealMineralogy
researchProduct

On the decomposition of prefix codes

2017

Abstract In this paper we focus on the decomposition of rational and maximal prefix codes. We present an effective procedure that allows us to decide whether such a code is decomposable. In this case, the procedure also produces the factors of some of its decompositions. We also give partial results on the problem of deciding whether a rational maximal prefix code decomposes over a finite prefix code.

Block codePrefix codeGeneral Computer ScienceComputer science0102 computer and information sciences02 engineering and technologyPrefix grammarKraft's inequality01 natural sciencesPrefix codeTheoretical Computer SciencePrefix codes; Finite automata; Composition of codesComposition of codes0202 electrical engineering electronic engineering information engineeringDiscrete mathematicsSelf-synchronizing codeFinite-state machineSettore INF/01 - InformaticaComputer Science (all)Rational languageLinear codePrefixComposition of code010201 computation theory & mathematicsPrefix codes020201 artificial intelligence & image processingFinite automataComputer Science::Formal Languages and Automata Theory
researchProduct

Nondeterministic Moore Automata and Brzozowski’s Algorithm

2011

Moore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power as the classical deterministic Moore automaton. We also consider the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. We propose an algorithm that is a variant of Brzozowski's algorithm in the sense that it is essentially structured as a reverse operation and a subset construction, performed twice.

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESSettore INF/01 - InformaticaPowerset constructionBüchi automatonNonlinear Sciences::Cellular Automata and Lattice GasesNondeterministic algorithmTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDeterministic finite automatonDFA minimizationDeterministic automatonTwo-way deterministic finite automatonMoore automata minimization Brzozowski'algorithmNondeterministic finite automatonAlgorithmComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

Some Decision Results on Nonrepetitive Words

1985

The paper addresses some generalizations of the Thue Problem such as: given a word u, does there exist an infinite nonrepetitive overlap free (or square free) word having u as a prefix? A solution to this as well as to related problems is given for the case of overlap free words on a binary alphabet.

PrefixCombinatoricsTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESComputer Science::Discrete MathematicsUnique factorization domainComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Square-free integerComputer Science::Formal Languages and Automata TheoryBinary alphabetWord (computer architecture)Mathematics
researchProduct

Anti-powers in infinite words

2018

In combinatorics of words, a concatenation of $k$ consecutive equal blocks is called a power of order $k$. In this paper we take a different point of view and define an anti-power of order $k$ as a concatenation of $k$ consecutive pairwise distinct blocks of the same length. As a main result, we show that every infinite word contains powers of any order or anti-powers of any order. That is, the existence of powers or anti-powers is an unavoidable regularity. Indeed, we prove a stronger result, which relates the density of anti-powers to the existence of a factor that occurs with arbitrary exponent. As a consequence, we show that in every aperiodic uniformly recurrent word, anti-powers of ev…
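The two definitions in this abstract can be checked by brute force on a finite word. The following is only a minimal Python sketch; the helper names `blocks`, `is_power` and `is_anti_power` are hypothetical and do not come from the paper.

```python
def blocks(w, k):
    """Split w into k consecutive blocks of equal length, or return None if impossible."""
    if k <= 0 or len(w) == 0 or len(w) % k != 0:
        return None
    m = len(w) // k
    return [w[i * m:(i + 1) * m] for i in range(k)]


def is_power(w, k):
    """True if w is a concatenation of k equal blocks (a power of order k)."""
    b = blocks(w, k)
    return b is not None and len(set(b)) == 1


def is_anti_power(w, k):
    """True if w is a concatenation of k pairwise distinct blocks of the same length."""
    b = blocks(w, k)
    return b is not None and len(set(b)) == k
```

For instance, `abab` is a power of order 2 (blocks `ab`, `ab`), while `abba` is an anti-power of order 2 (blocks `ab`, `ba`).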

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)ConcatenationComputer Science - Formal Languages and Automata Theory68R150102 computer and information sciences01 natural sciencesTheoretical Computer ScienceCombinatoricsUnavoidable regularityPosition (vector)Infinite wordAvoidability[MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsMathematics - CombinatoricsDiscrete Mathematics and CombinatoricsOrder (group theory)Point (geometry)0101 mathematicsDiscrete Mathematics and CombinatoricMathematicsDiscrete mathematics000 Computer science knowledge general worksAnti-power010101 applied mathematicsComputational Theory and Mathematics010201 computation theory & mathematicsAperiodic graphComputer ScienceExponentPairwise comparisonCombinatorics (math.CO)SoftwareWord (group theory)Computer Science - Discrete Mathematics
researchProduct

A combinatorial view on string attractors

2021

Abstract The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w = w_1 w_2 \cdots w_n$ is a subset Γ of the positions $\{1, \ldots, n\}$, such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. In this paper we explore the notion of string attractor by focusing on its combinatorial properties. In particular, we show how the size of the smallest string attractor of a word varies when combinatorial operations are applied and we deduce that such a measure is not monotone. Moreover, we introduce a c…
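The definition of string attractor given in this abstract can be verified directly on small words. The sketch below is a brute-force check only (cubic or worse in the word length), with a hypothetical function name `is_string_attractor`; positions are 1-based, as in the abstract.

```python
def is_string_attractor(w, gamma):
    """Check that every distinct non-empty factor of w has an occurrence
    crossing at least one position of gamma (positions are 1-based)."""
    n = len(w)
    gamma = set(gamma)
    factors = {w[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    for f in factors:
        m = len(f)
        covered = False
        for i in range(n - m + 1):
            # The occurrence starting at 0-based index i spans positions i+1 .. i+m.
            if w[i:i + m] == f and any(i + 1 <= p <= i + m for p in gamma):
                covered = True
                break
        if not covered:
            return False
    return True
```

For the word `abab`, the set {2, 3} passes this check, while {2} does not, because the factor `a` only occurs at positions 1 and 3.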

General Computer ScienceSettore INF/01 - InformaticaString (computer science)de Bruijn word0102 computer and information sciences02 engineering and technologyCharacterization (mathematics)Burrows-Wheeler transform01 natural sciencesMeasure (mathematics)Standard Sturmian wordTheoretical Computer ScienceCombinatoricsConjugacy classMonotone polygonString attractor010201 computation theory & mathematicsAttractorThue-Morse word0202 electrical engineering electronic engineering information engineeringLempel-Ziv encoding020201 artificial intelligence & image processingWord (group theory)Mathematics
researchProduct

Fine and Wilf's Theorem for Three periods and a Generalization of Sturmian Words

1999

Abstract We extend the theorem of Fine and Wilf to words having three periods. We then define the set 3-PER of words of maximal length for which such a result does not apply. We prove that the set 3-PER and the sequences of complexity 2n + 1, introduced by Arnoux and Rauzy to generalize Sturmian words, have the same set of factors.

Discrete mathematicsPeriodicityEuclid's algorithmCombinatorics on wordsGeneral Computer ScienceGeneralizationSturmian wordSturmian wordsTheoretical Computer ScienceCombinatoricsSet (abstract data type)Combinatorics on wordsWord lengthComputer Science(all)Mathematics
researchProduct

Some applications of a theorem of Shirshov to language theory

1983

Some applications of a theorem of Shirshov to language theory are given: characterization of regular languages, characterization of bounded languages, and a sufficient condition for a language to be Parikh-bounded.

business.industryGeneral EngineeringComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Characterization (mathematics)computer.software_genrePhilosophy of languageAlgebraTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESRegular languageBounded functionComputer Science::Programming LanguagesArtificial intelligencebusinesscomputerNatural language processingEngineering(all)MathematicsInformation and Control
researchProduct

A finiteness condition for finitely generated semigroups

1984

Pure mathematicsAlgebra and Number TheoryFinitely-generated abelian groupAlgebra over a fieldMathematics
researchProduct

Forbidden Factors and Fragment Assembly

2002

In this paper we approach the fragment assembly problem by using the notion of minimal forbidden factors introduced in a previous paper. Denoting by M(w) the set of minimal forbidden factors of a word w, we first focus on the evaluation of the size of elements in M(w) and on designing an algorithm to recover the word w from M(w). Actually we prove that for a word w randomly generated by a memoryless source with identical symbol probabilities, the maximal length m(w) of words in M(w) is logarithmic and that the reconstruction algorithm runs in linear time. These results have an interesting application to the fragment assembly problem, i.e. reconstruct a word w from a given set I of substrin…

Set (abstract data type)CombinatoricsLogarithmFragment (logic)Reconstruction algorithmFocus (optics)AlgorithmTime complexitySubstringWord (computer architecture)Mathematics
researchProduct

Balance Properties and Distribution of Squares in Circular Words

2008

We study balance properties of circular words over alphabets of size greater than two. We give some new characterizations of balanced words connected to the Kawasaki-Ising model and to the notion of derivative of a word. Moreover we consider two different generalizations of the notion of balance, and we find some relations between them. Some of our results can be generalised to non periodic infinite words as well.

CombinatoricsBalance (metaphysics)Distribution (number theory)Settore INF/01 - InformaticaComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Combinatoria delle Parole Parole Sturmiane parole circolari Parole BilanciateComputer Science::Formal Languages and Automata TheoryBinary alphabetWord (group theory)Mathematics
researchProduct

A bijection between words and multisets of necklaces

2012

Two of the present authors have given in 1993 a bijection Phi between words on a totally ordered alphabet and multisets of primitive necklaces. At the same time and independently, Burrows and Wheeler gave a data compression algorithm which turns out to be a particular case of the inverse of Phi. In the present article, we show that if one replaces in Phi the standard permutation of a word by the co-standard one (reading the word from right to left), then the inverse bijection is computed using the alternate lexicographic order (which is the order of real numbers given by continued fractions) on necklaces, instead of the lexicographic order as for Phi(-1). The image of the new bijection, ins…

Discrete mathematicsBurrows and Wheeler TransformMathematics::CombinatoricsSettore INF/01 - InformaticaFree Lie algebraLie superalgebrastandard permutationLexicographical orderTheoretical Computer ScienceImage (mathematics)CombinatoricsSet (abstract data type)PermutationComputational Theory and MathematicsBijectionDiscrete Mathematics and CombinatoricsGeometry and TopologyComputer Science::Formal Languages and Automata TheoryWord (group theory)MathematicsReal number
researchProduct

Extending formal language hierarchies to higher dimensions

1999

General Computer ScienceProgramming languageComputer scienceObject languagecomputer.software_genreFormal systemTheoretical Computer ScienceFormal grammarDeterministic finite automatonRegular languageFormal languageAutomata theoryNondeterministic finite automatoncomputerACM Computing Surveys
researchProduct

Ambiguity and complementation in recognizable two-dimensional languages

2008

The theory of one-dimensional (word) languages has been well founded and investigated since the fifties. For several years, the increasing interest in pattern recognition and image processing has motivated the research on two-dimensional or picture languages, and nowadays this is a research field of great interest. A first attempt to formalize the concept of finite state recognizability for two-dimensional languages can be attributed to Blum and Hewitt ([7]) who started in 1967 the study of finite state devices that can define two-dimensional languages, with the aim of finding a counterpart of what regular languages are in one dimension. Since then, many approaches have been presented in the literature…

Finite-state machineTessellationCOMPLEXITYSettore INF/01 - Informaticamedia_common.quotation_subjectPicture LanguageAmbiguityPattern RecognitionPicture languageAlgebraRule-based machine translationRegular languageFormal LanguagePICTURE-LANGUAGES; NONDETERMINISM; COMPLEXITY; AUTOMATAFormal languageRegular expressionAUTOMATAArithmeticPICTURE-LANGUAGESmedia_commonMathematicsNONDETERMINISM
researchProduct

ON THE STAR HEIGHT OF RATIONAL LANGUAGES

1994

Two problems concerning the star height of a rational language are investigated: the star height one problem and the relationships between the unambiguity of an expression and its star height. For this purpose we consider the class of factorial, transitive and rational (FTR) languages. From the algebraic point of view a FTR language is the set of factors of a rational submonoid M. Two subclasses of FTR languages are introduced: renewal languages, corresponding to the case of M finitely generated, and unambiguous renewal languages, corresponding to the case of M finitely generated and free. We prove that a FTR language has star height one if and only if it is renewal. This gives a simple de…

Discrete mathematicsFactorialTransitive relationStar heightGeneral Mathematicsmedia_common.quotation_subjectComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)AmbiguityRegular languageIf and only ifComputer Science::Programming LanguagesEntropy (information theory)Algebraic numberMathematicsmedia_commonInternational Journal of Algebra and Computation
researchProduct

Measuring the clustering effect of BWT via RLE

2017

Abstract The Burrows–Wheeler Transform (BWT) is a reversible transformation on which several text compressors and many other tools used in Bioinformatics and Computational Biology are based. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often creates runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest clusters (cf. [5], [16], [21], [23]). In this paper we are concerned about t…
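The clustering effect described here can be observed on a small example by comparing the number of runs of equal letters before and after the transform. The sketch below assumes the variant of the BWT that sorts all rotations (conjugates) of the word without an end-of-string marker, and uses a naive quadratic construction; the function names `bwt` and `count_runs` are illustrative only.

```python
def bwt(w):
    """Last column of the lexicographically sorted rotations of w
    (no end marker; naive construction for illustration)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)


def count_runs(s):
    """Number of maximal runs of equal letters, i.e. the length of the
    run-length encoding of s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])
```

On the word `banana`, which has 6 runs, this version of the transform yields `nnbaaa`, which has 3 runs.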

0301 basic medicineGeneral Computer SciencePermutationComputer Science (all)Binary number0102 computer and information sciencesQuantitative Biology::Genomics01 natural sciencesUpper and lower boundsTheoretical Computer ScienceCombinatorics03 medical and health sciencesPermutation030104 developmental biologyTransformation (function)BWT010201 computation theory & mathematicsRun-length encodingComputer Science::Data Structures and AlgorithmsCluster analysisPrimitive root modulo nBWT; Permutation; Run-length encoding; Theoretical Computer Science; Computer Science (all)Word (computer architecture)Run-length encodingMathematics
researchProduct

String Attractors and Infinite Words

2022

The notion of string attractor has been introduced by Kempa and Prezza (STOC 2018) in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be “attracted”. The smallest size γ∗ of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure γ∗ have been studied in [Mantaci et al., TCS 2021]. Very recently, a complexity measure, called string attractor profile function, has been introduced for infinite words, by evaluating γ∗ on each prefix. Such a measure has…

String attractorSettore INF/01 - InformaticaFactor complexityMorphismSturmian wordRecurrent wordRepetitiveness measure
researchProduct

Circular sturmian words and Hopcroft’s algorithm

2009

Abstract In order to analyze some extremal cases of Hopcroft’s algorithm, we investigate the relationships between the combinatorial properties of a circular sturmian word (x) and the run of the algorithm on the cyclic automaton A_x associated to (x). The combinatorial properties of words taken into account make use of sturmian morphisms and give rise to the notion of reduction tree of a circular sturmian word. We prove that the shape of this tree uniquely characterizes the word itself. The properties of the run of Hopcroft’s algorithm are expressed in terms of the derivation tree of the automaton, which is a tree that represents the refinement process that, in the execution of Hopcroft’s alg…

Discrete mathematicsReduction (recursion theory)Fibonacci numberGeneral Computer ScienceHopcroft'algorithmSturmian wordSturmian wordSturmian morphismsTheoretical Computer ScienceCombinatoricsTree (descriptive set theory)TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESComputer Science::Discrete MathematicsDeterministic automatonHopcroft’s minimization algorithmCircular sturmian wordsTree automatonDeterministic finite state automataTime complexityAlgorithmComputer Science::Formal Languages and Automata TheoryWord (group theory)Computer Science(all)MathematicsTheoretical Computer Science
researchProduct

Monadic Second-Order Logic over Rectangular Pictures and Recognizability by Tiling Systems

1996

Abstract It is shown that a set of pictures (rectangular arrays of symbols) is recognized by a finite tiling system iff it is definable in existential monadic second-order logic. As a consequence, finite tiling systems constitute a notion of recognizability over two-dimensional inputs which at the same time generalizes finite-state recognizability over strings and also matches a natural logic. The proof is based on the Ehrenfeucht–Fraïssé technique for first-order logic and an implementation of “threshold counting” within tiling systems.

Predicate logicMonadic second-order logicDiscrete mathematicsNatural logicIntermediate logicHigher-order logicMonadic predicate calculusComputer Science ApplicationsTheoretical Computer ScienceMathematics::LogicTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESComputational Theory and MathematicsComputer Science::Logic in Computer ScienceMany-valued logicDynamic logic (modal logic)Computer Science::Formal Languages and Automata TheoryInformation SystemsMathematicsInformation and Computation
researchProduct

Ordering and Convex Polyominoes

2005

We introduce a partial order on pictures (matrices), denoted by ≼ that extends to two dimensions the subword ordering on words. We investigate properties of special families of discrete sets (corresponding to {0,1}-matrices) with respect to this partial order. In particular we consider the families of polyominoes and convex polyominoes and the family, recently introduced by the authors, of L-convex polyominoes. In the first part of the paper we study the closure properties of such families with respect to the order. In particular we obtain a new characterization of L-convex polyominoes: a discrete set P is a L-convex polyomino if and only if all the elements Q≼P are polyominoes. In the seco…

Discrete mathematicsMathematics::CombinatoricsPolyominoBinary relationRegular polygonConvex setDiscrete geometryMonotonic functionPartial OrderComputer Science::Computational GeometryMonotone FunctionCombinatoricsClosure PropertyBinary RelationFormal Language TheoryClosure (mathematics)Computer Science::Discrete MathematicsPartially ordered setComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

Minimal forbidden factors of circular words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…
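As an illustration of the notion used here, the minimal forbidden factors of a short word can be enumerated by brute force, using the standard definition: a word v is a minimal forbidden factor of w if v is not a factor of w but every proper factor of v is. The sketch below is not one of the algorithms cited in the abstract; the function name and the explicit `alphabet` parameter are assumptions made for the example.

```python
def minimal_forbidden_factors(w, alphabet):
    """Brute-force set of minimal forbidden factors of w over the given alphabet."""
    n = len(w)
    # All factors of w, including the empty word.
    factors = {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    result = set()
    for u in factors:
        for c in alphabet:
            v = u + c
            # v is forbidden, its longest proper prefix u is a factor by construction,
            # and its longest proper suffix v[1:] must be a factor as well.
            if v not in factors and v[1:] in factors:
                result.add(v)
    return result
```

For example, over the alphabet {a, b}, the word `aab` has the minimal forbidden factors `aaa`, `ba` and `bb`.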

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniGeneral Computer ScienceDiscrete Mathematics (cs.DM)Finite automatonSettore INF/01 - InformaticaFormal Languages and Automata Theory (cs.FL)Factor automatonComputer Science - Formal Languages and Automata TheoryComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Circular wordFibonacci wordMinimal forbidden factorTheoretical Computer ScienceComputer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Characteristic Sturmian words are extremal for the Critical Factorization Theorem

2012

We prove that characteristic Sturmian words are extremal for the Critical Factorization Theorem (CFT) in the following sense. If p_x(n) denotes the local period of an infinite word x at point n, we prove that x is a characteristic Sturmian word if and only if p_x(n) is smaller than or equal to n + 1 for all n ≥ 1 and it is equal to n + 1 for infinitely many integers n. This result is extremal with respect to the CFT since a consequence of the CFT is that, for any infinite recurrent word x, either the function p_x is bounded, and in such a case x is periodic, or p_x(n) ≥ n + 1 for infinitely many integers n. As a byproduct of the techniques used in the paper we extend a r…

Critical Factorization TheoremDiscrete mathematicsPeriodicitySettore INF/01 - InformaticaCombinatorics on wordsGeneral Computer ScienceSturmian wordSturmian wordsFunction (mathematics)Critical point (mathematics)Theoretical Computer ScienceCombinatoricsCombinatorics on wordssymbols.namesakeBounded functionWeierstrass factorization theoremsymbolsFibonacci wordWord (group theory)MathematicsComputer Science(all)Theoretical Computer Science
researchProduct

A synchronization property of pure subsemigroups of a free semigroup

1981

AlgebraCancellative semigroupAlgebra and Number TheoryProperty (philosophy)SemigroupSynchronization (computer science)Algebra over a fieldMathematicsSemigroup Forum
researchProduct

On generalized Lyndon words

2018

Abstract A generalized lexicographical order on infinite words is defined by choosing for each position a total order on the alphabet. This allows one to define generalized Lyndon words. Every word in the free monoid can be factorized in a unique way as a nonincreasing factorization of generalized Lyndon words. We give new characterizations of the first and the last factor in this factorization as well as a new characterization of generalized Lyndon words. We also give more specific results on two special cases: the classical one and the one arising from the alternating lexicographical order.
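For the classical special case only (the usual lexicographical order, the same at every position), the unique nonincreasing factorization into Lyndon words can be computed with Duval's well-known linear-time algorithm. The sketch below covers that classical case and is not the generalized construction of the paper; the function name is chosen for illustration.

```python
def lyndon_factorization(s):
    """Duval's algorithm: factorize s into a nonincreasing sequence of
    classical Lyndon words (standard lexicographical order)."""
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repetitions of a Lyndon word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i
            else:
                k += 1
            j += 1
        # Emit each complete repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

On the word `banana` this returns the factors `b`, `an`, `an`, `a`, which are nonincreasing in lexicographical order.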

FOS: Computer and information sciencesGeneral Computer ScienceDiscrete Mathematics (cs.DM)Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)68R15Characterization (mathematics)Lexicographical orderTheoretical Computer ScienceLyndon wordsCombinatoricsFactorizationPosition (vector)Free monoidFOS: MathematicsOrder (group theory)Mathematics - CombinatoricsCombinatorics (math.CO)Word (group theory)Computer Science::Formal Languages and Automata TheoryMathematicsComputer Science - Discrete Mathematics
researchProduct

Star-free trace languages

1992

Abstract Generalizing a classical result of Schützenberger to free partially commutative monoids, we prove that the family of star-free trace languages coincides with the family of aperiodic trace languages.

MonoidPure mathematicsGeneral Computer ScienceAbstract family of languagesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Star (graph theory)Cone (formal languages)Theoretical Computer ScienceTrace (semiology)Aperiodic graphFormal languageComputer Science::Programming LanguagesCommutative propertyMathematicsComputer Science(all)Theoretical Computer Science
researchProduct

Periodicity vectors for labelled trees

2003

Abstract The concept of a periodicity vector is introduced in the context of labelled trees, and some new periodicity theorems are obtained. These results constitute generalizations of the classical periodicity theorem of Fine and Wilf for words. The concept of a tree congruence is also generalized and the isomorphism between the lattice of tree congruences and the lattice of unlabelled trees (prefix codes) is established.

Discrete mathematicsMonoidPrefix codePeriodicityApplied MathematicsContext (language use)Congruence relationTree (graph theory)CombinatoricsFormal languagesLattice (music)Labelled treeCongruence (manifolds)Periodicity vectorDiscrete Mathematics and CombinatoricsIsomorphismMathematicsDiscrete Applied Mathematics
researchProduct

Words and Patterns

2002

In this paper some new ideas, problems and results on patterns are proposed. In particular, motivated by questions concerning avoidability, we first study the set of binary patterns that can occur in one infinite binary word, comparing it with the set of factors of the word. This suggests a classification of infinite words in terms of the "difference" between the set of its patterns and the set of its factors. The fact that each factor in an infinite word can give rise to several distinct patterns leads us to study the set of patterns of a single finite word. This set, endowed with a natural order relation, defines a poset: we investigate the relationships between the structure of such a poset…

Set (abstract data type)Discrete mathematicsStructure (mathematical logic)Regular languageRelation (database)Binary numberComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Natural orderPartially ordered setComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Mathematics
researchProduct

The Alternating BWT: an algorithmic perspective

2020

Abstract The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, …

Discrete mathematicsFOS: Computer and information sciencesSettore INF/01 - InformaticaGeneral Computer ScienceBasis (linear algebra)Computer scienceAlternating Burrows-Wheeler TransformGalois wordRank-invertibilityField (mathematics)Data structureTheoretical Computer ScienceTransformation (function)Difference cover algorithmComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Time complexityAlternating Burrows-Wheeler Transform; Difference cover algorithm; Galois word; Rank-invertibilityWord (computer architecture)Data compression
researchProduct

Minimal Forbidden Factors of Circular Words

2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language M, computes a DFA of the language having M as set of minimal forbidden factors. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We gener…

L-automatonDiscrete mathematicsFactorialFibonacci numberSettore INF/01 - InformaticaComputer Science (all)Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)0102 computer and information sciences02 engineering and technologyCircular wordMinimal forbidden factor01 natural sciencesTheoretical Computer ScienceSet (abstract data type)010201 computation theory & mathematicsIf and only ifTrie0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Mathematics
researchProduct

Primitive sets of words

2020

Given a (finite or infinite) subset $X$ of the free monoid $A^*$ over a finite alphabet $A$, the rank of $X$ is the minimal cardinality of a set $F$ such that $X \subseteq F^*$. We say that a submonoid $M$ generated by $k$ elements of $A^*$ is {\em $k$-maximal} if there does not exist another submonoid generated by at most $k$ words containing $M$. We call a set $X \subseteq A^*$ {\em primitive} if it is the basis of a $|X|$-maximal submonoid. This definition encompasses the notion of primitive word -- in fact, $\{w\}$ is a primitive set if and only if $w$ is a primitive word. By definition, for any set $X$, there exists a primitive set $Y$ such that $X \subseteq Y^*$. We therefore call $Y$…

FOS: Computer and information sciencesPrimitive setDiscrete Mathematics (cs.DM)General Computer ScienceFormal Languages and Automata Theory (cs.FL)Pseudo-repetitionComputer Science - Formal Languages and Automata Theory0102 computer and information sciences02 engineering and technology01 natural sciencesTheoretical Computer ScienceCombinatoricsCardinalityFree monoidBi-rootFOS: Mathematics0202 electrical engineering electronic engineering information engineeringMathematics - CombinatoricsRank (graph theory)Primitive root modulo nMathematicsHidden repetitionSettore INF/01 - InformaticaIntersection (set theory)k-maximal monoidFunction (mathematics)Basis (universal algebra)010201 computation theory & mathematics020201 artificial intelligence & image processingCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryWord (group theory)Computer Science - Discrete Mathematics
researchProduct

Balanced Words Having Simple Burrows-Wheeler Transform

2009

The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to the study of words having simple BWT, i.e. words w over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1 < a_2 < \ldots < a_k$, such that $bwt(w)$ is of the form $a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$, for some non-negative integers $n_1, n_2, \ldots, n_k$. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, these notions are no longer equivalent. As a main result of this paper we prove that, u…

CombinatoricsConjugacy classClustering effectBurrows–Wheeler transformSettore INF/01 - InformaticaBurrows Wheeler Transform Combinatorics on Words Balanced sequences epistandard rich words words having simple BWTBinary numberBurrows-Wheeler TransformAlphabetBinary alphabetBurrows-Wheeler Transform; Clustering effectMathematics
researchProduct
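
The definition of "simple BWT" in the abstract above can be checked directly on small examples. The following is a minimal sketch, assuming the alphabet is ordered by Python's character order and computing the BWT naively by sorting all rotations (quadratic space, for illustration only); the names `bwt` and `has_simple_bwt` are hypothetical and do not come from the paper.

```python
def bwt(w):
    """Last column of the lexicographically sorted rotations of w."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rotation[-1] for rotation in rotations)


def has_simple_bwt(w):
    """True if bwt(w) has the form a_k^{n_k} ... a_1^{n_1},
    i.e. its letters appear in nonincreasing order."""
    t = bwt(w)
    return all(t[i] >= t[i + 1] for i in range(len(t) - 1))


print(bwt("abaab"), has_simple_bwt("abaab"))
print(bwt("aabb"), has_simple_bwt("aabb"))
```

Here `abaab` is a finite Fibonacci word, a standard word, so its BWT is clustered, while `aabb` is not circularly balanced and its BWT is not of the required form, in line with the binary equivalence stated in the abstract.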

Representations of integers and Language theory

1984

AlgebraPhilosophy of languageMathematics
researchProduct

Equations on trees

1996

We introduce the notion of equation on trees, generalizing the corresponding notion for words, and we develop the first steps of a theory of tree equations. The main result of the paper states that, if a pair of trees is the solution of a tree equation with two indeterminates, then the two trees are both powers of the same tree. As an application, we show that a tree can be expressed in a unique way as a power of a primitive tree. This extends a basic result of combinatorics on words to trees. Some open problems are finally proposed.

Discrete mathematicsTree (data structure)Combinatorics on wordsBinary treeTree codeMathematics
researchProduct

An extension of the Burrows-Wheeler Transform and applications to sequence comparison and data compression

2005

We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible but, unlike the BWT, it is also surjective. The E transformation allows us to define a distance between two sequences, which we apply here to the problem of whole mitochondrial genome phylogeny. Moreover, we give some considerations about compressing a set of words by using the E transformation as preprocessing.

Discrete mathematicsMultisetBurrows-Wheeler transform; Data Compression; Mitochondrial genome phylogenyBurrows–Wheeler transformMultiplicity (mathematics)Mitochondrial genome phylogenyBurrows-Wheeler transformData CompressionSurjective functionConjugacy classSequence comparisonPreprocessorAlgorithmMathematicsData compression
researchProduct

Comparing Sequences by the Burrows-Wheeler Transform

2004

researchProduct

L-convex Polyominoes: A Survey

2006

researchProduct

Tomographical aspects of L-convex polyominoes

2007

Discrete Tomography Polyominoes.
researchProduct

Fragment assembly through minimal forbidden words

2004

researchProduct

On the trace product and some families of languages closed under partial commutation

2004

researchProduct

On Extremal Cases of the Hopcroft's Algorithm

2010

In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so different executions can lead to different sequences of refinements of the set of states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to finding families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated…

Hopcroft’s minimization algorithmStandard treeDeterministic finite state automataWord trees
researchProduct

An extension of the Burrows-Wheeler Transform to k words

2005

researchProduct

Lecture Notes in Computer Science - Proc. of DLT 2005

2005

researchProduct

Regular coding partitions

2006

researchProduct

The Burrows-Wheeler Transform: from data compression to combinatorics on words

2005

researchProduct

Combinatorics on Words

2005

researchProduct

Higman's Theorem on Discrete Sets

2006

In this paper we investigate properties of different classes of discrete sets with respect to the partial order of subpicture. In particular we take into consideration the classes of convex polyominoes and L-convex polyominoes. In the first part of the paper we study closure properties of these classes with respect to the order and we give a new characterization of L-convex polyominoes. In the second part we pose the question of extending Higman’s theorem to discrete sets. We give a negative answer in the general case and we prove that the set of L-convex polyominoes is well-partially-ordered by using a representation of L-convex polyominoes in terms of words of a regular language.

Discrete sets, polyominoes and L-convex polyominoes, subpicture order and well-partial-ordering.
researchProduct

A new sequence distance measure based on the Burrows-Wheeler Transform

2005

researchProduct

Matrix-based complexity functions and recognizable picture languages

2008

MatriceSettore INF/01 - InformaticaPicture LanguageComplexity functionsAutomata
researchProduct

Coding Partitions

2007

Motivated by the study of decipherability conditions for codes weaker than Unique Decipherability (UD), we introduce the notion of coding partition. Such a notion generalizes that of UD code and, for codes that are not UD, allows one to recover the "unique decipherability" at the level of the classes of the partition. By taking into account the natural order between the partitions, we define the characteristic partition of a code X as the finest coding partition of X. This leads us to introduce the canonical decomposition of a code into at most one unambiguous component and other (if any) totally ambiguous components. In the case the code is finite, we give an algorithm for computing its canonical…

[INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM]General Computer Sciencelcsh:MathematicsDiscrete Mathematics and Combinatoricslcsh:QA1-939Theoretical Computer Science
researchProduct

The Burrows-Wheeler Transform: a new tool in Combinatorics on Words

2005

researchProduct

Circular words and automata minimization

2007

researchProduct

Words

2015

Words combinatorics enumerative combinatorics
researchProduct

String attractors and combinatorics on words

2019

The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]\cdots w[n]$ is a subset $\Gamma$ of the positions $\{1,\ldots,n\}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $\Gamma$. While finding the smallest string attractor for a word is an NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the noti…

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - InformaticaFormal Languages and Automata Theory (cs.FL)De Brujin wordComputer Science - Formal Languages and Automata TheoryBurrows-Wheeler transformString attractorComputer Science - Data Structures and AlgorithmsThue-Morse wordLempel-Ziv encodingBurrows-Wheeler transform; De Brujin word; Lempel-Ziv encoding; Run-length encoding; String attractor; Thue-Morse wordData Structures and Algorithms (cs.DS)Run-length encoding
researchProduct
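
The definition of a string attractor in the abstract above can be illustrated with a brute-force check. This is only a sketch under stated assumptions: positions are 1-based as in the abstract, the function name `is_string_attractor` is hypothetical, and the exhaustive search over all factors is meant for small examples, not as an efficient algorithm.

```python
def is_string_attractor(w, gamma):
    """Check whether the set gamma of 1-based positions is a string
    attractor for w: every distinct nonempty factor of w must have at
    least one occurrence that covers a position in gamma."""
    n = len(w)
    positions = set(gamma)
    factors = {w[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    for factor in factors:
        covered = False
        start = w.find(factor)
        while start != -1:
            # This occurrence spans 1-based positions start+1 .. start+len(factor).
            if any(start + 1 <= p <= start + len(factor) for p in positions):
                covered = True
                break
            start = w.find(factor, start + 1)  # next, possibly overlapping, occurrence
        if not covered:
            return False
    return True


print(is_string_attractor("abab", {1, 2}))
print(is_string_attractor("abab", {2}))
```

For `abab`, the set {1, 2} works because every factor has an occurrence covering position 1 or 2, while {2} fails because no occurrence of the letter `a` sits at position 2.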

Metodo per la indicizzazione approssimata e il reperimento di sequenze che si ripetono all'interno di sequenze

2005

patent

oligonucleotides, PCR, sequences, homology
researchProduct

Enumeration of L-convex polyominoes

2004

researchProduct

Hopcroft’s Algorithm and Tree-like Automata

2009

In order to analyze some extremal cases of Hopcroft’s algorithm we deepened the relationship between combinatorial properties of circular words and the execution of the algorithm on cyclic automata associated to such words. In this paper we highlight the notion of word tree and, in particular, we characterize the word trees for which Hopcroft’s algorithm on the associated tree-like automata has a unique refinement process. Moreover, we show the relationship between the time complexity of the refinement process of Hopcroft’s algorithm on unary cyclic automata and binary tree-like automata. Such a result allows us to exhibit a family of tree-like automata representing the worst case of …

Automata minimization, Hopcroft's algorithm, trees.
researchProduct

On Balancing of a Direct Product

2009

A direct product of two sequences is a naturally defined sequence on the alphabet of pairs of symbols. Taking inspiration from [Pavel Salimov. On uniform recurrence of a direct product. In AutoMathA, 2009], where the author investigates the case of uniformly recurrent words, we study here when the product of two balanced sequences on a binary alphabet is also balanced.

Balanced sequenceSettore INF/01 - InformaticaDirect ProductSturmianCombinatorics on Word
researchProduct

Approximate string matching: indexing and the k-mismatch problem

2004

researchProduct