0000000001192844
AUTHOR
Antonio Restivo
Text Compression Using Antidictionaries
International audience; We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.
Burrows–Wheeler transform and Sturmian words
Automata and forbidden words
Abstract Let L ( M ) be the (factorial) language avoiding a given anti-factorial language M . We design an automaton accepting L ( M ) and built from the language M . The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word ν, the automaton turns out to be the factor automaton of ν (the minimal automaton accepting the set of factors of ν). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a nontrivial upper bound on the number of minimal forbidden words of a word.
Reducing Local Alphabet Size in Recognizable Picture Languages
A recognizable picture language is defined as the projection of a local picture language defined by a set of two-by-two tiles, i.e. by a strictly-locally-testable (SLT) language of order 2. The family of recognizable picture languages is also defined, using larger k by k tiles, \(k>2\), by the projection of the corresponding SLT language. A basic measure of the descriptive complexity of a picture language is given by the size of the SLT alphabet using two-by-two tiles, more precisely by the so-called alphabetic ratio of sizes: SLT-alphabet/picture-alphabet. We study how the alphabetic ratio changes moving from two to larger tile sizes, and we obtain the following result: any recognizable pi…
Monadic second-order logic over pictures and recognizability by tiling systems
We show that a set of pictures (rectangular arrays of symbols) is recognized by a finite tiling system if and only if it is definable in existential monadic second-order logic. As a consequence, finite tiling systems constitute a notion of recognizability over two-dimensional inputs which at the same time generalizes finite-state recognizability over strings and matches a natural logic. The proof is based on the Ehrenfeucht-FraIsse technique for first-order logic and an implementation of “threshold counting” within tiling systems.
Forbidden Factors and Fragment Assembly
In this paper methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word w from a given set I of substrings (fragments ) of a word w . We introduce an hypothesis involving the set of fragments I and the maximal length m(w) of the minimal forbidden factors of w . Such hypothesis allows us to reconstruct uniquely the word w from the set I in linear time. We prove also that, if w is a word randomly generated by a memoryless source with identical symbol probabilities, m(w) is logarithmic with respect to the size of w . This result shows th…
Languages with mismatches
AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…
Comparing Sequences by the Burrows-Wheeler Transform
An automata-theoretic approach to the study of the intersection of two submonoids of a free monoid
We investigate the intersection of two finitely generated submonoids of the free monoid on a finite alphabet. To this purpose, we consider automata that recognize such submonoids and we study the product automata recognizing their intersection. By using automata methods we obtain a new proof of a result of Karhumaki on the cha- racterization of the intersection of two submonoids of rank two, in the case of prefix (or suffix) generators. In a more general setting, for an arbitrary number of generators, we prove that if H and K are two finitely generated submonoids generated by prefix sets such that the product automaton associated to H ∩ K has a given special property then �(H ∩ K) ≤ �(H)�(K…
A New Combinatorial Approach to Sequence Comparison
In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…
The Shuffle Product: New Research Directions
In this paper we survey some recent researches concerning the shuffle operation that arise both in Formal Languages and in Combinatorics on Words.
Dictionary-symbolwise flexible parsing
AbstractLinear-time optimal parsing algorithms are rare in the dictionary-based branch of the data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999) that works when the dictionary is prefix closed and the encoding of dictionary pointers has a constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm that is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypothesis. In the case of LZ78-like algorithms with variable costs and any, linear as usual, symbolwise compressor we show how to implement our parsing algorithm in linear time. In the case of LZ77-like dictionaries and any symbol…
Enumeration of L-convex polyominoes by rows and columns
In this paper, we consider the class of L-convex polyominoes, i.e. the convex polyominoes in which any two cells can be connected by a path of cells in the polyomino that switches direction between the vertical and the horizontal at most once.Using the ECO method, we prove that the number fn of L-convex polyominoes with perimeter 2(n + 2) satisfies the rational recurrence relation fn = 4fn-1 - 2fn-2, with f0 = 1, f1 = 2, f2 = 7. Moreover, we give a combinatorial interpretation of this statement. In the last section, we present some open problems.
A reconstruction algorithm for L-convex polyominoes
AbstractWe give an algorithm that uniquely reconstruct an L-convex polyomino from the size of some special paths, called bordered L-paths.
Suffixes, Conjugates and Lyndon Words
In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\plex$ and $\pom$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\pom$ order between two primitive wo…
A note on renewal systems
Abstract A renewal system is a symbolic dynamical system generated by free concatenations of a finite set of words. In this paper we prove that, given two systems which are both renewal and Markov systems, it is decidable whether they are topologically conjugate. The proof makes use of the methods and the techniques of formal language theory.
A NEW COMPLEXITY FUNCTION FOR WORDS BASED ON PERIODICITY
Motivated by the extension of the critical factorization theorem to infinite words, we study the (local) periodicity function, i.e. the function that, for any position in a word, gives the size of the shortest square centered in that position. We prove that this function characterizes any binary word up to exchange of letters. We then introduce a new complexity function for words (the periodicity complexity) that, for any position in the word, gives the average value of the periodicity function up to that position. The new complexity function is independent from the other commonly used complexity measures as, for instance, the factor complexity. Indeed, whereas any infinite word with bound…
From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization
AbstractWe introduce a combinatorial optimization framework that naturally induces a class of optimal word permutations with respect to a suitably defined cost function taking into account various measures of relatedness between words. The Burrows and Wheeler transform (bwt) (cf. [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994]), and its analog for labelled trees (cf. [P. Ferragina, F. Luccio, G. Manzini, S. Muthukrishnan, Structuring labeled trees for optimal succinctness, and beyond, in: Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2005, pp. 198–207]), are special cases i…
L-convex Polyominoes: A Survey
Varieties of Codes and Kraft Inequality
Decipherability conditions for codes are investigated by using the approach of Guzman, who introduced in [7] the notion of variety of codes and established a connection between classes of codes and varieties of monoids. The class of Uniquely Decipherable (UD) codes is a special case of variety of codes, corresponding to the variety of all monoids. It is well known that the Kraft inequality is a necessary condition for UD codes, but it is not sufficient, in the sense that there exist codes that are not UD and that satisfy the Kraft inequality. The main result of the present paper states that, given a variety V of codes, if all the elements of V satisfy the Kraft inequality, then V is the var…
The expressive power of the shuffle product
International audience; There is an increasing interest in the shuffle product on formal languages, mainly because it is a standard tool for modeling process algebras. It still remains a mysterious operation on regular languages.Antonio Restivo proposed as a challenge to characterize the smallest class of languages containing the singletons and closed under Boolean operations, product and shuffle. This problem is still widely open, but we present some partial results on it. We also study some other smaller classes, including the smallest class containing the languages composed of a single word of length 2 which is closed under Boolean operations and shuffle by a letter (resp. shuffle by a l…
Minimal forbidden words and factor automata
International audience; Let L(M) be the (factorial) language avoiding a given antifactorial language M. We design an automaton accepting L(M) and built from the language M. The construction is eff ective if M is finite. If M is the set of minimal forbidden words of a single word v, the automaton turns out to be the factor automaton of v (the minimal automaton accepting the set of factors of v). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a non-trivial upper bound on the number of minimal forbidden words of a word.
"Indexing structures for approximate string matching
In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.
On the Shuffle of Star-Free Languages
Motivated by the general problem to characterize families of languages closed under shuffle, we investigate some conditions under which the shuffle of two star-free languages is star-free. Some of the special cases here approached give rise to new problems in combinatorics on words.
Block Sorting-Based Transformations on Words: Beyond the Magic BWT
The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…
Nondeterministic Moore automata and Brzozowski's minimization algorithm
AbstractMoore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power of the classical deterministic Moore automaton. We consider also the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. We propose an algorithm that is a variant of Brzozowski’s minimization algorithm in the sense that it is essentially structured as reverse operation and subset construction performed twice. Moreover, we explore more general classes of NMA and analyze the applicability of the algorithm. For some of such classes the algorith…
Balancing and clustering of words in the Burrows–Wheeler transform
AbstractCompression algorithms based on Burrows–Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words.The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is).…
Distance measures for biological sequences: Some recent approaches
AbstractSequence comparison has become a very essential tool in modern molecular biology. In fact, in biomolecular sequences high similarity usually implies significant functional or structural similarity. Traditional approaches use techniques that are based on sequence alignment able to measure character level differences. However, the recent developments of whole genome sequencing technology give rise to need of similarity measures able to capture the rearrangements involving large segments contained in the sequences. This paper is devoted to illustrate different methods recently introduced for the alignment-free comparison of biological sequences. Goal of the paper is both to highlight t…
On extremal cases of Hopcroft’s algorithm
AbstractIn this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different executions that can lead to different sequences of refinements of the set of the states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata as…
Reconstruction of L-convex Polyominoes.
Abstract We introduce the family of L-convex polyominoes, a subset of convex polyominoes whose elements satisfy a special convexity property. We develop an algorithm that reconstructs an L-convex polyomino from the set of its maximal L-polyominoes.
Combinatorial aspects of L-convex polyominoes
We consider the class of L-convex polyominoes, i.e. those polyominoes in which any two cells can be connected with an ''L'' shaped path in one of its four cyclic orientations. The paper proves bijectively that the number f"n of L-convex polyominoes with perimeter 2(n+2) satisfies the linear recurrence relation f"n"+"2=4f"n"+"1-2f"n, by first establishing a recurrence of the same form for the cardinality of the ''2-compositions'' of a natural number n, a simple generalization of the ordinary compositions of n. Then, such 2-compositions are studied and bijectively related to certain words of a regular language over four letters which is in turn bijectively related to L-convex polyominoes. In …
Tomographical aspects of L-convex polyominoes
Cancellation, pumping and permutation in formal languages
A graph theoretic approach to automata minimality
AbstractThe paper presents a graph-theoretic approach to test the minimality of a deterministic automaton. In particular, we focus on problems concerning the dependence of the minimality of an automaton on the choice of the set F of final states or on the cardinality of the set F. We introduce different minimality conditions of an automaton and show that such conditions can be characterized in graph-theoretic terms.
A Periodicity Theorem on Words and Applications
We prove a periodicity theorem on words that has strong analogies with the Critical Factorization theorem and we show three applications of it.
A Tomographical Characterization of L-convex Polyominoes
Our main purpose is to characterize the class of L-convex polyominoes introduced in [3] by means of their horizontal and vertical projections. The achieved results allow an answer to one of the most relevant questions in tomography i.e. the uniqueness of discrete sets, with respect to their horizontal and vertical projections. In this paper, by giving a characterization of L-convex polyominoes, we investigate the connection between uniqueness property and unimodality of vectors of horizontal and vertical projections. In the last section we consider the continuum environment; we extend the definition of L-convex set, and we obtain some results analogous to those for the discrete case.
Fragment assembly through minimal forbidden words
Unambiguous recognizable two-dimensional languages
We consider the family UREC of unambiguous recognizable two-dimensional languages. We prove that there are recognizable languages that are inherently ambiguous, that is UREC family is a proper subclass of REC family. The result is obtained by showing a necessary condition for unambiguous recognizable languages. Further UREC family coincides with the class of picture languages defined by unambiguous 2OTA and it strictly contains its deterministic counterpart. Some closure and non-closure properties of UREC are presented. Finally we show that it is undecidable whether a given tiling system is unambiguous.
Extremal minimality conditions on automata
AbstractIn this paper we investigate the minimality problem of DFAs by varying the set of final states. In other words, we are interested on how the choice of the final states can affect the minimality of the automata. The state-pair graph is a useful tool to investigate such a problem. The choice of a set of final states for the automaton A defines a coloring of the closed components of the state-pair graph and the minimality of A corresponds to a property of these colored components. A particular attention is devoted to the analysis of some extremal cases such as, for example, the automata that are minimal for any choice of the subset of final states F from the state set Q of the automato…
Burrows-Wheeler transform and palindromic richness
AbstractThe investigation of the extremal case of the Burrows–Wheeler transform leads to study the words w over an ordered alphabet A={a1,a2,…,ak}, with a1<a2<⋯<ak, such that bwt(w) is of the form aknkak−1nk−1⋯a2n2a1n1, for some non-negative integers n1,n2,…,nk. A characterization of these words in the case |A|=2 has been given in [Sabrina Mantaci, Antonio Restivo, Marinella Sciortino, Burrows-Wheeler transform and Sturmian words, Information Processing Letters 86 (2003) 241–246], where it is proved that they correspond to the powers of conjugates of standard words. The case |A|=3 has been settled in [Jamie Simpson, Simon J. Puglisi, Words with simple Burrows-Wheeler transforms, Electronic …
Permutation properties and the fibonacci semigroup
Overlap free words on two symbols
BALANCE PROPERTIES AND DISTRIBUTION OF SQUARES IN CIRCULAR WORDS
We study balance properties of circular words over alphabets of size greater than two. We give some new characterizations of balanced words connected to the Kawasaki-Ising model and to the notion of derivative of a word. Moreover we consider two different generalizations of the notion of balance, and we find some relations between them. Some of our results can be generalized to non periodic infinite words as well.
On the trace product and some families of languages closed under partial commutation
Hopcroft's algorithm and tree-like automata
Minimizing a deterministic finite automata (DFA) is a very important problem in theory of automata and formal languages. Hopcroft's algorithm represents the fastest known solution to the such a problem. In this paper we analyze the behavior of this algorithm on a family binary automata, called tree-like automata, associated to binary labeled trees constructed by words. We prove that all the executions of the algorithm on tree-like automata associated to trees, constructed by standard words, have running time with the same asymptotic growth rate. In particular, we provide a lower and upper bound for the running time of the algorithm expressed in terms of combinatorial properties of the trees…
Coding Partitions: Regularity, Maximality and Global Ambiguity
The canonical coding partition of a set of words is the finest partition such that the words contained in at least two factorizations of a same sequence belong to a same class. In the case the set is not uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize the ambiguities in the factorizations of finite sequences. We firstly prove that the canonical coding partition of a regular set contains a finite number of regular classes. We give an algorithm for computing this partition. We then investigate maximality conditions in a coding partition and we prove, in the regular case, the equivalence between two different notions of maximality. As an ap…
Some Investigations on Similarity Measures Based on Absent Words
In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common s…
On Extremal Cases of the Hopcroft's Algorithm
In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different executions that can lead to different sequences of refinements of the set of the states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated…
TWO-DIMENSIONAL FINITE STATE RECOGNIZABILITY
The purpose of this paper is to investigate about a new notion of finite state recognizability for two-dimensional (picture) languages. This notion takes as starting point the characterization of one-dimensional recognizable languages in terms of local languages and projections. Such notion can be extended in a natural way to the two-dimensional case. We first introduce a notion of local picture language and then we define,a recognizable picture language as a projection of a local picture language. The family of recognizable picture languages is denoted by REC. We study some combinatorial and language-theoretic properties of family REC. In particular we prove some closure properties with re…
Codes and automata
Hopcroft’s Algorithm and Cyclic Automata
Minimization of deterministic finite automata is a largely studied problem of the Theory of Automata and Formal Languages. It consists in finding the unique (up to isomorphism) minimal deterministic automaton recognizing a set of words. The first approaches to this topic can be traced back to the 1950’s with the works of Huffman and Moore (cf. [12,15]). Over the years several methods to solve this problem have been proposed but the most efficient algorithm in the worst case was given by Hopcroft in [11]. Such an algorithm computes in O(n log n) the minimal automaton equivalent to a given automaton with n states. The Hopcroft’s algorithm has been widely studied, described and implemented by …
On languages factorizing the free monoid
A language X⊂A* is called factorizing if there exists a language Y⊂A* such that XY = A* This work was partially supported by ESPRIT-EBRA project ASMICS contact 6317 and project 40% MURST “Algoritmi, Modelli di Calcolo e Strutture Informative”. and the product is unambiguous. First we give a combinatorial characterization of factorizing languages. Further we prove that it is decidable whether a regular language X is factorizing and we construct an automaton recognizing the corresponding language Y. For finite languages we show that it suffices to consider words of bounded length. A complete characterization of factorizing languages with three words and explicit regular expression for the co…
Patterns in words and languages
AbstractA word p, over the alphabet of variables E, is a pattern of a word w over A if there exists a non-erasing morphism h from E∗ to A∗ such that h(p)=w. If we take E=A, given two words u,v∈A∗, we write u⩽v if u is a pattern of v. The restriction of ⩽ to aA∗, where A is the binary alphabet {a,b}, is a partial order relation. We introduce, given a word v, the set P(v) of all words u such that u⩽v. P(v), with the relation ⩽, is a poset and it is called the pattern poset of v. The first part of the paper is devoted to investigate the relationships between the structure of the poset P(v) and the combinatorial properties of the word v. In the last section, for a given language L, we consider …
Forbidden words in symbolic dynamics
AbstractWe introduce an equivalence relation≃between functions from N to N. By describing a symbolic dynamical system in terms of forbidden words, we prove that the≃-equivalence class of the function that counts the minimal forbidden words of a system is a topological invariant of the system. We show that the new invariant is independent from previous ones, but it is not characteristic. In the case of sofic systems, we prove that the≃-equivalence of the corresponding functions is a decidable question. As a more special application, we show, by using the new invariant, that two systems associated to Sturmian words having “different slope” are not conjugate.
Some Remarks on Automata Minimality
It is well known that the minimization problem of deterministic finite automata (DFAs) is related to the indistinguishability notion of states (cf. [HMU00]). Indeed, a well known technique to minimize a DFA, essentially, consists in finding pairs of states that are equivalent (or indistinguishable), namely pairs of states (p,q) such that it is impossible to assert the difference between p and q only by starting in each of the two states and asking whether or not a given input string leads to a final state. Since, in the testing states equivalence, the notion of initial state is irrelevant, some of the main techniques for the minimization of automata, such as Moore’s algorithm [Moo56] and Ho…
An extension of the Burrows-Wheeler Transform to k words
Words and forbidden factors
AbstractGiven a finite or infinite word v, we consider the set M(v) of minimal forbidden factors of v. We show that the set M(v) is of fundamental importance in determining the structure of the word v. In the case of a finite word w we consider two parameters that are related to the size of M(w): the first counts the minimal forbidden factors of w and the second gives the length of the longest minimal forbidden factor of w. We derive sharp upper and lower bounds for both parameters. We prove also that the second parameter is related to the minimal period of the word w. We are further interested to the algorithmic point of view. Indeed, we design linear time algorithm for the following two p…
RECOGNIZABLE PICTURE LANGUAGES
The purpose of this paper is to propose a new notion of recognizability for picture (two-dimensional) languages extending the characterization of one-dimensional recognizable languages in terms of local languages and alphabetic mappings. We first introduce the family of local picture languages (denoted by LOC) and, in particular, prove the undecidability of the emptiness problem. Then we define the new family of recognizable picture languages (denoted by REC). We study some combinatorial and language theoretic properties of REC such as ambiguity, closure properties or undecidability results. Finally we compare the family REC with the classical families of languages recognized by four-way a…
An extension of the Burrows-Wheeler Transform
AbstractWe describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.
Lecture Notes in Computer Science - Proc. of DLT 2005
On the lattice of prefix codes
AbstractThe natural correspondence between prefix codes and trees is explored, generalizing the results obtained in Giammarresi et al. (Theoret. Comput. Sci. 205 (1998) 1459) for the lattice of finite trees under division and the lattice of finite maximal prefix codes. Joins and meets of prefix codes are studied in this light in connection with such concepts as finiteness, maximality and varieties of rational languages. Decidability results are obtained for several problems involving rational prefix codes, including the solution to the primeness problem.
Regular coding partitions
The Burrows-Wheeler Transform: from data compression to combinatorics on words
Suffix array and Lyndon factorization of a text
Abstract The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be separately processed, allowing different implementative scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we prop…
Combinatorics on Words
Highmann's Theorem on Discrete Sets
In this paper we investigate properties of different classes of discrete sets with respect to the partial-order of subpicture. In particular we take in consideration the classes of convex polyominoes and L-convex polyominoes. In the first part of the paper we study closure properties of these classes with respect the order and we give a new characterization of L-convex polyominoes. In the second part we pose the question to extend Higman’s theoremto discrete sets. We give a negative answer in the general case and we prove that the set of L-convex polyominoes is well-partially-ordered by using a representation of L-convex polyominoes in terms of words of a regular language.
Some decision results for recognizable sets in arbitrary monoids
Automata with Extremal Minimality Conditions
It is well known that the minimality of a deterministic finite automaton (DFA) depends on the set of final states. In this paper we study the minimality of a strongly connected DFA by varying the set of final states. We consider, in particular, some extremal cases. A strongly connected DFA is called uniformly minimal if it is minimal, for any choice of the set of final states. It is called never-minimal if it is not minimal, for any choice of the set of final states. We show that there exists an infinite family of uniformly minimal automata and that there exists an infinite family of never-minimal automata. Some properties of these automata are investigated and, in particular, we consider t…
Binary Patterns in Infinite Binary Words
In this paper we study the set P(w) of binary patterns that can occur in one infinite binary word w, comparing it with the set F(w) of factors of the word. Since the set P(w) can be considered as an extension of the set F(w), we first investigate how large is such extension, by introducing the parameter ?(w) that corresponds to the cardinality of the difference set P(w) \ F(w). Some non trivial results about such parameter are obtained in the case of the Thue-Morse and the Fibonacci words. Since, in most cases, the parameter ?(w) is infinite, we introduce the pattern complexity of w, which corresponds to the complexity of the language P(w). As a main result, we prove that there exist infini…
On fixed points of the Burrows-Wheeler transform
The Burrows-Wheeler Transform is a well known transformation widely used in Data Compression: important competitive compression software, such as Bzip (cf. [1]) and Szip (cf. [2]) and some indexing software, like the FM-index (cf. [3]), are deeply based on the Burrows Wheeler Transform. The main advantage of using BWT for data compression consists in its feature of "clustering" together equal characters. In this paper we show the existence of fixed points of BWT, i.e., words on which BWT has no effect. We show a characterization of the permutations associated to BWT of fixed points and we give the explicit form of fixed points on a binary ordered alphabet a, b having at most four b's and th…
A new sequence distance measure based on the Burrows-Wheeler Transform
Burrows-Wheeler transform and Run-Length Enconding
In this paper we study the clustering effect of the Burrows-Wheeler Transform (BWT) from a combinatorial viewpoint. In particular, given a word w we define the BWT-clustering ratio of w as the ratio between the number of clusters produced by BWT and the number of the clusters of w. The number of clusters of a word is measured by its Run-Length Encoding. We show that the BWT-clustering ratio ranges in ]0, 2]. Moreover, given a rational number \(r\,\in \,]0,2]\), it is possible to find infinitely many words having BWT-clustering ratio equal to r. Finally, we show how the words can be classified according to their BWT-clustering ratio. The behavior of such a parameter is studied for very well-…
Matrix-based complexity functions and recognizable picture languages
Varieties of Codes and Kraft Inequality
Decipherability conditions for codes are investigated by using the approach of Guzman, who introduced in [7] the notion of variety of codes and established a connection between classes of codes and varieties of monoids. The class of Uniquely Decipherable (UD) codes is a special case of variety of codes, corresponding to the variety of all monoids. It is well known that the Kraft inequality is a necessary condition for UD codes, but it is not sufficient, in the sense that there exist codes that are not UD and that satisfy the Kraft inequality. The main result of the present paper states that, given a variety $\mathcal{V}$ of codes, if all the elements of $\mathcal{V}$ satisfy the Kraft inequ…
On Extremal Cases of Hopcroft’s Algorithm
In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different sequences of refinements of the set of the states that lead to the final partition. We find an infinite family of binary automata for which such a process is unique. Some recent papers (cf. [3,7,1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated to circular words. However, automata minimization can be achieved also in linear time when the alphabet has only one letter (cf. [14]), so in …
On Fine and Wilf's theorem for bidimensional words
AbstractGeneralizations of Fine and Wilf's Periodicity Theorem are obtained for the case of bidimensional words using geometric arguments. The domains considered constitute a large class of convex subsets of R2 which include most parallelograms. A complete discussion is provided for the parallelogram case.
CODING PARTITIONS OF REGULAR SETS
A coding partition of a set of words partitions this set into classes such that whenever a sequence, of minimal length, has two distinct factorizations, the words of these factorizations belong to the same class. The canonical coding partition is the finest coding partition that partitions the set of words in at most one unambiguous class and other classes that localize the ambiguities in the factorizations of finite sequences. We prove that the canonical coding partition of a regular set contains a finite number of regular classes and we give an algorithm for computing this partition. From this we derive a canonical decomposition of a regular monoid into a free product of finitely many re…
On Sets of Words of Rank Two
Given a (finite or infinite) subset X of the free monoid A∗ over a finite alphabet A, the rank of X is the minimal cardinality of a set F such that X⊆ F∗. A submonoid M generated by k elements of A∗ is k-maximal if there does not exist another submonoid generated by at most k words containing M. We call a set X⊆ A∗ primitive if it is the basis of a |X|-maximal submonoid. This extends the notion of primitive word: indeed, w is a primitive set if and only if w is a primitive word. By definition, for any set X, there exists a primitive set Y such that X⊆ Y∗. The set Y is therefore called a primitive root of X. As a main result, we prove that if a set has rank 2, then it has a unique primitive …
A note on Sturmian words
International audience; We describe an algorithm which, given a factor of a Sturmian word, computes the next factor of the same length in the lexicographic order in linear time. It is based on a combinatorial property of Sturmian words which is related with the Burrows-Wheeler transformation.
Coding Partitions
Motivated by the study of decipherability conditions for codes weaker than Unique Decipherability (UD), we introduce the notion of coding partition. Such a notion generalizes that of UD code and, for codes that are not UD, allows to recover the ''unique decipherability" at the level of the classes of the partition. By tacking into account the natural order between the partitions, we define the characteristic partition of a code X as the finest coding partition of X. This leads to introduce the canonical decomposition of a code in at most one unambiguous component and other (if any) totally ambiguous components. In the case the code is finite, we give an algorithm for computing its canonical…
On the product of balanced sequences
The product w = u ⊗ v of two sequences u and v is a naturally defined sequence on the alphabet of pairs of symbols. Here, we study when the product w of two balanced sequences u,v is balanced too. In the case u and v are binary sequences, we prove, as a main result, that, if such a product w is balanced and deg ( w ) = 4, then w is an ultimately periodic sequence of a very special form. The case of arbitrary alphabets is approached in the last section. The partial results obtained and the problems proposed show the interest of the notion of product in the study of balanced sequences.
Word assembly through minimal forbidden words
AbstractWe give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorics techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of stronger hypothesis.
SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET
In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…
Minimal forbidden words and symbolic dynamics
We introduce a new complexity measure of a factorial formal language L: the growth rate of the set of minimal forbidden words. We prove some combinatorial properties of minimal forbidden words. As main result we prove that the growth rate of the set of minimal forbidden words for L is a topological invariant of the dynamical system defined by L.
The Burrows-Wheeler Transform: a new tool in Combinatorics on Words
On aperiodic trace languages
On the decomposition of prefix codes
Abstract In this paper we focus on the decomposition of rational and maximal prefix codes. We present an effective procedure that allows us to decide whether such a code is decomposable. In this case, the procedure also produces the factors of some of its decompositions. We also give partial results on the problem of deciding whether a rational maximal prefix code decomposes over a finite prefix code.
Nondeterministic Moore Automata and Brzozowski’s Algorithm
Moore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power of the classical deterministic Moore automaton. We consider also the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. In this paper we propose an algorithm that is a variant of Brzozowski's algorithm in the sense that it is essentially structured as reverse operation and subset construction performed twice.
Some Decision Results on Nonrepetitive Words
The paper addresses some generalizations of the Thue Problem such as: given a word u, does there exist an infinite nonrepetitive overlap free (or square free) word having u as a prefix? A solution to this as well as to related problems is given for the case of overlap free words on a binary alphabet.
Anti-powers in infinite words
In combinatorics of words, a concatenation of $k$ consecutive equal blocks is called a power of order $k$. In this paper we take a different point of view and define an anti-power of order $k$ as a concatenation of $k$ consecutive pairwise distinct blocks of the same length. As a main result, we show that every infinite word contains powers of any order or anti-powers of any order. That is, the existence of powers or anti-powers is an unavoidable regularity. Indeed, we prove a stronger result, which relates the density of anti-powers to the existence of a factor that occurs with arbitrary exponent. As a consequence, we show that in every aperiodic uniformly recurrent word, anti-powers of ev…
A combinatorial view on string attractors
Abstract The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w 1 w 2 ⋯ w n is a subset Γ of the positions { 1 , … , n } , such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. In this paper we explore the notion of string attractor by focusing on its combinatorial properties. In particular, we show how the size of the smallest string attractor of a word varies when combinatorial operations are applied and we deduce that such a measure is not monotone. Moreover, we introduce a c…
Fine and Wilf's Theorem for Three periods and a Generalization of Sturmian Words
AbstractWe extend the theorem of Fine and Wilf to words having three periods. We then define the set 3-PER of words of maximal length for which such result does not apply. We prove that the set 3-PER and the sequences of complexity 2n + 1, introduced by Arnoux and Rauzy to generalize Sturmian words, have the same set of factors.
Some applications of a theorem of Shirshov to language theory
Some applications of a theorem of Shirshov to language theory are given: characterization of regular languages, characterization of bounded languages, and a sufficient condition for a language to be Parikh-bounded.
A finiteness condition for finitely generated semigroups
Forbidden Factors and Fragment Assembly
In this paper we approach the fragment assembly problem by using the notion of minimal forbidden factors introduced in previous paper. Denoting by M(w) the set of minimal forbidden factors of a word w, we first focus on the evaluation of the size of elements in M(w) and on designing of an algorithm to recover the word w from M(w). Actually we prove that for a word w randomly generated by a memoryless source with identical symbol probabilities, the maximal length m(w) of words in M(w) is logarithmic and that the reconstruction algorithm runs in linear time. These results have an interesting application to the fragment assembly problem, i.e. reconstruct a word w from a given set I of substrin…
Circular words and automata minimization
Balance Properties and Distribution of Squares in Circular Words
We study balance properties of circular words over alphabets of size greater than two. We give some new characterizations of balanced words connected to the Kawasaki-Ising model and to the notion of derivative of a word. Moreover we consider two different generalizations of the notion of balance, and we find some relations between them. Some of our results can be generalised to non periodic infinite words as well.
A bijection between words and multisets of necklaces
Two of the present authors have given in 1993 a bijection Phi between words on a totally ordered alphabet and multisets of primitive necklaces. At the same time and independently, Burrows and Wheeler gave a data compression algorithm which turns out to be a particular case of the inverse of Phi. In the present article, we show that if one replaces in Phi the standard permutation of a word by the co-standard one (reading the word from right to left), then the inverse bijection is computed using the alternate lexicographic order (which is the order of real numbers given by continued fractions) on necklaces, instead of the lexicographic order as for Phi(-1). The image of the new bijection, ins…
Extending formal language hierarchies to higher dimensions
Words
Ambiguity and complementation in recognizable two-dimensional languages
The theory of one-dimensional (word) languages is well founded and investigated since fifties. From several years, the increasing interest for pattern recognition and image processing motivated the research on two-dimensional or picture languages, and nowadays this is a research field of great interest. A first attempt to formalize the concept of finite state recognizability for two-dimensional languages can be attributed to Blum and Hewitt ([7]) who started in 1967 the study of finite state devices that can define two-dimensional languages, with the aim to finding a counterpart of what regular languages are in one dimension. Since then, many approaches have been presented in the literature…
ON THE STAR HEIGHT OF RATIONAL LANGUAGES
Two problems concerning the star height of a rational language are investigated: the star height one problem and the relationships between the unambiguity of an expression and its star height. For this purpose we consider the class of factorial, transitive and rational (FTR) languages. From the algebraic point of view a FTR language is the set of factors of a rational submonoid M. Two subclasses of FTR languages are introduced: renewal languages, corresponding to the case of M finitely generated, and unambiguous renewal languages, corresponding to the case of M finitely generated and free. We prove that a FTR language has star height one if and only if it is renewal. This gives a simple de…
Measuring the clustering effect of BWT via RLE
Abstract The Burrows–Wheeler Transform (BWT) is a reversible transformation on which are based several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often create runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest number of clusters (cf. [5] , [16] , [21] , [23] ). In this paper we are concerned about t…
String Attractors and Infinite Words
The notion of string attractor has been introduced by Kempa and Prezza (STOC 2018) in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be “attracted”. The smallest size γ∗ of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure γ∗ have been studied in [Mantaci et al., TCS 2021]. Very recently, a complexity measure, called string attractor profile function, has been introduced for infinite words, by evaluating γ∗ on each prefix. Such a measure has…
Circular sturmian words and Hopcroft’s algorithm
AbstractIn order to analyze some extremal cases of Hopcroft’s algorithm, we investigate the relationships between the combinatorial properties of a circular sturmian word (x) and the run of the algorithm on the cyclic automaton Ax associated to (x). The combinatorial properties of words taken into account make use of sturmian morphisms and give rise to the notion of reduction tree of a circular sturmian word. We prove that the shape of this tree uniquely characterizes the word itself. The properties of the run of Hopcroft’s algorithm are expressed in terms of the derivation tree of the automaton, which is a tree that represents the refinement process that, in the execution of Hopcroft’s alg…
Monadic Second-Order Logic over Rectangular Pictures and Recognizability by Tiling Systems
Abstract It is shown that a set of pictures (rectangular arrays of symbols) is recognized by a finite tiling system iff it is definable in existential monadic second-order logic. As a consequence, finite tiling systems constitute a notion of recognizability over two-dimensional inputs which at the same time generalizes finite-state recognizability over strings and also matches a natural logic. The proof is based on the Ehrenfeucht–Fraisse technique for first-order logic and an implementation of “threshold counting” within tiling systems.
Ordering and Convex Polyominoes
We introduce a partial order on pictures (matrices), denoted by ≼ that extends to two dimensions the subword ordering on words. We investigate properties of special families of discrete sets (corresponding to {0,1}-matrices) with respect to this partial order. In particular we consider the families of polyominoes and convex polyominoes and the family, recently introduced by the authors, of L-convex polyominoes. In the first part of the paper we study the closure properties of such families with respect to the order. In particular we obtain a new characterization of L-convex polyominoes: a discrete set P is a L-convex polyomino if and only if all the elements Q≼P are polyominoes. In the seco…
Minimal forbidden factors of circular words
Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word.…
Characteristic Sturmian words are extremal for the Critical Factorization Theorem
We prove that characteristic Sturmian words are extremal for the Critical Factorization Theorem (CFT) in the following sense. If p x ( n ) denotes the local period of an infinite word x at point n , we prove that x is a characteristic Sturmian word if and only if p x ( n ) is smaller than or equal to n + 1 for all n ≥ 1 and it is equal to n + 1 for infinitely many integers n . This result is extremal with respect to the \{CFT\} since a consequence of the \{CFT\} is that, for any infinite recurrent word x, either the function p x is bounded, and in such a case x is periodic, or p x ( n ) ≥ n + 1 for infinitely many integers n . As a byproduct of the techniques used in the paper we extend a r…
String attractors and combinatorics on words
The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]\cdots w[n]$ is a subset $\Gamma$ of the positions $\{1,\ldots,n\}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $\Gamma$. While finding the smallest string attractor for a word is a NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the noti…
A synchronization property of pure subsemigroups of a free semigroup
On generalized Lyndon words
Abstract A generalized lexicographical order on infinite words is defined by choosing for each position a total order on the alphabet. This allows to define generalized Lyndon words. Every word in the free monoid can be factorized in a unique way as a nonincreasing factorization of generalized Lyndon words. We give new characterizations of the first and the last factor in this factorization as well as new characterization of generalized Lyndon words. We also give more specific results on two special cases: the classical one and the one arising from the alternating lexicographical order.
Star-free trace languages
Abstract Generalizing a classical result of Schutzenberger to free partially commutative monoids, we prove that the family of star-free trace languages coincides with the family of aperiodic trace languages.
Periodicity vectors for labelled trees
AbstractThe concept of a periodicity vector is introduced in the context of labelled trees, and some new periodicity theorems are obtained. These results constitute generalizations of the classical periodicity theorem of Fine and Wilf for words. The concept of a tree congruence is also generalized and the isomorphism between the lattice of tree congruences and the lattice of unlabelled trees (prefix codes) is established.
Words and Patterns
In this paper some new ideas, problems and results on patterns are proposed. In particular, motivated by questions concerning avoidability, we first study the set of binary patterns that can occur in one infinite binary word, comparing it with the set of factors of the word. This suggests a classification of infinite words in terms of the "difference" between the set of its patterns and the set of its factors. The fact that each factor in an infinite word can give rise to several distinct patterns leads to study the set of patterns of a single finite word. This set, endowed with a natural order relation, defines a poset: we investigate the relationships between the structure of such a poset…
Metodo per la indicizzazione approssimata e il reperimento di sequenze che si ripetono all'interno di sequenze
brevetto
The Alternating BWT: an algorithmic perspective
Abstract The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23] , where we have shown that BWT and ABWT are part of a larger class of reversible transformations, …
Minimal Forbidden Factors of Circular Words
Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al.ÃÂ [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language M, computes a DFA of the language having M as set of minimal forbidden factors. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We gener…
Primitive sets of words
Given a (finite or infinite) subset $X$ of the free monoid $A^*$ over a finite alphabet $A$, the rank of $X$ is the minimal cardinality of a set $F$ such that $X \subseteq F^*$. We say that a submonoid $M$ generated by $k$ elements of $A^*$ is {\em $k$-maximal} if there does not exist another submonoid generated by at most $k$ words containing $M$. We call a set $X \subseteq A^*$ {\em primitive} if it is the basis of a $|X|$-maximal submonoid. This definition encompasses the notion of primitive word -- in fact, $\{w\}$ is a primitive set if and only if $w$ is a primitive word. By definition, for any set $X$, there exists a primitive set $Y$ such that $X \subseteq Y^*$. We therefore call $Y$…
Balanced Words Having Simple Burrows-Wheeler Transform
The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to study the words having simple BWT , i.e. words w over an ordered alphabet $A=\{a_1,a_2,\ldots,a_k\}$, with $a_1 < a_2 < \ldots <a_k$, such that $bwt(w)$ is of the form $a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$, for some non-negative integers $n_1, n_2, \ldots, n_k$. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence between these notions. As a main result of this paper we prove that, u…
Representations of integers and Language theory
Equations on trees
We introduce the notion of equation on trees, generalizing the corresponding notion for words, and we develop the first steps of a theory of tree equations. The main result of the paper states that, if a pair of trees is the solution of a tree equation with two indeterminates, then the two trees are both powers of the same tree. As an application, we show that a tree can be expressed in a unique way as a power of a primitive tree. This extends a basic result of combinatorics on words to trees. Some open problems are finally proposed.
An extension of the Burrows-Wheeler Transform and applications to sequence comparison and data compression
We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible, but, differently from BWT, it is also surjective. The E transformation allows to give a definition of distance between two sequences, that we apply here to the problem of the whole mitochondrial genome phylogeny. Moreover we give some consideration about compressing a set of words by using the E transformation as preprocessing.
Enumeration of L-convex polyominoes
Hopcroft’s Algorithm and Tree-like Automata
In order to analyze some extremal cases of Hopcroft’s algorithm we deepened the relationship between combinatorial properties of circular words and the ex- ecution of the algorithm on cyclic automata associated to such words.In this paper we highlight the notion of word tree and in particular, we char- acterize the word trees for which Hopcroft’s algorithm on the associated tree-like automata has a unique refinement process. Moreover, we show the relationship between the time complexity of the refinements process of the Hopcroft’s algo- rithm on unary cyclic automata and binary tree-like automata. Such a result allows to exhibit a family of tree-like automata representing the worst case of …
On Balancing of a Direct Product
A direct product of two sequences is a naturally defined sequence on the alphabet of pairs of symbols. By taking inspiration from [Pavel Salimov. On uniform recurrence of a direct product. In AutoMathA, 2009], where the author investigates the case of uniformly recurrent words, here, we study when the product of two balanced sequences on binary alphabet is also balanced.