0000000001185722

AUTHOR

Marinella Sciortino

showing 71 related works from this author

Suffix Automata and Standard Sturmian Words

2007

Blumer et al. showed (cf. [3,2]) that the suffix automaton of a word w must have at least |w|+1 states and at most 2|w|-1 states. In this paper we characterize the language L of all binary words w whose minimal suffix automaton S(w) has exactly |w| + 1 states; they are precisely all prefixes of standard Sturmian words. In particular, we give an explicit construction of suffix automaton of words that are palindromic prefixes of standard words. Moreover, we establish a necessary and sufficient condition on S(w) which ensures that if w ∈ L and a ∈ {0, 1} then wa ∈ L. By using such a condition, we show how to construct the automaton S(wa) from S(w). More generally, we provide a simple construct…

PrefixCombinatoricsSettore INF/01 - InformaticaLevenshtein automatonSimple (abstract algebra)PalindromeSuffix automatonSuffix AutomataArithmeticSuffixWord (group theory)AutomatonMathematics
researchProduct

Burrows–Wheeler transform and Sturmian words

2003

Burrows–Wheeler transformSignal ProcessingFormal languageSturmian wordArithmeticWord (computer architecture)Computer Science ApplicationsInformation SystemsTheoretical Computer ScienceMathematicsInformation Processing Letters
researchProduct

Forbidden Factors and Fragment Assembly

2001

In this paper methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word w from a given set I of substrings (fragments ) of a word w . We introduce an hypothesis involving the set of fragments I and the maximal length m(w) of the minimal forbidden factors of w . Such hypothesis allows us to reconstruct uniquely the word w from the set I in linear time. We prove also that, if w is a word randomly generated by a memoryless source with identical symbol probabilities, m(w) is logarithmic with respect to the size of w . This result shows th…

CombinatoricsSet (abstract data type)Fragment (logic)LogarithmDeterministic automatonSymbol (programming)General MathematicsTime complexitySoftwareWord (computer architecture)SubstringComputer Science ApplicationsMathematicsRAIRO - Theoretical Informatics and Applications
researchProduct

Logarithmic Equal-Letter Runs for BWT of Purely Morphic Words

2022

In this paper we study the number r(bwt) of equal-letter runs produced by the Burrows-Wheeler transform (BWT) when it is applied to purely morphic finite words, which are words generated by iterating prolongable morphisms. Such a parameter r(bwt) is very significant since it provides a measure of the performances of the BWT, in terms of both compressibility and indexing. In particular, we prove that, when BWT is applied to whichever purely morphic finite word on a binary alphabet, r(bwt) is O(log n), where n is the length of the word. Moreover, we prove that r(bwt) is Theta(log n) for the binary words generated by a large class of prolongable binary morphisms. These bounds are proved by pro…

Equal-letter runsSettore INF/01 - InformaticaMorphismsBurrows-Wheeler TransformBispecial circular factors
researchProduct

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science
researchProduct

A New Combinatorial Approach to Sequence Comparison

2008

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…

MultisetTheoretical computer scienceBurrows–Wheeler transformSettore INF/01 - InformaticaComputer scienceBurrows-Wheeler transform; Sequence comparisonString (computer science)Context (language use)Extension (predicate logic)ComparisonInformation theoryGenomeBurrows-Wheeler transform; ComparisonTheoretical Computer ScienceTransformation (function)CategorizationComputational Theory and MathematicsPhylogeneticsSequence comparisonTheory of computationBurrows-Wheeler TransformSequence ComparisonAlgorithmMathematicsData compression
researchProduct

Suffixes, Conjugates and Lyndon Words

2013

In this paper we are interested in the study of the combinatorial aspects connecting three important constructions in the field of string algorithms: the suffix array, the Burrows-Wheeler transform (BWT) and the extended Burrows-Wheeler transform (EBWT). Such constructions involve the notions of suffixes and conjugates of words and are based on two different order relations, denoted by $\plex$ and $\pom$, that, even if strictly connected, are quite different from the computational point of view. In this study an important role is played by Lyndon words. In particular, we improve the upper bound on the number of symbol comparisons needed to establish the $\pom$ order between two primitive wo…

MultisetReduction (recursion theory)BWT; Lyndon factorization; Suffix ArrayString (computer science)Suffix arrayLyndon words Lyndon factorization BWT Suffix array EBWT Circular words ConjugacyLexicographical orderlaw.inventionSuffix ArrayCombinatoricsBWTLyndon factorizationlawOrder (group theory)Symbol (formal)Word (group theory)Mathematics
researchProduct

From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization

2007

AbstractWe introduce a combinatorial optimization framework that naturally induces a class of optimal word permutations with respect to a suitably defined cost function taking into account various measures of relatedness between words. The Burrows and Wheeler transform (bwt) (cf. [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994]), and its analog for labelled trees (cf. [P. Ferragina, F. Luccio, G. Manzini, S. Muthukrishnan, Structuring labeled trees for optimal succinctness, and beyond, in: Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2005, pp. 198–207]), are special cases i…

Lossless compressionBoosting (machine learning)General Computer ScienceComputer scienceComputationData_CODINGANDINFORMATIONTHEORYLyndon wordOptimal word permutationTheoretical Computer ScienceCombinatoricsPermutationSuffix treeCombinatorial optimizationBurrows–Wheeler transformTime complexityComputer Science(all)
researchProduct

Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

2003

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more in general symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression…

Lossless compressionNew classBurrows–Wheeler transformComputer scienceString (computer science)Entropy (information theory)Data_CODINGANDINFORMATIONTHEORYPattern matchingAlgorithmData compression
researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics
researchProduct

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

2018

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…

0301 basic medicineSettore INF/01 - InformaticaComputer scienceData_CODINGANDINFORMATIONTHEORY0102 computer and information sciencesBlock sortingData structureLexicographical order01 natural sciencesUpper and lower bounds03 medical and health sciencesCombinatorics on words030104 developmental biology010201 computation theory & mathematicsArithmeticCompressed Data Structures Block Sorting Combinatorics on Words AlgorithmsData compression
researchProduct

Computing the Original eBWT Faster, Simpler, and with Less Memory

2021

Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since this introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …

2019-20 coronavirus outbreakSpeedupString collectionsBig BWTSettore INF/01 - InformaticaSevere acute respiratory syndrome coronavirus 2 (SARS-CoV-2)String (computer science)Suffix arrayOrder (ring theory)omega-orderQuantitative Biology::GenomicsBurrows-Wheeler-TransformBurrows-Wheeler-Transform String collections SAIS Big BWT prefix-free parsing extended BWTlaw.inventionCombinatoricsprefix-free parsingSimple (abstract algebra)lawSAISSAIS algorithmIndependence (probability theory)extended BWTMathematics
researchProduct

Nondeterministic Moore automata and Brzozowski's minimization algorithm

2012

AbstractMoore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power of the classical deterministic Moore automaton. We consider also the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. We propose an algorithm that is a variant of Brzozowski’s minimization algorithm in the sense that it is essentially structured as reverse operation and subset construction performed twice. Moreover, we explore more general classes of NMA and analyze the applicability of the algorithm. For some of such classes the algorith…

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESGeneral Computer ScienceBrzozowski’s minimization algorithmSettore INF/01 - InformaticaPowerset constructionAutomata minimizationBüchi automatonNonlinear Sciences::Cellular Automata and Lattice GasesTheoretical Computer ScienceNondeterministic algorithmDeterministic finite automatonTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDFA minimizationDeterministic automatonTwo-way deterministic finite automatonNondeterministic finite automatonBrzozowski's minimization algorithmComputer Science::Formal Languages and Automata TheoryComputer Science(all)MathematicsNondeterministic Moore automata
researchProduct

r-Indexing the eBWT

2021

The extended Burrows Wheeler Transform (\(\mathrm {eBWT}\)) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. In our prior work [SPIRE 2021], we give a linear-time algorithm for the \(\mathrm {eBWT}\) that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to r-indexing the \(\mathrm {eBWT}\), i.e., run-length encoded \(\mathrm {eBWT}\) and \(…

Physicsstring compressionBurrows–Wheeler transformSettore INF/01 - InformaticaSearch engine indexingSuffix arrayOrder (ring theory)Burrows-Wheeler-Transform r-index string compression extended BWT compressed indexingBurrows-Wheeler-Transformlaw.inventionCombinatoricsr-indexcompressed indexinglawIndexingextended BWT
researchProduct

Distance measures for biological sequences: Some recent approaches

2008

AbstractSequence comparison has become a very essential tool in modern molecular biology. In fact, in biomolecular sequences high similarity usually implies significant functional or structural similarity. Traditional approaches use techniques that are based on sequence alignment able to measure character level differences. However, the recent developments of whole genome sequencing technology give rise to need of similarity measures able to capture the rearrangements involving large segments contained in the sequences. This paper is devoted to illustrate different methods recently introduced for the alignment-free comparison of biological sequences. Goal of the paper is both to highlight t…

Whole genome sequencingComputer sciencebusiness.industryApplied MathematicsSequence alignmentMachine learningcomputer.software_genreBioinformaticsMeasure (mathematics)GenomeDistance measuresSimilitudeTheoretical Computer ScienceArtificial IntelligenceSimilarity (psychology)Metric (mathematics)Artificial intelligencebusinesscomputerSoftwareInternational Journal of Approximate Reasoning
researchProduct

On extremal cases of Hopcroft’s algorithm

2010

AbstractIn this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different executions that can lead to different sequences of refinements of the set of the states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata as…

Discrete mathematicsFinite-state machineGeneral Computer ScienceUnary operationWord treesStandard treesAutomatonTheoretical Computer ScienceCombinatoricsDeterministic finite automatonDFA minimizationDeterministic automatonHopcroft’s minimization algorithmTree automatonDeterministic finite state automataTime complexityAlgorithmComputer Science::Formal Languages and Automata TheoryMathematicsComputer Science(all)Theoretical Computer Science
researchProduct

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…

FOS: Computer and information sciencesBurrows–Wheeler transformSettore INF/01 - InformaticaCombinatorics on wordsFormal Languages and Automata Theory (cs.FL)Computer scienceString (computer science)Search engine indexingCompressed data structuresComputer Science - Formal Languages and Automata TheoryString indexingData structureMeasure (mathematics)Burrows-Wheeler-TransformRepetitivenessCombinatorics on wordsBurrows-Wheeler-Transform Compressed data structures String indexing Repetitiveness Combinatorics on wordsTransformation (function)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)AlgorithmData compression
researchProduct

Lightweight LCP construction for next-generation sequencing datasets

2012

The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and B…

Whole genome sequencingGenomics (q-bio.GN)FOS: Computer and information sciencesSequenceBWT; LCP; next-generation sequencing datasetsBWT LCP text indexes next-generation sequencing datasets massive datasetsSettore INF/01 - InformaticaComputer scienceComputationString (computer science)LCP arrayParallel computingData structureDNA sequencingSubstringBWTLCPFOS: Biological sciencesComputer Science - Data Structures and AlgorithmsQuantitative Biology - GenomicsData Structures and Algorithms (cs.DS)next-generation sequencing datasets
researchProduct

Lightweight LCP construction for very large collections of strings

2016

The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows to efficiently compute some combinatorial properties of a string useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings having any length. The computation is reali…

FOS: Computer and information sciencesComputer scienceComputation0102 computer and information sciences02 engineering and technologyParallel computing01 natural sciencesGeneralized Suffix ArrayTheoretical Computer Sciencelaw.inventionlawComputational Theory and MathematicComputer Science - Data Structures and AlgorithmsExtended Burrows-Wheeler TransformData_FILES0202 electrical engineering electronic engineering information engineeringDiscrete Mathematics and CombinatoricsData Structures and Algorithms (cs.DS)Discrete Mathematics and CombinatoricAuxiliary memoryLongest Common Prefix Array; Extended Burrows-Wheeler Transform; Generalized Suffix Array;String (computer science)LCP arraySuffix arrayData structureComputational Theory and Mathematics010201 computation theory & mathematicsLongest Common Prefix Array020201 artificial intelligence & image processingJournal of Discrete Algorithms
researchProduct

Preface

2015

This special issue of Mathematical Structures in Computer Science is devoted to the fourteenth Italian Conference on Theoretical Computer Science (ICTCS) held at University of Palermo, Italy, from 9th to 11th September 2013. ICTCS is the conference of the Italian Chapter of the European Association for Theoretical Computer Science and covers a wide spectrum of topics in Theoretical Computer Science, ranging from computational complexity to logic, from algorithms and data structure to programming languages, from combinatorics on words to distributed computing. For this reason, the contributions here included come from very different areas of Theoretical Computer Science. In fact this special…

Mathematics (miscellaneous)Computer Science Applications1707 Computer Vision and Pattern RecognitionComputer Science Applications
researchProduct

Hopcroft's algorithm and tree-like automata

2011

Minimizing a deterministic finite automata (DFA) is a very important problem in theory of automata and formal languages. Hopcroft's algorithm represents the fastest known solution to the such a problem. In this paper we analyze the behavior of this algorithm on a family binary automata, called tree-like automata, associated to binary labeled trees constructed by words. We prove that all the executions of the algorithm on tree-like automata associated to trees, constructed by standard words, have running time with the same asymptotic growth rate. In particular, we provide a lower and upper bound for the running time of the algorithm expressed in terms of combinatorial properties of the trees…

Discrete mathematicsNested wordSettore INF/01 - InformaticaGeneral MathematicsAutomata minimizationω-automatonHopcroft's algorithmComputer Science ApplicationsCombinatoricsDeterministic finite automatonDFA minimizationDeterministic automatonContinuous spatial automatonQuantum finite automataAutomata theoryword treesAlgorithmComputer Science::Formal Languages and Automata TheorySoftwareMathematics
researchProduct

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

2020

Abstract Background In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory con…

Burrows–Wheeler transformComputer science[SDV]Life Sciences [q-bio]Value (computer science)SNPAssembly-free0102 computer and information scienceslcsh:Computer applications to medicine. Medical informatics01 natural sciencesBiochemistryPolymorphism Single Nucleotide03 medical and health sciencesBWTChromosome (genetic algorithm)Structural BiologyHumansSensitivity (control systems)Molecular Biologylcsh:QH301-705.5Alignment-free; Assembly-free; BWT; INDEL; SNP030304 developmental biologyAlignment-free; Assembly-free; BWT; INDEL; SNP;De Bruijn sequence0303 health sciencesSettore INF/01 - InformaticaAlignment-freeApplied MathematicsResearchGenomicsSequence Analysis DNAINDELData structureGraphComputer Science ApplicationsVariable (computer science)lcsh:Biology (General)010201 computation theory & mathematicsAdjacency listlcsh:R858-859.7Suffix[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]AlgorithmAlgorithmsBMC Bioinformatics
researchProduct

Hopcroft’s Algorithm and Cyclic Automata

2008

Minimization of deterministic finite automata is a largely studied problem of the Theory of Automata and Formal Languages. It consists in finding the unique (up to isomorphism) minimal deterministic automaton recognizing a set of words. The first approaches to this topic can be traced back to the 1950’s with the works of Huffman and Moore (cf. [12,15]). Over the years several methods to solve this problem have been proposed but the most efficient algorithm in the worst case was given by Hopcroft in [11]. Such an algorithm computes in O(n log n) the minimal automaton equivalent to a given automaton with n states. The Hopcroft’s algorithm has been widely studied, described and implemented by …

Discrete mathematicsNested wordSettore INF/01 - InformaticaComputer scienceTimed automatonSturmian wordsω-automatonNonlinear Sciences::Cellular Automata and Lattice GasesHopcroft's algorithmCombinatoricsDFA minimizationDeterministic automatonAutomata theoryQuantum finite automataNondeterministic finite automatonAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct

The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words

2013

The Burrows-Wheeler Transform (BWT) is a tool of fundamental importance in Data Compression and, recently, has found many applications well beyond its original purpose. The main goal of this paper is to highlight the mathematical and combinatorial properties on which the outstanding versatility of the $BWT$ is based, i.e. its reversibility and the clustering effect on the output. Such properties have aroused curiosity and fervent interest in the scientific world both for theoretical aspects and for practical effects. In particular, in this paper we are interested both to survey the theoretical research issues which, by taking their cue from Data Compression, have been developed in the conte…

Theoretical computer scienceSettore INF/01 - InformaticaBurrows–Wheeler transformmedia_common.quotation_subjectTheoretical researchContext (language use)Data_CODINGANDINFORMATIONTHEORYBurrows Wheeler transform; Clustering effect; Combinatorial propertiesCombinatorial propertiesBurrows Wheeler transformCombinatorics on wordsClustering effectBWT balancing optimal partitioning text-compressionCuriosityArithmeticCluster analysisFocus (optics)media_commonData compressionMathematics
researchProduct

Forbidden words in symbolic dynamics

2000

AbstractWe introduce an equivalence relation≃between functions from N to N. By describing a symbolic dynamical system in terms of forbidden words, we prove that the≃-equivalence class of the function that counts the minimal forbidden words of a system is a topological invariant of the system. We show that the new invariant is independent from previous ones, but it is not characteristic. In the case of sofic systems, we prove that the≃-equivalence of the corresponding functions is a decidable question. As a more special application, we show, by using the new invariant, that two systems associated to Sturmian words having “different slope” are not conjugate.

Discrete mathematicsApplied Mathematicsautomata and formal languages010102 general mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Symbolic dynamics[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciencesFunction (mathematics)16. Peace & justice01 natural sciencesDecidabilitysymbolic dynamics010201 computation theory & mathematicsEquivalence relationcombinatoric on words0101 mathematicsInvariant (mathematics)Dynamical system (definition)Equivalence (measure theory)Computer Science::Formal Languages and Automata TheoryWord (group theory)ComputingMilieux_MISCELLANEOUSMathematics
researchProduct

Words and forbidden factors

2002

AbstractGiven a finite or infinite word v, we consider the set M(v) of minimal forbidden factors of v. We show that the set M(v) is of fundamental importance in determining the structure of the word v. In the case of a finite word w we consider two parameters that are related to the size of M(w): the first counts the minimal forbidden factors of w and the second gives the length of the longest minimal forbidden factor of w. We derive sharp upper and lower bounds for both parameters. We prove also that the second parameter is related to the minimal period of the word w. We are further interested to the algorithmic point of view. Indeed, we design linear time algorithm for the following two p…

CombinatoricsGeneral Computer ScienceGeneral problemFree monoidFormal languageSturmian wordWord problem (mathematics)AutomorphismTime complexityUpper and lower boundsMathematicsTheoretical Computer ScienceComputer Science(all)Theoretical Computer Science
researchProduct

An extension of the Burrows-Wheeler Transform

2007

AbstractWe describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.

Discrete mathematicsMultisetSimilarity (geometry)General Computer ScienceBurrows–Wheeler transformGeneralizationAlignment-free distance measure; Burrows-Wheeler transform; Sequence comparisonContext (language use)Similarity measureBurrows-Wheeler transformSequence comparisonTheoretical Computer ScienceConjugacy classBijectionAlignment-free distance measureBurrows–Wheeler transformComputer Science::Formal Languages and Automata TheoryComputer Science(all)Mathematics
researchProduct

Cyclic Complexity of Words

2014

We introduce and study a complexity function on words $c_x(n),$ called \emph{cyclic complexity}, which counts the number of conjugacy classes of factors of length $n$ of an infinite word $x.$ We extend the well-known Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words of different slopes. We prove that if $x$ is a Sturmian word and $y$ is a word having the same cyclic complexity of $x,$ then up to renaming letters, $x$ and $y$ have the same set of factors. In particular, $y$ is also Sturmian of slope equ…

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata Theory0102 computer and information sciences68R15Characterization (mathematics)[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]01 natural sciencesTheoretical Computer ScienceCombinatoricsConjugacy class[INFO.INFO-FL]Computer Science [cs]/Formal Languages and Automata Theory [cs.FL][MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]FOS: MathematicsDiscrete Mathematics and CombinatoricsMathematics - Combinatorics0101 mathematics[MATH]Mathematics [math]Discrete Mathematics and CombinatoricMathematicsDiscrete mathematicsFactor complexity010102 general mathematicsSturmian wordSturmian wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Sturmian wordsCyclic complexity factor complexity Sturmian words minimal forbidden factorInfimum and supremumToeplitz matrixComputational Theory and Mathematics010201 computation theory & mathematicsCyclic complexityBounded functionComplexity functionCombinatorics (math.CO)Word (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological Data Mining

2010

Among the plethora of alignment-free methods for comparing biological sequences, there are some that we have perceived as representative of the novel techniques that have been devised in the past few years and as being of a fundamental nature and of broad interest and applicability, ranging from combinatorics to information theory. In this chapter, we review these alignment free methods, by presenting both their mathematical definitions and the experiments in which they are involved in.

Settore INF/01 - InformaticaComputer scienceAlignment-free distances for biological sequenceBiological data miningData miningcomputer.software_genrecomputer
researchProduct

Suffix array and Lyndon factorization of a text

2014

Abstract The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be separately processed, allowing different implementative scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we prop…

Sorting suffixes; BWT; Suffix array; Lyndon word; Lyndon factorizationCompressed suffix arraySettore INF/01 - InformaticaSorting suffixesGeneralized suffix treeSuffix arrayOrder (ring theory)Construct (python library)Lyndon wordSorting suffixeTheoretical Computer Sciencelaw.inventionBWTLyndon factorizationComputational Theory and MathematicsFactorizationlawSuffix arrayFactor (programming language)Internal memoryDiscrete Mathematics and CombinatoricsArithmeticcomputerMathematicscomputer.programming_languageJournal of Discrete Algorithms
researchProduct

Inducing the Lyndon Array

2019

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…

FOS: Computer and information sciences050101 languages & linguisticsComputer scienceComputationInduced suffix sorting02 engineering and technologySpace (mathematics)law.inventionSuffix sortinglawSuffix arrayComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineeringData_FILESPreprocessorData Structures and Algorithms (cs.DS)0501 psychology and cognitive sciencesComputer Science::Data Structures and AlgorithmsTime complexitySettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - Informatica05 social sciencesLightweight algorithmSuffix arraySigmaComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Induced suffix sorting; Lightweight algorithms; Lyndon array; Suffix arrayWorking spaceLyndon arrayLightweight algorithms020201 artificial intelligence & image processingAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct

Standard Sturmian words and automata minimization algorithms

2015

The study of some close connections between the combinatorial properties of words and the performance of the automata minimization process constitutes the main focus of this paper. These relationships have been, in fact, the basis of the study of the tightness and the extremal cases of Hopcroft's algorithm, that is, up to now, the most efficient minimization method for deterministic finite state automata. Recently, increasing attention has been paid to another minimization method that, unlike the approach proposed by Hopcroft, is not based on refinement of the set of states of the automaton, but on automata operations such as determinization and reverse, and is also applicable to non-determ…

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESNested wordFinite-state machineGeneral Computer ScienceAutomata minimizationComputer Science (all)ω-automatonNonlinear Sciences::Cellular Automata and Lattice GasesStandard Sturmian wordTheoretical Computer ScienceAutomatonCombinatoricsTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDFA minimizationAutomata theoryQuantum finite automataBrzozowski's minimization algorithmTime complexityAlgorithmComputer Science::Formal Languages and Automata TheoryMathematicsTheoretical Computer Science
researchProduct

On fixed points of the Burrows-Wheeler transform

2017

The Burrows-Wheeler Transform is a well known transformation widely used in Data Compression: important competitive compression software, such as Bzip (cf. [1]) and Szip (cf. [2]) and some indexing software, like the FM-index (cf. [3]), are deeply based on the Burrows Wheeler Transform. The main advantage of using BWT for data compression consists in its feature of "clustering" together equal characters. In this paper we show the existence of fixed points of BWT, i.e., words on which BWT has no effect. We show a characterization of the permutations associated to BWT of fixed points and we give the explicit form of fixed points on a binary ordered alphabet a, b having at most four b's and th…

Discrete mathematicsAlgebra and Number TheoryBurrows–Wheeler transformSettore INF/01 - InformaticaPermutationPermutations0102 computer and information sciences02 engineering and technologyInformation SystemFixed point01 natural sciencesTheoretical Computer ScienceComputational Theory and Mathematics010201 computation theory & mathematicsFixed PointFixed Points0202 electrical engineering electronic engineering information engineeringBurrows-Wheeler Transform; Fixed Points; Permutations; Theoretical Computer Science; Algebra and Number Theory; Information Systems; Computational Theory and Mathematics020201 artificial intelligence & image processingBurrows-Wheeler TransformInformation SystemsMathematics
researchProduct

Boosting Textual Compression in Optimal Linear Time

2005

We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties.Technically, our boosting technique builds upon three main ingredients: the Burrows--Wheeler Transform, the Suffix Tree d…

Theoretical computer scienceBurrows–Wheeler transformSuffix treeString (computer science)Data_CODINGANDINFORMATIONTHEORYBurrows-Wheeler transformSubstringArithmetic codinglaw.inventionLempel-Ziv compressorsArtificial IntelligenceHardware and ArchitectureControl and Systems Engineeringlawtext compressionempirical entropyArithmetic codingGreedy algorithmTime complexityAlgorithmSoftwareInformation SystemsMathematicsData compression
researchProduct

Burrows-Wheeler transform and Run-Length Enconding

2017

In this paper we study the clustering effect of the Burrows-Wheeler Transform (BWT) from a combinatorial viewpoint. In particular, given a word w we define the BWT-clustering ratio of w as the ratio between the number of clusters produced by BWT and the number of the clusters of w. The number of clusters of a word is measured by its Run-Length Encoding. We show that the BWT-clustering ratio ranges in ]0, 2]. Moreover, given a rational number \(r\,\in \,]0,2]\), it is possible to find infinitely many words having BWT-clustering ratio equal to r. Finally, we show how the words can be classified according to their BWT-clustering ratio. The behavior of such a parameter is studied for very well-…

Discrete mathematicsRational numberBurrows–Wheeler transformComputer scienceComputer Science (all)0102 computer and information sciences02 engineering and technologyBurrows-Wheeler transform01 natural sciencesBurrows-Wheeler transform; Clustering effect; Run-length encoding; Theoretical Computer Science; Computer Science (all)Theoretical Computer ScienceClustering effect010201 computation theory & mathematicsRun-length encoding0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingCluster analysisWord (computer architecture)Run-length encoding
researchProduct

On Extremal Cases of Hopcroft’s Algorithm

2009

In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different sequences of refinements of the set of the states that lead to the final partition. We find an infinite family of binary automata for which such a process is unique. Some recent papers (cf. [3,7,1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated to circular words. However, automata minimization can be achieved also in linear time when the alphabet has only one letter (cf. [14]), so in …

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESSettore INF/01 - InformaticaUnary operationBinary numberHopcroft's algorithmNonlinear Sciences::Cellular Automata and Lattice GasesAutomatonCombinatoricsSet (abstract data type)TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDeterministic finite automatonDFA minimizationMinificationAlgorithmTime complexityComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

Word assembly through minimal forbidden words

2006

AbstractWe give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorics techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of stronger hypothesis.

General Computer ScienceFragment assemblyFactor automaton[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technology01 natural sciencesMinimal forbidden wordTheoretical Computer ScienceCombinatorics0202 electrical engineering electronic engineering information engineeringFinite setComputingMilieux_MISCELLANEOUSCombinatorics on wordMathematicsShortest superstringCombinatorics on wordsRepetition index16. Peace & justice010201 computation theory & mathematics020201 artificial intelligence & image processingAlphabetConstant (mathematics)Word (computer architecture)Computer Science::Formal Languages and Automata TheoryComputer Science(all)
researchProduct

SORTING CONJUGATES AND SUFFIXES OF WORDS IN A MULTISET

2014

In this paper we are interested in the study of the combinatorial aspects related to the extension of the Burrows-Wheeler transform to a multiset of words. Such study involves the notion of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that only uses the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to ≺ω-order. In this study an important role is played by Lyndon words. This strategy could be used in applications specially in the field of Bioinformatic…

Lyndon words; Burrows-Wheeler transform; Extended Burrows-Wheeler transform; Circular words; Conjugates; Suffixes; SortingSuffixesMultisetTheoretical computer sciencePoint (typography)Burrows–Wheeler transformSettore INF/01 - InformaticaSortingcircular wordExtension (predicate logic)Lyndon wordsBurrows-Wheeler transformLyndon wordField (computer science)ConjugatesconjugateComputer Science (miscellaneous)sortOrder (group theory)suffixeArithmeticextended Burrows-Wheeler transformCircular wordssortingMathematics
researchProduct

Nondeterministic Moore Automata and Brzozowski’s Algorithm

2011

Moore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power of the classical deterministic Moore automaton. We consider also the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. In this paper we propose an algorithm that is a variant of Brzozowski's algorithm in the sense that it is essentially structured as reverse operation and subset construction performed twice.

Discrete mathematicsTheoryofComputation_COMPUTATIONBYABSTRACTDEVICESSettore INF/01 - InformaticaPowerset constructionBüchi automatonNonlinear Sciences::Cellular Automata and Lattice GasesNondeterministic algorithmTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESDeterministic finite automatonDFA minimizationDeterministic automatonTwo-way deterministic finite automatonMoore automata minimization Brzozowski'algorithmNondeterministic finite automatonAlgorithmComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

A combinatorial view on string attractors

2021

Abstract The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w 1 w 2 ⋯ w n is a subset Γ of the positions { 1 , … , n } , such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ. In this paper we explore the notion of string attractor by focusing on its combinatorial properties. In particular, we show how the size of the smallest string attractor of a word varies when combinatorial operations are applied and we deduce that such a measure is not monotone. Moreover, we introduce a c…

General Computer ScienceSettore INF/01 - InformaticaString (computer science)de Bruijn word0102 computer and information sciences02 engineering and technologyCharacterization (mathematics)Burrows-Wheeler transform01 natural sciencesMeasure (mathematics)Standard Sturmian wordTheoretical Computer ScienceCombinatoricsConjugacy classMonotone polygonString attractor010201 computation theory & mathematicsAttractorThue-Morse word0202 electrical engineering electronic engineering information engineeringLempel-Ziv encoding020201 artificial intelligence & image processingWord (group theory)Mathematics
researchProduct

Universal Lyndon Words

2014

A word w over an alphabet Σ is a Lyndon word if there exists an order defined on Σ for which w is lexicographically smaller than all of its conjugates (other than itself). We introduce and study universal Lyndon words, which are words over an n-letter alphabet that have length n! and such that all the conjugates are Lyndon words. We show that universal Lyndon words exist for every n and exhibit combinatorial and structural properties of these words. We then define particular prefix codes, which we call Hamiltonian lex-codes, and show that every Hamiltonian lex-code is in bijection with the set of the shortest unrepeated prefixes of the conjugates of a universal Lyndon word. This allows us t…

Discrete mathematicsExistential quantificationLyndon word Universal cycle Universal Lyndon wordComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Lyndon word Universal cycle Universal Lyndon word Lex-codeLexicographical orderLyndon wordUniversal Lyndon wordLyndon wordsPrefixCombinatoricsMathematics::Group TheoryCombinatorics on wordsComputer Science::Discrete MathematicsUniversal cycleBijectionAlphabetMathematics::Representation TheoryComputer Science::Formal Languages and Automata TheoryMathematics
researchProduct

Forbidden Factors and Fragment Assembly

2002

In this paper we approach the fragment assembly problem by using the notion of minimal forbidden factors introduced in previous paper. Denoting by M(w) the set of minimal forbidden factors of a word w, we first focus on the evaluation of the size of elements in M(w) and on designing of an algorithm to recover the word w from M(w). Actually we prove that for a word w randomly generated by a memoryless source with identical symbol probabilities, the maximal length m(w) of words in M(w) is logarithmic and that the reconstruction algorithm runs in linear time. These results have an interesting application to the fragment assembly problem, i.e. reconstruct a word w from a given set I of substrin…

Set (abstract data type)CombinatoricsLogarithmFragment (logic)Reconstruction algorithmFocus (optics)AlgorithmTime complexitySubstringWord (computer architecture)Mathematics
researchProduct

The colored longest common prefix array computed via sequential scans

2018

Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…

0301 basic medicineFOS: Computer and information sciencesAlignment-free methodsBurrows–Wheeler transformComputer scienceComputationAverage common substring0206 medical engineeringMatching statisticsScale (descriptive set theory)02 engineering and technologyTheoretical Computer Science03 medical and health sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Burrows-wheeler transformString (computer science)Computer Science (all)LCP arrayMatching statisticData structureSubstring030104 developmental biologyAlignment-free methods; Average common substring; Burrows-wheeler transform; Longest common prefix; Matching statistics; Theoretical Computer Science; Computer Science (all)Pairwise comparisonLongest common prefixAlgorithm020602 bioinformaticsAlignment-free method
researchProduct

Measuring the clustering effect of BWT via RLE

2017

Abstract The Burrows–Wheeler Transform (BWT) is a reversible transformation on which are based several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often create runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest number of clusters (cf. [5] , [16] , [21] , [23] ). In this paper we are concerned about t…

0301 basic medicineGeneral Computer SciencePermutationComputer Science (all)Binary number0102 computer and information sciencesQuantitative Biology::Genomics01 natural sciencesUpper and lower boundsTheoretical Computer ScienceCombinatorics03 medical and health sciencesPermutation030104 developmental biologyTransformation (function)BWT010201 computation theory & mathematicsRun-length encodingComputer Science::Data Structures and AlgorithmsCluster analysisPrimitive root modulo nBWT; Permutation; Run-length encoding; Theoretical Computer Science; Computer Science (all)Word (computer architecture)Run-length encodingMathematics
researchProduct

String Attractors and Infinite Words

2022

The notion of string attractor has been introduced by Kempa and Prezza (STOC 2018) in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be “attracted”. The smallest size γ∗ of a string attractor for a finite word is a lower bound for several repetitiveness measures associated with the most common compression schemes, including BWT-based and LZ-based compressors. The combinatorial properties of the measure γ∗ have been studied in [Mantaci et al., TCS 2021]. Very recently, a complexity measure, called string attractor profile function, has been introduced for infinite words, by evaluating γ∗ on each prefix. Such a measure has…

String attractorSettore INF/01 - InformaticaFactor complexityMorphismSturmian wordRecurrent wordRepetitiveness measure
researchProduct

Circular sturmian words and Hopcroft’s algorithm

2009

AbstractIn order to analyze some extremal cases of Hopcroft’s algorithm, we investigate the relationships between the combinatorial properties of a circular sturmian word (x) and the run of the algorithm on the cyclic automaton Ax associated to (x). The combinatorial properties of words taken into account make use of sturmian morphisms and give rise to the notion of reduction tree of a circular sturmian word. We prove that the shape of this tree uniquely characterizes the word itself. The properties of the run of Hopcroft’s algorithm are expressed in terms of the derivation tree of the automaton, which is a tree that represents the refinement process that, in the execution of Hopcroft’s alg…

Discrete mathematicsReduction (recursion theory)Fibonacci numberGeneral Computer ScienceHopcroft'algorithmSturmian wordSturmian wordSturmian morphismsTheoretical Computer ScienceCombinatoricsTree (descriptive set theory)TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESComputer Science::Discrete MathematicsDeterministic automatonHopcroft’s minimization algorithmCircular sturmian wordsTree automatonDeterministic finite state automataTime complexityAlgorithmComputer Science::Formal Languages and Automata TheoryWord (group theory)Computer Science(all)MathematicsTheoretical Computer Science
researchProduct

Epichristoffel Words and Minimization of Moore Automata

2014

This paper is focused on the connection between the combinatorics of words and minimization of automata. The three main ingredients are the epichristoffel words, Moore automata and a variant of Hopcroft's algorithm for their minimization. Epichristoffel words defined in [14] generalize some properties of circular sturmian words. Here we prove a factorization property and the existence of the reduction tree, that uniquely identifies the structure of the word. Furthermore, in the paper we investigate the problem of the minimization of Moore automata by defining a variant of Hopcroft's minimization algorithm. The use of this variant makes simpler the computation of the running time and consequ…

Discrete mathematicsAlgebra and Number TheoryReduction (recursion theory)Structure (category theory)Tree (graph theory)Theoretical Computer ScienceAutomatonCombinatoricsComputational Theory and MathematicsDFA minimizationFactorizationMinificationComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Information SystemsMathematicsFundamenta Informaticae
researchProduct

SNPs detection by eBWT positional clustering

2019

Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data. We develop the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the e…

lcsh:QH426-470Computer scienceLCP arrayReference-free[SDV]Life Sciences [q-bio]0206 medical engineeringSequencing dataSNPAssembly-free02 engineering and technologyBWT LCP array SNPs Reference-free Assembly-freecomputer.software_genreSoftwareBWTStructural BiologyComputational Theory and MathematicCluster (physics)Cluster analysislcsh:QH301-705.5Molecular BiologyComputingMilieux_MISCELLANEOUSSettore INF/01 - Informaticabusiness.industryResearchApplied MathematicsLCP arrayData structurePipeline (software)lcsh:GeneticsComputational Theory and Mathematicslcsh:Biology (General)Data miningBWT; LCP array; SNPs; Reference-free; Assembly-free[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]businessRaw datacomputer020602 bioinformaticsSNPs
researchProduct

The Alternating BWT: an algorithmic perspective

2020

Abstract The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23] , where we have shown that BWT and ABWT are part of a larger class of reversible transformations, …

Discrete mathematicsFOS: Computer and information sciencesSettore INF/01 - InformaticaGeneral Computer ScienceBasis (linear algebra)Computer scienceAlternating Burrows-Wheeler TransformGalois wordRank-invertibilityField (mathematics)Data structureTheoretical Computer ScienceTransformation (function)Difference cover algorithmComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Time complexityAlternating Burrows-Wheeler Transform; Difference cover algorithm; Galois word; Rank-invertibilityWord (computer architecture)Data compression
researchProduct

The effects of mental rotation on computational thinking

2019

Abstract Although several investigations of spatial reasoning and mental rotation skills have been conducted in research areas linked to STEM education, to the best of our knowledge, few of these studies have examined the relationship between spatial reasoning and computational thinking. Given this gap in the literature, the present study investigates the role and action of spatial reasoning, and specifically the effects of mental rotation on computational thinking within an embodied and enacted perspective. To achieve this, we carried out a study involving 92 students in five primary-school classes (1st grade - 5th grade). The findings reveal a positive correlation between computational th…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniUnplugged codingGeneral Computer ScienceSettore INF/01 - InformaticaResearch areasComputational thinking05 social sciencesPerspective (graphical)050301 educationSpatial intelligenceMental rotationSTEMPositive correlation050105 experimental psychologyMental rotationEducationAction (philosophy)Embodied cognitionVisuospatial skillsComputational thinking Mental rotation STEM Unplugged coding Visuospatial skills0501 psychology and cognitive sciencesComputational thinkingPsychology0503 educationCognitive psychology
researchProduct

An extension of the Burrows-Wheeler Transform and applications to sequence comparison and data compression

2005

We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible, but, differently from BWT, it is also surjective. The E transformation allows to give a definition of distance between two sequences, that we apply here to the problem of the whole mitochondrial genome phylogeny. Moreover we give some consideration about compressing a set of words by using the E transformation as preprocessing.

Discrete mathematicsMultisetBurrows-Wheeler transform; Data Compression; Mitochondrial genome phylogenyBurrows–Wheeler transformMultiplicity (mathematics)Mitochondrial genome phylogenyBurrows-Wheeler transformData CompressionSurjective functionConjugacy classSequence comparisonPreprocessorAlgorithmMathematicsData compression
researchProduct

A challenging family of automata for classical minimization algorithms

2010

In this paper a particular family of deterministic automata that was built to reach the worst case complexity of Hopcroft's state minimization algorithm is considered. This family is also challenging for the two other classical minimization algorithms: it achieves the worst case for Moore's algorithm, as a consequence of a result by Berstel et al., and is of at least quadratic complexity for Brzozowski's solution, which is our main contribution. It therefore constitutes an interesting family, which can be useful to measure the efficiency of implementations of well-known or new minimization algorithms.

Mathematical optimizationComputer science[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technology01 natural sciencesMeasure (mathematics)Classical Minimization AlgorithmAutomatonRegular languageDFA minimization010201 computation theory & mathematics0202 electrical engineering electronic engineering information engineeringWorst-case complexity020201 artificial intelligence & image processingMinificationState (computer science)AlgorithmComputer Science::Formal Languages and Automata TheoryComputingMilieux_MISCELLANEOUS
researchProduct

Computing the original eBWT faster, simpler, and with less memory

2021

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings, however, since this introduction, it has been used more generally to describe any BWT of a collection of strings and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our ne…

FOS: Computer and information sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)
researchProduct

Comparing Sequences by the Burrows-Wheeler Transform

2004

researchProduct

Fragment assembly through minimal forbidden words

2004

researchProduct

Epichristoffel Words and Minimization of Moore Automata

2014

This paper is focused on the connection between the combinatorics of words and minimization of automata. The three main ingredients are the epichristoffel words, Moore automata and a variant of Hopcroft’s algorithm for their minimization. Epichristoffel words defined in [14] generalize some properties of circular sturmian words. Here we prove a factorization property and the existence of the reduction tree, that uniquely identifies the structure of the word. Furthermore, in the paper we investigate the problem of the minimization of Moore automata by defining a variant of Hopcroft’s minimization algorithm. The use of this variant makes simpler the computation of the running time and consequ…

Settore INF/01 - InformaticaEpichristoffel Words automata minimization.
researchProduct

Detecting mutations by eBWT

2018

In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…

0301 basic medicineFOS: Computer and information sciences000 Computer science knowledge general worksBWT LCP Array SNPs Reference-free Assembly-freeLCP ArraySettore INF/01 - Informatica[SDV]Life Sciences [q-bio]Reference-freeAssembly-freeSNP03 medical and health sciences030104 developmental biologyBWTBWT; LCP Array; SNPs; Reference-free; Assembly-freeComputer ScienceComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)[INFO]Computer Science [cs]SoftwareSNPs
researchProduct

Preface

2017

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - InformaticaComputer Science (all)String Processinginformation retrievalTheoretical Computer Science
researchProduct

Permutations, Partitions and Combinatorial Compression Boosting

2004

researchProduct

On Extremal Cases of the Hopcroft's Algorithm

2010

In this paper we consider the problem of minimization of deterministic finite automata (DFA) with reference to Hopcroft’s algorithm. Hopcroft’s algorithm has several degrees of freedom, so there can exist different executions that can lead to different sequences of refinements of the set of the states up to the final partition. We find an infinite family of binary automata for which such a process is unique, whatever strategy is chosen. Some recent papers (cf. Berstel and Carton (2004) [3], Castiglione et al. (2008) [6] and Berstel et al. (2009) [1]) have been devoted to find families of automata for which Hopcroft’s algorithm has its worst execution time. They are unary automata associated…

Hopcroft’s minimization algorithmStandard treeDeterministic finite state automataWord trees
researchProduct

Formal Languages and Automata: Models, Methods and Application. In Honour of the 70th Birthday of Antonio Restivo Preface

2017

Automata Theory
researchProduct

An extension of the Burrows-Wheeler Transform to k words

2005

researchProduct

The Burrows-Wheeler Transform: from data compression to combinatorics on words

2005

researchProduct

A new sequence distance measure based on the Burrows-Wheeler Transform

2005

researchProduct

Moore Automata and Epichristoffel Words

2012

Epichristoffel WordSettore INF/01 - InformaticaMoore Automaton
researchProduct

The Burrows-Wheeler Transform: a new tool in Combinatorics on Words

2005

researchProduct

Circular words and automata minimization

2007

researchProduct

String attractors and combinatorics on words

2019

The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]\cdots w[n]$ is a subset $\Gamma$ of the positions $\{1,\ldots,n\}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $\Gamma$. While finding the smallest string attractor for a word is a NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the noti…

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniSettore INF/01 - InformaticaFormal Languages and Automata Theory (cs.FL)De Brujin wordComputer Science - Formal Languages and Automata TheoryBurrows-Wheeler transformString attractorComputer Science - Data Structures and AlgorithmsThue-Morse wordLempel-Ziv encodingBurrows-Wheeler transform; De Brujin word; Lempel-Ziv encoding; Run-length encoding; String attractor; Thue-Morse wordData Structures and Algorithms (cs.DS)Run-length encoding
researchProduct

Words, Trees and Automata Minimization

2013

In this paper we explore some connections between some combinatorial properties of words and the study of extremal cases of the automata minimization process. An intermediate role is played by the notion od word trees for which some properties of words are generalized. In particular, we describe an infinite family of binary automata, called word automata and constructed by using standard sturmian words and more specifically Fibonacci words, that represent the extremal case of some well known automata minimization algorithms, such as Moore’s and Hopcroft’s methods. As well as giving an overview of the main results in this context, the main purpose of this paper is to prove that, even if a re…

Settore INF/01 - InformaticaCombinatorics on words trees automata minimization.
researchProduct

Hopcroft’s Algorithm and Tree-like Automata

2009

In order to analyze some extremal cases of Hopcroft’s algorithm we deepened the relationship between combinatorial properties of circular words and the ex- ecution of the algorithm on cyclic automata associated to such words.In this paper we highlight the notion of word tree and in particular, we char- acterize the word trees for which Hopcroft’s algorithm on the associated tree-like automata has a unique refinement process. Moreover, we show the relationship between the time complexity of the refinements process of the Hopcroft’s algo- rithm on unary cyclic automata and binary tree-like automata. Such a result allows to exhibit a family of tree-like automata representing the worst case of …

Automata minimization Hoprcroft's algorithm trees.
researchProduct

Approximate string matching: indexing and the k-mismatch problem

2004

researchProduct