0000000000009901

AUTHOR

Alessio Langiu

showing 13 related works from this author

ON-LINE CONSTRUCTION OF A SMALL AUTOMATON FOR A FINITE SET OF WORDS

2012

In this paper we describe a "light" algorithm for the on-line construction of a small automaton recognising a finite set of words. The algorithm runs in linear time. We carried out good experimental results on real dictionaries, on biological sequences and on the sets of suffixes (resp. factors) of a set of words that shows how our automaton is near to the minimal one. For the suffixes of a text, we propose a modified construction that leads to an even smaller automaton. We moreover construct linear algorithms for the insertion and deletion of a word in a finite set, directly from the constructed automaton.

minimal automata[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Timed automatondeterministic automataBüchi automaton0102 computer and information sciences02 engineering and technology01 natural sciencesDeterministic automaton0202 electrical engineering electronic engineering information engineeringComputer Science (miscellaneous)Two-way deterministic finite automatonNondeterministic finite automatonMathematicsonline construction.Discrete mathematicsSettore INF/01 - InformaticaPowerset constructionPushdown automatonComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)010201 computation theory & mathematicsProbabilistic automaton020201 artificial intelligence & image processingFinite set of wordAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct

On parsing optimality for dictionary-based text compression—the Zip case

2013

Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …

Theoretical computer scienceComputer scienceData_CODINGANDINFORMATIONTHEORYTop-down parsingcomputer.software_genreTheoretical Computer ScienceParsing optimalityCompression (functional analysis)Discrete Mathematics and CombinatoricsLossless compressionParsingLZ77 algorithmSettore INF/01 - InformaticaDeflate algorithmbusiness.industryDictionary-based text compressionComputational Theory and MathematicsData compressionDEFLATECompression ratioArtificial intelligencebusinesscomputerNatural language processingBottom-up parsingData compressionJournal of Discrete Algorithms
researchProduct

The rightmost equal-cost position problem.

2013

LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence formed by the couple (length, offset). For a given factor, the smallest is the offset, the smallest is the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for…

FOS: Computer and information sciencesOffset (computer science)Computer scienceSuffix treeComputer Science - Information Theorylaw.inventionCombinatoricslawLog-log plotComputer Science - Data Structures and AlgorithmsCompression schemetext compressiondictionary text compressionData Structures and Algorithms (cs.DS)LZ77 compressiondata compressionLossless compressionfull text indexSuffix Tree Data StructuresSettore INF/01 - InformaticaInformation Theory (cs.IT)Data structurePrefixCompression ratioCompression scheme; Constant time; Suffix Tree Data StructuresAlgorithmData compressionConstant time
researchProduct

Dictionary-symbolwise flexible parsing

2012

AbstractLinear-time optimal parsing algorithms are rare in the dictionary-based branch of the data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999) that works when the dictionary is prefix closed and the encoding of dictionary pointers has a constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm that is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypothesis. In the case of LZ78-like algorithms with variable costs and any, linear as usual, symbolwise compressor we show how to implement our parsing algorithm in linear time. In the case of LZ77-like dictionaries and any symbol…

Theoretical computer scienceComputer science[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Data_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologycomputer.software_genre01 natural sciencesDirected acyclic graphTheoretical Computer ScienceConstant (computer programming)020204 information systemsEncoding (memory)Optimal parsing0202 electrical engineering electronic engineering information engineeringDiscrete Mathematics and CombinatoricsStringologySymbolwise text compressionTime complexityLossless compressionParsingSettore INF/01 - InformaticaDictionary-based compressionOptimal Parsing Lossless Data Compression DAGDirected acyclic graphPrefixComputational Theory and MathematicsText compression010201 computation theory & mathematicsAlgorithmcomputerBottom-up parsingData compressionJournal of Discrete Algorithms
researchProduct

Pelagic species identification by using a PNN neural network and echo-sounder data

2017

For several years, a group of CNR researchers conducted acoustic surveys in the Sicily Channel to estimate the biomass of small pelagic species, their geographical distribution and their variations over time. The instrument used to carry out these surveys is the scientific echo-sounder, set for different frequencies. The processing of the back scattered signals in the volume of water under investigation determines the abundance of the species. These data are then correlated with the biological data of experimental catches, to attribute the composition of the various fish schools investigated. Of course, the recognition of the fish schools helps to produce very good results, that is very clo…

Probabilistic neural networkComputer Science (all)ClassificationPelagic species identificationTheoretical Computer Science
researchProduct

Abelian Powers and Repetitions in Sturmian Words

2016

Richomme, Saari and Zamboni (J. Lond. Math. Soc. 83: 79-95, 2011) proved that at every position of a Sturmian word starts an abelian power of exponent $k$ for every $k > 0$. We improve on this result by studying the maximum exponents of abelian powers and abelian repetitions (an abelian repetition is an analogue of a fractional power) in Sturmian words. We give a formula for computing the maximum exponent of an abelian power of abelian period $m$ starting at a given position in any Sturmian word of rotation angle $\alpha$. vAs an analogue of the critical exponent, we introduce the abelian critical exponent $A(s_\alpha)$ of a Sturmian word $s_\alpha$ of angle $\alpha$ as the quantity $A(s_\a…

FOS: Computer and information sciencesFibonacci numberGeneral Computer ScienceDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Computer Science - Formal Languages and Automata Theory0102 computer and information sciences01 natural sciencesTheoretical Computer ScienceCombinatoricsFOS: MathematicsMathematics - Combinatorics[INFO]Computer Science [cs]Number Theory (math.NT)0101 mathematicsAbelian groupContinued fractionFibonacci wordComputingMilieux_MISCELLANEOUSQuotientMathematicsMathematics - Number Theoryta111010102 general mathematicsComputer Science (all)Sturmian wordSturmian wordAbelian period; Abelian power; Critical exponent; Lagrange constant; Sturmian word; Theoretical Computer Science; Computer Science (all)Abelian periodLagrange constantCritical exponentAbelian power010201 computation theory & mathematicsBounded functionExponentCombinatorics (math.CO)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer sciencegenome sequenceGeneral Mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]General Physics and AstronomyContext (language use)algorithmscomputer.software_genrePattern matchingSequenceSearch engine indexingGeneral EngineeringWildcard characterArticlescomputer.file_formatConstruct (python library)Data structuremapping readspattern matchingComputingMethodologies_DOCUMENTANDTEXTPROCESSINGData mining[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Focus (optics)mismatchcomputerAlgorithmindexingPhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
researchProduct

Abelian Repetitions in Sturmian Words

2012

We investigate abelian repetitions in Sturmian words. We exploit a bijection between factors of Sturmian words and subintervals of the unitary segment that allows us to study the periods of abelian repetitions by using classical results of elementary Number Theory. We prove that in any Sturmian word the superior limit of the ratio between the maximal exponent of an abelian repetition of period $m$ and $m$ is a number $\geq\sqrt{5}$, and the equality holds for the Fibonacci infinite word. We further prove that the longest prefix of the Fibonacci infinite word that is an abelian repetition of period $F_j$, $j>1$, has length $F_j(F_{j+1}+F_{j-1} +1)-2$ if $j$ is even or $F_j(F_{j+1}+F_{j-1}…

FOS: Computer and information sciencesFibonacci numberDiscrete Mathematics (cs.DM)Formal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryG.2.168R15FOS: MathematicsCombinatorics on words Sturmian wordMathematics - CombinatoricsAbelian groupFibonacci wordMathematicsDiscrete mathematicsMathematics::CombinatoricsSturmian wordCombinatorics on wordsNumber theoryF.2.2; F.4.3; G.2.1F.4.3ExponentCombinatorics (math.CO)F.2.2Word (group theory)Computer Science::Formal Languages and Automata TheoryComputer Science - Discrete Mathematics
researchProduct

Bacteria classification using minimal absent words

2017

Bacteria classification has been deeply investigated with different tools for many purposes, such as early diagnosis, metagenomics, phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classificatier for bacteria species based on a dissimilarity measure of purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that recently found applications in bioinformatics. We can therefore incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the com…

0301 basic medicinesupervised classificationRelation (database)Computer science0102 computer and information sciences01 natural sciencesMeasure (mathematics)03 medical and health sciencesProbabilistic neural networkcombinatorics on wordsprobabilistic neural networkminimal absent wordlcsh:R5-920Settore INF/01 - Informaticabusiness.industryBacterial taxonomyPattern recognitionbacteria classificationGeneral MedicineCombinatorics on words030104 developmental biology010201 computation theory & mathematicsMetagenomicsClassification methodsArtificial intelligencebusinesslcsh:Medicine (General)AIMS Medical Science
researchProduct

Artificial neural networks for fault tollerance of an air-pressure sensor network

2017

A meteorological tsunami, commonly called Meteotsunami, is a tsunami-like wave originated by rapid changes in barometric pressure that involve the displacement of a body of water. This phenomenon is usually present in the sea cost area of Mazara del Vallo (Sicily, Italy), in particular in the internal part of the seaport canal, sometimes making local population at risk. The Institute for Coastal Marine Environment (IAMC) of the National Research Council in Italy (CNR) have already conducted several studies upon meteotsunami phenomenon. One of the project has regarded the creation of a sensors network composed by micro-barometric sensors, located in 4 different stations close to the seaport …

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniMeteotsunamiSettore INF/01 - InformaticaPressure sensorComputer Science (all)Neural networkTheoretical Computer Science
researchProduct

Optimal Parsing for Dictionary-Based Compression

2013

Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such process usually is not unique and, for compression purposes, it makes sense to find one of the possible parsing that minimise the final compression ratio. This is the parsing problem. In more than 30 years of history of dictionary-based text compression only few optimal parsing algorithms were presented. Most of the practical dictionary-based compression solutions need or prefer to factorise the input data into a sequence of dictionary-phrases and symbols. Those two output categories are usually encoded via two different encoders producing …

Settore INF/01 - Informaticaoptimal parsingtext compressiondata compressionLZ-compression
researchProduct

Optimal Parsing for Dictionary Text Compression

2012

Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such process usually is not unique and, for compression purpose, it makes sense to find one of the possible parsing that minimize the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy or a parsing algorithm that solve the parsing problem taking account of all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Compression algorithm constrains are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weights on…

Text Compression;Settore INF/01 - InformaticaText Compression
researchProduct

Metodo e apparato per la compressione e la decompressione dati tramite il quale si possono utilizzare efficientemente molteplici compressori e decomp…

2010

Settore INF/01 - Informaticaoptimal parsingdictionary-based compressiondata compression
researchProduct