
AUTHOR

Giovanni Manzini

0000-0002-5047-0196

showing 7 related works from this author

The Myriad Virtues of Wavelet Trees

2009

Wavelet Trees have been introduced in [Grossi, Gupta and Vitter, SODA '03] and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (like,…
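As a rough illustration of the data structure this abstract refers to, the sketch below builds a pointer-based wavelet tree over a sequence and answers rank queries. It is a minimal teaching version, not the construction analysed in the paper: the class name `WaveletTree`, the balanced alphabet split, and the plain Python lists used as bit vectors are all assumptions made for this example. Practical implementations store compressed bit vectors with constant-time rank support.

```python
class WaveletTree:
    """Minimal wavelet tree sketch supporting rank(symbol, i)."""

    def __init__(self, seq, alphabet=None):
        # Sorted distinct symbols handled by this node.
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.left = self.right = None
        self.bits = []
        if len(self.alphabet) <= 1:
            return  # leaf: every symbol here is the same
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # Bit 0 sends a symbol to the left child, bit 1 to the right child.
        self.bits = [0 if c in left_syms else 1 for c in seq]
        self.left = WaveletTree([c for c in seq if c in left_syms],
                                self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c not in left_syms],
                                 self.alphabet[mid:])

    def rank(self, symbol, i):
        """Number of occurrences of symbol in the first i positions."""
        if symbol not in self.alphabet:
            return 0
        if self.left is None:
            return i  # leaf: all i positions hold this symbol
        mid = len(self.alphabet) // 2
        bit = 0 if symbol in self.alphabet[:mid] else 1
        # Linear scan for clarity; real structures answer this in O(1).
        count = sum(1 for b in self.bits[:i] if b == bit)
        child = self.left if bit == 0 else self.right
        return child.rank(symbol, count)
```

Each level of the tree stores one bit per symbol, which is why the choice of binary compressor applied to those bit vectors governs the overall compressed size.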

Binary tree; Weight-balanced tree; Wavelet transform; Cascade algorithm; Data_CODINGANDINFORMATIONTHEORY; Huffman coding; Data Compression; Theoretical Computer Science; Computer Science Applications; Set partitioning in hierarchical trees; symbols.namesake; Wavelet; Computational Theory and Mathematics; symbols; empirical entropy; Burrows-Wheeler Transform; Algorithm; Data compression; Mathematics; Information Systems; Wavelet Trees
researchProduct

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

2007

Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rath…
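To make the idea of approximating a Kolmogorov-complexity-based measure with a real compressor concrete, here is a small sketch of the Normalized Compression Distance (NCD), one of the compression-based dissimilarities listed in this record's keywords. The use of `zlib` and the helper names `compressed_size` and `ncd` are assumptions for illustration only; they are not taken from the paper, whose experiments may rely on different compressors.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 suggest that one input adds little information given the other, while values near 1 suggest the inputs share little structure the compressor can exploit. The result depends on the compressor chosen, which is why the measure is better viewed as a methodology than as a single formula.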

Computer science; Algorismes; Prediction by partial matching; Compression dissimilarity; computer.software_genre; Biochemistry; Protein Structure Secondary; Phylogenetic studies; Structural Biology; Sequence Analysis Protein; Databases Protein; lcsh:QH301-705.5; Biological data; NCD; Applied Mathematics; Genomics; Classification; CD; Computer Science Applications; Benchmarking; :Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC]; Universal compression dissimilarity; Area Under Curve; Metric (mathematics); lcsh:R858-859.7; Data mining; Algorithms; Data compression; Research Article; :Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]; Normalization (statistics); lcsh:Computer applications to medicine. Medical informatics; Bioinformatics Sequence Alignment Algorithms; Set (abstract data type); Similarity (network science); Normalized compression dissimilarity; Data compression (Computer science); Animals; Humans; Amino Acid Sequence; Molecular Biology; Biology; Dades -- Compressió (Informàtica); USM; Universal similarity metric; Proteins; UCD; Protein Structure Tertiary; Data set; Genòmica; Statistical classification; lcsh:Biology (General); ROC Curve; computer; Sequence Alignment; Software; BMC bioinformatics

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

2018

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…
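For readers unfamiliar with the transformation, the following sketch computes the BWT of a word by sorting its cyclic rotations in the usual lexicographic order, and inverts it. It is a quadratic-space teaching version under stated assumptions: the function names `bwt` and `inverse_bwt` are hypothetical, and no end-of-string sentinel is used, so the index of the original rotation is returned alongside the transformed word.

```python
def bwt(word: str) -> tuple[str, int]:
    """Return (last column of the sorted rotations, row of the original word)."""
    n = len(word)
    # Sort rotation start positions by the rotation they denote.
    rotations = sorted(range(n), key=lambda i: word[i:] + word[:i])
    # The rotation starting at i ends with the character just before i.
    last = "".join(word[i - 1] for i in rotations)
    return last, rotations.index(0)


def inverse_bwt(last: str, index: int) -> str:
    """Rebuild the word from its BWT and the row index of the original."""
    n = len(last)
    # Stable sort maps each first-column position to its last-column position.
    order = sorted(range(n), key=lambda i: last[i])
    out = []
    j = index
    for _ in range(n):
        j = order[j]
        out.append(last[j])
    return "".join(out)


print(bwt("banana"))  # ('nnbaaa', 3)
```

The stable sort in `inverse_bwt` is what makes the transformation reversible: equal characters keep the same relative order in the first and last columns.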

0301 basic medicine; Settore INF/01 - Informatica; Computer science; Data_CODINGANDINFORMATIONTHEORY; 0102 computer and information sciences; Block sorting; Data structure; Lexicographical order; 01 natural sciences; Upper and lower bounds; 03 medical and health sciences; Combinatorics on words; 030104 developmental biology; 010201 computation theory & mathematics; Arithmetic; Compressed Data Structures Block Sorting Combinatorics on Words Algorithms; Data compression

The Engineering of a Compression Boosting Library: Theory vs Practice in BWT Compression

2006

Data Compression is one of the most challenging arenas both for algorithm design and engineering. This is particularly true for Burrows and Wheeler Compression, a technique that is important in itself and for the design of compressed indexes. There has been considerable debate on how to design and engineer compression algorithms based on the BWT paradigm. In particular, Move-to-Front Encoding is generally believed to be an "inefficient" part of the Burrows-Wheeler compression process. However, only recently have two theoretically superior alternatives to Move-to-Front been proposed, namely Compression Boosting and Wavelet Trees. The main contribution of this paper is to provide the first ex…
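The Move-to-Front step mentioned in this abstract can be sketched as follows. This is a plain byte-oriented version with a 256-entry table; the function names `mtf_encode` and `mtf_decode` are assumptions for illustration and do not come from the library described in the paper.

```python
def mtf_encode(data: bytes) -> list[int]:
    """Replace each byte by its current position in a recency list."""
    table = list(range(256))
    ranks = []
    for b in data:
        r = table.index(b)
        ranks.append(r)
        # Move the byte just seen to the front of the list.
        table.insert(0, table.pop(r))
    return ranks


def mtf_decode(ranks: list[int]) -> bytes:
    """Invert mtf_encode by replaying the same list updates."""
    table = list(range(256))
    out = bytearray()
    for r in ranks:
        b = table.pop(r)
        out.append(b)
        table.insert(0, b)
    return bytes(out)


print(mtf_encode(b"nnbaaa"))  # [110, 0, 99, 99, 0, 0]
```

Runs of equal symbols, which are common in BWT output, turn into runs of zeros, so a later entropy coder sees a skewed distribution of small integers.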

Lossless compression; Boosting (machine learning); Computer science; business.industry; Supervised learning; Compression Boosting Library; Data_CODINGANDINFORMATIONTHEORY; Machine learning; computer.software_genre; Wavelet; Algorithm design; Artificial intelligence; business; computer; Algorithms; Data compression

Inducing the Lyndon Array

2019

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use $O(n)$ words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not onl…
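To clarify what the Lyndon array is and how it relates to suffix sorting, the sketch below computes it from suffix ranks: entry i is the distance from i to the next lexicographically smaller suffix to its right, or to the end of the text if there is none. This is not the induced-sorting algorithm proposed in the paper. It is a simple illustration that sorts suffixes naively and uses O(n) extra words, and the function name `lyndon_array` is an assumption.

```python
def lyndon_array(text: str) -> list[int]:
    """lyn[i] = length of the longest Lyndon word starting at position i."""
    n = len(text)
    # Naive suffix array; a linear-time construction would replace this.
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lyn = [0] * n
    stack = []  # positions to the right, with increasing rank toward the top
    for i in range(n - 1, -1, -1):
        # Discard suffixes larger than suffix i; they cannot be its
        # next smaller suffix, nor that of any position further left.
        while stack and rank[stack[-1]] > rank[i]:
            stack.pop()
        next_smaller = stack[-1] if stack else n
        lyn[i] = next_smaller - i
        stack.append(i)
    return lyn


print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

The dependence on the inverse suffix array in this version is the kind of O(n)-word working space that the abstract says its algorithm avoids.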

FOS: Computer and information sciences; 050101 languages & linguistics; Computer science; Computation; Induced suffix sorting; 02 engineering and technology; Space (mathematics); law.invention; Suffix sorting; law; Suffix array; Computer Science - Data Structures and Algorithms; 0202 electrical engineering electronic engineering information engineering; Data_FILES; Preprocessor; Data Structures and Algorithms (cs.DS); 0501 psychology and cognitive sciences; Computer Science::Data Structures and Algorithms; Time complexity; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; 05 social sciences; Lightweight algorithm; Suffix array; Sigma; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Induced suffix sorting; Lightweight algorithms; Lyndon array; Suffix array; Working space; Lyndon array; Lightweight algorithms; 020201 artificial intelligence & image processing; Algorithm; Computer Science::Formal Languages and Automata Theory

Boosting Textual Compression in Optimal Linear Time

2005

We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties. Technically, our boosting technique builds upon three main ingredients: the Burrows-Wheeler Transform, the Suffix Tree d…

Theoretical computer science; Burrows–Wheeler transform; Suffix tree; String (computer science); Data_CODINGANDINFORMATIONTHEORY; Burrows-Wheeler transform; Substring; Arithmetic coding; law.invention; Lempel-Ziv compressors; Artificial Intelligence; Hardware and Architecture; Control and Systems Engineering; law; text compression; empirical entropy; Arithmetic coding; Greedy algorithm; Time complexity; Algorithm; Software; Information Systems; Mathematics; Data compression

The Alternating BWT: an algorithmic perspective

2020

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, …
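The difference between the two transformations lies only in the order used to sort rotations. The sketch below assumes the usual definition of the alternating lexicographic order: at the first position where two words differ, odd positions (counting from 1) are compared in the standard order and even positions in the reversed order. The function name `abwt` and the key-based sorting trick are assumptions for illustration; this is a naive quadratic-space version, not one of the algorithms developed in the paper.

```python
def abwt(word: str) -> tuple[str, int]:
    """Alternating BWT: last column of rotations sorted in alternating order."""
    n = len(word)

    def alt_key(i: int) -> tuple[int, ...]:
        rotation = word[i:] + word[:i]
        # Negating code points at every second position reverses the
        # comparison there, which yields the alternating order.
        return tuple(ord(c) if k % 2 == 0 else -ord(c)
                     for k, c in enumerate(rotation))

    rotations = sorted(range(n), key=alt_key)
    last = "".join(word[i - 1] for i in rotations)
    return last, rotations.index(0)


print(abwt("banana"))  # ('bnnaaa', 3)
```

For comparison, sorting the same rotations in the standard lexicographic order gives `nnbaaa` for this word.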

Discrete mathematics; FOS: Computer and information sciences; Settore INF/01 - Informatica; General Computer Science; Basis (linear algebra); Computer science; Alternating Burrows-Wheeler Transform; Galois word; Rank-invertibility; Field (mathematics); Data structure; Theoretical Computer Science; Transformation (function); Difference cover algorithm; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Time complexity; Alternating Burrows-Wheeler Transform; Difference cover algorithm; Galois word; Rank-invertibility; Word (computer architecture); Data compression