Author: Raffaele Giancarlo

Forewords

The Myriad Virtues of Wavelet Trees

Wavelet Trees were introduced in [Grossi, Gupta and Vitter, SODA '03] and were rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (like,…

Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer

ValWorkBench: an open source Java library for cluster validation, with applications to microarray data analysis.

Basic Statistical Indices for SeqAn

On the Suitability of Neural Networks as Building Blocks for the Design of Efficient Learned Indexes

Topological ranks reveal functional knowledge encoded in biological networks: a comparative analysis

Multi-dimensional pattern matching with dimensional wildcards

Differential Expression of Proteolytic Enzymes During Epithelial-Mesenchymal Transition of Endothelial Cells

Indexed Two-Dimensional String Matching

On the determinization of weighted finite automata

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

Guest Editors' Introduction to the Special Section on Algorithms in Bioinformatics

From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization

FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

Differential expression of proteolytic enzymes during epithelial-mesenchymal transition of endothelial cells.

Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

Distance Functions, Clustering Algorithms and Microarray Data Analysis

Parallel Construction and Query of Index Data Structures for Pattern Matching on Square Matrices

On finding common neighborhoods in massive graphs

Network Centralities and Node Ranking

Preface

Textual data compression in computational biology: Algorithmic techniques

The Power of Word-Frequency Based Alignment-Free Functions: a Comprehensive Large-Scale Experimental Analysis

On-line construction of two-dimensional suffix trees

Speeding up the Consensus Clustering methodology for microarray data analysis

The Engineering of a Compression Booster: Theory Vs Practice in BWT Compression

Multi-Dimensional Pattern Matching with Dimensional Wildcards: Data Structures and Optimal On-Line Search Algorithms

Articles selected from posters presented at the Tenth Annual International Conference on Research in Computational Biology - Preface and Special Issue

MapReduce in computational biology via Hadoop and Spark

$O(n^2 \log n)$ Time On-Line Construction of Two-Dimensional Suffix Trees

A Critical Analysis of Classifier Selection in Learned Bloom Filters

Textual data compression in computational biology: a synopsis.

Permutations, Partitions and Combinatorial Compression Boosting

Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures

Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning

Learned Sorted Table Search and Static Indexes in Small-Space Data Models

Algorithmic Aspects of Speech Recognition: A Synopsis

Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis

On the Construction of Classes of Suffix Trees for Square Matrices: Algorithms and Applications

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

The Engineering of a Compression Boosting Library: Theory vs Practice in BWT Compression

Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform

Generalizations of the periodicity Theorem of Fine and Wilf

A basic analysis toolkit for biological sequences

Pattern Matching Algorithms

Table Compression

DNA combinatorial messages and Epigenomics: The case of chromatin organization and nucleosome occupancy in eukaryotic genomes

Forewords-Special Issue Combinatorial Pattern Matching 2011

The Three Steps of Clustering In The Post-Genomic Era

Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological Data Mining

A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data analysis

Periodicity and repetitions in parameterized strings

Boosting Textual Compression in Optimal Linear Time

Statistical Indexes for Computational and Data Driven Class Discovery in Microarray Data

Alignment-free Genomic Analysis via a Big Data Spark Platform

Bayesian versus data driven model selection for microarray data

$O(n^2 \log n)$ Time On-line Construction of Two-Dimensional Suffix Trees

Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies

Learned Sorted Table Search and Static Indexes in Small Model Space

SAIL: String Algorithms, Information and Learning, Preface and Special Issue

Computation Cluster Validation in the Big Data Era

An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

Stability-Based Model Selection for High Throughput Genomic Data: An Algorithmic Paradigm

Longest Common Subsequence from Fragments via Sparse Dynamic Programming

Alignment-Free Sequence Comparison over Hadoop for Computational Biology

Algorithms in Bioinformatics

Algorithmics for the Life Sciences

New results for finding common neighborhoods in massive graphs in the data stream model

Foreword: Special issue in honor of the 60th Birthday of Professor Alberto Apostolico

The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes

Algoritmica Per Le Scienze Della Vita

A Tutorial on Computational Cluster Analysis with Applications to Pattern Discovery in Microarray Data

The Three Steps of Clustering in the Post-Genomic Era: A Synopsis

Additional file 1 of FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

Improving table compression with combinatorial optimization

Learned Sorted Table Search and Static Indexes in Small Model Space