Search results for "data compression"
showing 10 items of 99 documents
Massively Parallel ANS Decoding on GPUs
2019
In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results wit…
Massively Parallel Huffman Decoding on GPUs
2018
Data compression is a fundamental building block in a wide range of applications. Besides its intended purpose to save valuable storage on hard disks, compression can be utilized to increase the effective bandwidth to attached storage as realized by state-of-the-art file systems. In the foreseeing future, on-the-fly compression and decompression will gain utmost importance for the processing of data-intensive applications such as streamed Deep Learning tasks or Next Generation Sequencing pipelines, which establishes the need for fast parallel implementations. Huffman coding is an integral part of a number of compression methods. However, efficient parallel implementation of Huffman decompre…
Constructing Antidictionaries of Long Texts in Output-Sensitive Space
2021
AbstractA wordxthat is absent from a wordyis calledminimalif all its proper factors occur iny. Given a collection ofkwordsy1, … ,ykover an alphabetΣ, we are asked to compute the set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓof minimal absent words of length at mostℓof the collection {y1, … ,yk}. The set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓcontains all the wordsxsuch thatxis absent from all the words of the collection while there existi,j, such that the maximal proper suffix ofxis a factor ofyiand the maximal proper prefix ofxis a factor ofyj. In data compression, this corresponds to computing the antidictionary ofkdocuments. In bioinformatics, it corresponds to c…
Block Sorting-Based Transformations on Words: Beyond the Magic BWT
2018
The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…
The Myriad Virtues of Suffix Trees
2006
Wavelet Trees have been introduced in [Grossi, Gupta and Vitter, SODA ’03] and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymp- totic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (lik…
Image Processing Chain For Digital Still Cameras Based On The Simpil Architecture
2005
The new generation of wireless devices herald the development of products for integrated portable image and video communication requiring to image and video applications high computing performance. Portable MultiMedia Supercomputers (PMMS), a new class of architectures, allow to combine high computational performance, needed by multimedia applications, and a big energy efficiency, needed by portable devices. Among PMMS, the SIMPil (SIMD processor pixel) architecture satisfies the above requirements, especially with video and digital images processing tasks. In this paper we, exploit the SIMPil computation and throughput efficiency to implement the whole image processing chain of a digital s…
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
2021
Abstract Background Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a State of the Art is problematic. Results We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make very easy to deploy …
The Myriad Virtues of Wavelet Trees
2009
Wavelet Trees have been introduced in [Grossi, Gupta and Vitter, SODA '03] and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (like,…
Textual data compression in computational biology: Algorithmic techniques
2012
Abstract In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586] the first systematic organization and presentation of the impact of textual data compression for the analysis of biological data has been given. Its main focus was on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used together with a technical presentation of how well-known notions from information theory have been adapted to successfully work on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…
Data Compression Approach for Long-Term Monitoring of Pavement Structures
2019
Pavement structures are designed to withstand continuous damage during their design life. Damage starts as soon as the pavement is open to traffic and increases with time. If maintenance activities are not considered in the initial design or considered but not applied during the service life, damage will grow to a point where rehabilitation may be the only and most expensive option left. In order to monitor the evolution of damage and its severity in pavement structures, a novel data compression approach based on cumulative measurements from a piezoelectric sensor is presented in this paper. Specifically, the piezoelectric sensor uses a thin film of polyvinylidene fluoride to sense the ener…