6533b839fe1ef96bd12a6ae8

RESEARCH PRODUCT

Optimal Parsing for Dictionary-Based Compression

Alessio Langiu

subject

Settore INF/01 - Informaticaoptimal parsingtext compressiondata compressionLZ-compression

description

Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such process usually is not unique and, for compression purposes, it makes sense to find one of the possible parsing that minimise the final compression ratio. This is the parsing problem. In more than 30 years of history of dictionary-based text compression only few optimal parsing algorithms were presented. Most of the practical dictionary-based compression solutions need or prefer to factorise the input data into a sequence of dictionary-phrases and symbols. Those two output categories are usually encoded via two different encoders producing a compressed output that is a mixture of two compressors. This book contains a review of many dictionary-based compression schemes, their theoretical basis, a focus on the parsing problem and related problems, a recent theoretical model for such compression schemes, and an optimal solution called Dictionary-Symbolwise Flexible Parsing that covers almost all the classic dictionary-based compression schemes and the more general Dictionary-Symbolwise variant, where letters and dictionary references are compressed via different variable-length encoders.

http://hdl.handle.net/10447/86125