

Massively Parallel ANS Decoding on GPUs

Bertil Schmidt, André Weißenberger

subject

020203 distributed computing; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Scalability; 0202 electrical engineering electronic engineering information engineering; Codec; SIMD; Entropy encoding; Massively parallel; Decoding methods; Data compression

description

In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. To keep pace with rapidly growing amounts of data and to achieve sufficient throughput, compression is a key feature of many applications, including popular deep learning pipelines. However, since most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding that combines superior compression ratios with high compression and decompression speeds. Concepts for parallelizing ANS decompression on GPUs have been published recently; however, they exhibit only limited scalability in practical applications. In this paper, we present the first massively parallel, arbitrarily scalable approach to ANS decoding on GPUs, based on a novel overflow pattern. Our performance evaluation on three different CUDA-enabled GPUs (V100, TITAN V, GTX 1080) demonstrates speedups of up to 17× over 64 CPU threads, up to 31× over a high-performance SIMD-based solution, and up to 39× over Zstandard's entropy codec. Our implementation is publicly available at https://github.com/weissenberger/multians.
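For readers unfamiliar with ANS, the following is a minimal, single-threaded sketch of range ANS (rANS) encoding and decoding. It assumes the common streaming formulation with a 32-bit state and byte-wise renormalization. It is not the authors' GPU decoder, does not implement the overflow pattern described above, and is not necessarily the ANS variant used in the paper. The constants `SCALE_BITS` and `RANS_L` and the helpers `build_tables`, `encode`, and `decode` are hypothetical names chosen for illustration. The sketch shows the property that makes GPU parallelization hard: each decoding step consumes the state `x` left behind by the previous step.

```python
from typing import Dict, List, Tuple

SCALE_BITS = 12          # assumption: symbol frequencies are normalized to sum to 2**12
M = 1 << SCALE_BITS
RANS_L = 1 << 23         # assumption: normalized state interval is [RANS_L, 256 * RANS_L)

Table = Dict[int, Tuple[int, int]]  # symbol -> (cumulative start, frequency)


def build_tables(freqs: Dict[int, int]) -> Table:
    """Assign each symbol a contiguous slot range [start, start + freq) within [0, M)."""
    assert sum(freqs.values()) == M, "frequencies must sum to M"
    table: Table = {}
    start = 0
    for sym in sorted(freqs):
        table[sym] = (start, freqs[sym])
        start += freqs[sym]
    return table


def encode(symbols: List[int], table: Table) -> List[int]:
    """Encode symbols into a list of bytes that the decoder consumes as a stack."""
    x = RANS_L
    out: List[int] = []
    for s in reversed(symbols):            # rANS encodes in reverse so decoding runs forward
        start, freq = table[s]
        x_max = ((RANS_L >> SCALE_BITS) << 8) * freq
        while x >= x_max:                  # renormalize: emit low bytes until the update fits
            out.append(x & 0xFF)
            x >>= 8
        x = (x // freq) * M + (x % freq) + start
    for _ in range(4):                     # flush the final 32-bit state, low byte first
        out.append(x & 0xFF)
        x >>= 8
    return out


def decode(data: List[int], n: int, table: Table) -> List[int]:
    """Decode n symbols; every step depends on the state produced by the step before."""
    stack = list(data)
    x = 0
    for _ in range(4):                     # restore the flushed state, high byte first
        x = (x << 8) | stack.pop()
    slot_to_sym = [0] * M                  # lookup table: slot -> symbol
    for sym, (start, freq) in table.items():
        for i in range(start, start + freq):
            slot_to_sym[i] = sym
    out: List[int] = []
    for _ in range(n):
        slot = x & (M - 1)
        s = slot_to_sym[slot]
        start, freq = table[s]
        x = freq * (x >> SCALE_BITS) + slot - start
        while x < RANS_L:                  # renormalize: pull bytes back in
            x = (x << 8) | stack.pop()
        out.append(s)
    return out


if __name__ == "__main__":
    table = build_tables({0: 2048, 1: 1024, 2: 1023, 3: 1})
    msg = [0, 1, 0, 2, 0, 0, 3, 1, 2, 0] * 100
    data = encode(msg, table)
    assert decode(data, len(msg), table) == msg
    print(f"{len(msg)} symbols -> {len(data)} bytes")
```

Because `decode` threads a single state through all symbols, a naive implementation cannot start decoding in the middle of the bitstream. Removing that serial chain is the problem that massively parallel ANS decoders must solve.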

https://doi.org/10.1145/3337821.3337888