Deriving and comparing deduplication techniques using a model-based classification

Jürgen Kaiser, Tim Süß, André Brinkmann, Dirk Meister

subject

Set (abstract data type), Work (electrical), Computer science, Data deduplication, Data mining, Real world data

description

Data deduplication has been an active research topic, and a large number of systems have been developed. These systems are usually regarded as an inherently linked set of characteristics. However, a detailed analysis reveals independent concepts that can be transferred to other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which reveals two previously unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and of the two new systems. We perform this evaluation on real-world data sets.
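To make the basic idea of data deduplication concrete, here is a minimal sketch of chunk-level deduplication. It assumes fixed-size chunks, SHA-256 fingerprints and a full in-memory chunk index. The function names `dedup_store` and `restore` are hypothetical. The sketch does not reproduce any of the systems or the model analyzed in this work, and real systems differ in many respects, for example by often using content-defined rather than fixed-size chunking.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Hypothetical sketch: split data into fixed-size chunks and keep each unique chunk once."""
    index = {}   # fingerprint -> chunk bytes (assumed: a full, in-memory chunk index)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:
            # New content: store the chunk itself.
            index[fp] = chunk
        # Duplicate or not, the recipe only records a reference to the chunk.
        recipe.append(fp)
    return index, recipe


def restore(index: dict, recipe: list) -> bytes:
    """Rebuild the original data by concatenating the referenced chunks in order."""
    return b"".join(index[fp] for fp in recipe)
```

With `data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096`, the input splits into four chunks. Only two of them are unique, so only two are stored, and `restore` rebuilds the original bytes from the recipe.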

https://doi.org/10.1145/2741948.2741952