Design of an exact data deduplication cluster
Authors: Sascha Effert, Jürgen Kaiser, André Brinkmann, Dirk Meister
Subjects: Ethernet; Load management; business.industry; Computer science; Scalability; Data_FILES; Local area network; Data deduplication; Fault tolerance; Load balancing (computing); business; Computer network; Electronic data interchange
Description:
Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and accept missing redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index that detects as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. We therefore show that, contrary to claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations with a special focus on the intra-node communication.
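The core idea of a joint chunk index is that every chunk fingerprint has exactly one responsible index node, so a duplicate is detected no matter which node receives the data. The following is a minimal Python sketch of that idea, not the authors' implementation. The class names, SHA-1 fingerprints, fixed-size chunking, and modulo-based partitioning of the fingerprint space are all assumptions chosen for illustration; the locality and load-balancing techniques mentioned above are omitted.

```python
import hashlib
from collections import defaultdict


class IndexPartition:
    """One node's slice of the joint chunk index (hypothetical name)."""

    def __init__(self):
        self.entries = {}  # fingerprint -> node that first stored the chunk

    def lookup_batch(self, fingerprints, ingest_node):
        """Answer one batched request; True means the fingerprint was already indexed."""
        answers = []
        for fp in fingerprints:
            if fp in self.entries:
                answers.append(True)
            else:
                self.entries[fp] = ingest_node  # register the new chunk
                answers.append(False)
        return answers


class DedupCluster:
    """Exact deduplication: each fingerprint maps to exactly one index node."""

    def __init__(self, num_nodes, chunk_size=8192):
        self.partitions = [IndexPartition() for _ in range(num_nodes)]
        self.chunk_size = chunk_size
        self.remote_requests = 0  # batched messages sent to other nodes

    def responsible_node(self, fp):
        # Assumption: static hash partitioning of the fingerprint space.
        return int.from_bytes(fp[:8], "big") % len(self.partitions)

    def write(self, ingest_node, data):
        """Deduplicate data arriving at ingest_node; returns (new, duplicate) chunk counts."""
        # Fixed-size chunking keeps the sketch short (assumption).
        chunks = [data[i:i + self.chunk_size] for i in range(0, len(data), self.chunk_size)]
        batches = defaultdict(list)
        for chunk in chunks:
            fp = hashlib.sha1(chunk).digest()
            batches[self.responsible_node(fp)].append(fp)

        new = dup = 0
        for node, fps in batches.items():
            if node != ingest_node:
                self.remote_requests += 1  # one message per remote node, not per chunk
            for known in self.partitions[node].lookup_batch(fps, ingest_node):
                if known:
                    dup += 1
                else:
                    new += 1
        return new, dup


cluster = DedupCluster(num_nodes=4, chunk_size=4)
print(cluster.write(0, b"AAAABBBBCCCC"))  # (3, 0): all chunks are new
print(cluster.write(2, b"AAAABBBBCCCC"))  # (0, 3): detected even on another node
```

Grouping fingerprints by responsible node before sending them is one simple way to cut the number of messages per write to at most one per remote node, which reflects the abstract's goal of minimizing information exchange between nodes.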
Date | Venue
---|---
2012-04-01 | 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)