
AUTHOR

Jürgen Kaiser

Extending SSD lifetime in database applications with page overwrites

Flash-based Solid State Disks (SSDs) have been a great success story in recent years and are widely used in embedded systems, servers, and laptops. One often overlooked property of NAND flash is that flash pages can be overwritten under certain circumstances, which can be used to decrease wear-out and increase performance. In this paper, we analyze the potential of overwrites for the most widely used data structure in database applications: the B-Tree. We show that overwrites make it possible to significantly reduce flash wear-out and increase overall performance.
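The NAND property this abstract builds on is easy to illustrate: program operations can only clear bits (1 → 0), while setting a bit back to 1 requires erasing the whole block. A minimal sketch of the resulting in-place overwrite check (a hypothetical helper, not the paper's code):

```python
# A minimal sketch (hypothetical helper, not the paper's code) of the check
# an FTL or file system could run before overwriting a flash page in place:
# NAND program operations can only clear bits (1 -> 0); setting a bit back
# to 1 requires erasing the whole block first.

def can_overwrite_in_place(old_page: bytes, new_page: bytes) -> bool:
    """True iff new_page only clears bits relative to old_page."""
    assert len(old_page) == len(new_page)
    return all((old & new) == new for old, new in zip(old_page, new_page))

assert can_overwrite_in_place(b"\xf0", b"\xc0")      # 1111.. -> 1100..: ok
assert not can_overwrite_in_place(b"\xc0", b"\xf0")  # would set bits: erase needed
```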

research product

Phosphonomethyl-substituted phenols are readily obtained from o-hydroxymethylated phenols and trialkyl phosphites. The free acids, incorporated into phenol-formaldehyde resins, act as cation exchangers with remarkable selectivity for different metal ions.

research product

MCD: Overcoming the Data Download Bottleneck in Data Centers

The data download problem in data centers describes the increasingly common task of coordinated loading of identical data onto a large number of nodes. Data download is seen as a significant problem in exascale HPC applications: uncoordinated reading from a central file server creates contention at the file server and its network interconnect. We propose and evaluate a reliable multicast-based approach to solve the data download problem. The MCD system builds a logical multi-rooted tree based on the physical network topology and uses the logical view for a two-phase approach. In the first phase, the data is multicast to all nodes. In the second phase, the logical tree is used for an effi…
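One possible reading of the tree construction, sketched under the assumption that nodes are grouped by their switch and one node per group acts as a local root (names and grouping are illustrative, not the MCD implementation):

```python
# A sketch of one possible tree construction (an assumption, not the MCD
# code): nodes are grouped by their switch, one node per switch acts as a
# local root, and local roots fan the received data out within their group.

def build_logical_tree(nodes_by_switch: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each local root to the nodes it forwards data to."""
    tree = {}
    for nodes in nodes_by_switch.values():
        root, *children = sorted(nodes)
        tree[root] = children  # the root serves its switch-local group
    return tree

topology = {"sw0": ["n02", "n01", "n03"], "sw1": ["n11", "n10"]}
print(build_logical_tree(topology))  # {'n01': ['n02', 'n03'], 'n10': ['n11']}
```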

research product

Deduplication Potential of HPC Applications’ Checkpoints

HPC systems contain an increasing number of components, which decreases the mean time between failures. Checkpoint mechanisms help long-running applications survive such failures. A viable way to relieve the resulting pressure on the I/O backends is to deduplicate the checkpoints. However, little is known about how much I/O deduplication can actually save within the checkpointing process of HPC applications. In this paper, we perform a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design.
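A minimal sketch of how such deduplication potential can be estimated, assuming fixed-size chunking and SHA-256 fingerprints (both parameter choices are assumptions, not the study's methodology):

```python
# A minimal sketch, assuming fixed-size chunking and SHA-256 fingerprints:
# the deduplication potential of a checkpoint is the fraction of chunk
# writes that an index of already-stored fingerprints would absorb.

import hashlib

def dedup_potential(data: bytes, chunk_size: int = 4096) -> float:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return 1.0 - len(unique) / len(chunks)

print(dedup_potential(b"\x00" * 4096 * 100))  # 0.99: highly redundant data
```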

research product

A configurable rule-based classful token bucket filter network request scheduler for the Lustre file system

HPC file systems today work in a best-effort manner in which individual applications can flood the file system with requests, effectively causing a denial of service for all other tasks. This paper presents a classful Token Bucket Filter (TBF) policy for the Lustre file system. The TBF enforces Remote Procedure Call (RPC) rate limits based on (potentially complex) Quality of Service (QoS) rules. The QoS rules are enforced in Lustre's Object Storage Servers, where each request is assigned to an automatically created QoS class. The proposed QoS implementation for Lustre enables various features for each class, including support for high-priority and real-time requests even under heavy …
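The token bucket mechanism behind such a policy is standard and easy to sketch; the class names and rates below are illustrative, not Lustre's TBF implementation:

```python
# A compact sketch of the classful token bucket idea; class names and rates
# are illustrative, not Lustre's TBF code.

import time

class TokenBucket:
    """Admit at most `rate` requests/s, with bursts of up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # request waits in its class queue

# One bucket per QoS class, e.g. matched by rules on job ID or client NID:
buckets = {"interactive": TokenBucket(rate=100, burst=200),
           "batch": TokenBucket(rate=1000, burst=1000)}
```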

research product

Deriving and comparing deduplication techniques using a model-based classification

Data deduplication has been a hot research topic, and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis reveals independent concepts that can be reused in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which exposes two as yet unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and the two new systems. We perform this evaluation on real-world data sets.

research product

Sorted deduplication: How to process thousands of backup streams

The requirements on deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time; today they have to process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…
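A hedged sketch of the sorted-lookup idea the abstract points at, assuming fingerprints are batched across streams and the on-disk index is kept in sorted order (these details are assumptions, not the paper's design):

```python
# A hedged sketch: probing a sorted on-disk fingerprint index with a sorted
# batch of fingerprints turns random lookups into one sequential merge-join.
# Batching and index layout here are assumptions, not the paper's design.

def lookup_sorted(index_keys: list[bytes], fingerprints: list[bytes]) -> dict[bytes, bool]:
    """index_keys must be sorted; returns fingerprint -> 'already stored?'."""
    results: dict[bytes, bool] = {}
    i = 0
    for fp in sorted(set(fingerprints)):  # sort the batch once, up front
        while i < len(index_keys) and index_keys[i] < fp:
            i += 1  # the index is read strictly left to right: contiguous I/O
        results[fp] = i < len(index_keys) and index_keys[i] == fp
    return results
```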

research product

ESB: Ext2 Split Block Device

Solid State Disks (SSDs) are starting to replace rotating media (hard disks, HDDs) in many areas, but are still not cost-efficient enough in terms of capacity to replace them completely. One approach to exploiting their superior performance is to use them as a cache for magnetic disks to speed up overall storage operations. In this paper, we present and evaluate a file system level optimization based on ext2. We split metadata and data and store the metadata on an SSD while the data remains on a common HDD. We evaluate our system with filebench under a file server, web server, and web proxy scenario and compare the results with flashcache. We find that many of the scenarios do not contain enough meta…
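The split itself reduces to a per-block routing decision; a toy sketch (not the ext2 patch itself), assuming the metadata block numbers are known from the file system layout:

```python
# A toy sketch (not the ext2 patch itself): a block request is routed to the
# SSD if it touches a metadata block, otherwise to the HDD. The metadata
# block set is assumed to be known from the file system layout.

def route_block(block_nr: int, metadata_blocks: set[int]) -> str:
    """Pick the backing device for a block."""
    return "ssd" if block_nr in metadata_blocks else "hdd"

# e.g. inode tables and bitmaps on the SSD, file contents on the HDD:
print(route_block(17, metadata_blocks={0, 1, 2, 17}))    # 'ssd'
print(route_block(4096, metadata_blocks={0, 1, 2, 17}))  # 'hdd'
```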

research product

Design of an exact data deduplication cluster

Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. Therefore, we are able to show that, …
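A minimal sketch of a joint distributed chunk index, under the assumption that responsibility is partitioned by hashing the fingerprint, so every duplicate maps to the same node cluster-wide:

```python
# A minimal sketch, assuming the joint index is partitioned by fingerprint:
# each chunk fingerprint maps deterministically to exactly one node, so a
# duplicate is always detected, no matter which node receives the write.

import hashlib

def index_node(fingerprint: bytes, nodes: list[str]) -> str:
    """Pick the node that owns this fingerprint's index entry."""
    h = int.from_bytes(hashlib.sha256(fingerprint).digest()[:8], "big")
    return nodes[h % len(nodes)]

nodes = ["node0", "node1", "node2", "node3"]
print(index_node(b"\xab" * 20, nodes))  # always the same node for this chunk
```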

research product