AUTHOR
Tim Suss
Parallel macro pipelining on the Intel SCC many-core computer
In this paper we present how Intel's Single-Chip-Cloud processor behaves for parallel macro pipeline applications. Subsets of the SCC's available cores can be arranged as a pipeline where each core processes one stage of the overall workload. Each of the independent cores processes a small part of a larger task and feeds the following core with new data after it finishes its work. Our case study is a parallel rendering system which renders successive images and applies different filters to them. On normal graphics adapters this is usually done in multiple cycles; we do it in a single pipeline pass. We show that we can achieve a significant speedup by using multiple parallel pipelines on t…
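The macro pipeline idea described in this abstract, where each core runs one stage and hands its result to the next, can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads and queues stand in for SCC cores and their message passing, and `run_pipeline`, `worker`, and `_STOP` are hypothetical names, not taken from the paper.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that marks the end of the stream


def run_pipeline(items, stages):
    """Run every item through all stages, one worker thread per stage."""
    # One queue in front of each stage, plus one for the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # pass the end marker downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    # Feed the first stage, then signal that no more input follows.
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    # Drain the last queue until the end marker arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results
```

Calling `run_pipeline(frames, [render, blur, sharpen])` with three hypothetical stage functions would keep all three stages busy on different frames at the same time, which is the property the abstract exploits.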
Towards Dynamic Scripted pNFS Layouts
Today's network file systems consist of a variety of complex subprotocols and backend storage classes. The data is typically spread over multiple data servers to achieve higher levels of performance and reliability. A metadata server is responsible for creating the mapping of a file to these data servers. It is hard to map application-specific access patterns to storage-system-specific features, which can result in degraded I/O performance. We present an NFSv4.1/pNFS protocol extension that integrates the client's ability to provide hints and I/O advice to metadata servers. We define multiple storage classes and allow the client to choose which type of storage fits best for its desired ac…
Reducing False Node Failure Predictions in HPC
Future HPC applications must be able to scale to thousands of compute nodes, while running for several days. The increased runtime and node count raise the probability of hardware failures that may interrupt computations. Scientists must therefore protect their simulations against hardware failures. This is typically done using frequent checkpoint and restart, which may have significant overheads. Consequently, the frequency at which checkpoints are taken should be minimized. Predicting hardware failures ahead of time is a promising approach to address this problem, but has remaining issues like false alarms at large scales. In this paper, we introduce the probability of unnece…
Effects and Benefits of Node Sharing Strategies in HPC Batch Systems
Processor manufacturers today scale performance by increasing the number of cores on each CPU. Unfortunately, not all HPC applications can efficiently saturate all cores of a single node, even if they successfully scale to thousands of nodes. For these applications, sharing nodes with other applications can help to stress different resources on the nodes and use them more efficiently. Previous work has shown that the performance impact of node sharing is highly application dependent, but very little work has studied its effects within batch systems and for complex parallel application mixes. Administrators therefore typically fear the complexity of running a batch system supporting node sharing…
POSTER: Optimizing scientific file I/O patterns using advice based knowledge
Other works before ours have used data prefetching to boost application performance [1]–[8]. Our approach differs from these works in that we do not rely on precise I/O pattern information to predict and prefetch every chunk of data in advance. Instead, we use data prefetching to group many small requests into a few large ones, improving application performance and the utilization of the whole storage system. Moreover, we provide the infrastructure that enables users to access file-system-specific interfaces for guided I/O without modifying applications, hiding the intrinsic complexity that such interfaces introduce.
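Grouping many small requests into a few large ones can be illustrated with a short sketch. It assumes requests are plain `(offset, length)` byte ranges on one file; the function name `coalesce_requests` and the `max_gap` parameter are hypothetical and do not come from the poster.

```python
def coalesce_requests(requests, max_gap=0):
    """Merge (offset, length) read requests that overlap, touch,
    or lie within max_gap bytes of each other."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
            # Extend the previous range to cover this request as well.
            last_offset, last_length = merged[-1]
            end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, length))
    return merged
```

A larger `max_gap` trades some extra bytes read for fewer requests, which is the kind of tuning decision a guided-I/O layer could make on the application's behalf.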
Evaluation of a hash-compress-encrypt pipeline for storage system applications
Great efforts are made to store data in a secure, reliable, and authentic way in large storage systems. Specialized, system-specific clients help to achieve these goals. Nevertheless, standard tools for hashing, compressing, and encrypting data are often arranged in transparent pipelines. We analyze the potential of Unix shell pipelines with several high-speed and high-compression algorithms that can be used to achieve data security, reduction, and authenticity. Furthermore, we compare the pipelines of standard tools against a custom pipeline implemented in C++ and show that there is great potential for performance improvement.
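As a rough illustration of fusing pipeline stages in one process instead of chaining separate tools, the sketch below hashes and compresses a stream in a single pass over each chunk. It is not the paper's C++ implementation: SHA-256 and zlib are arbitrary stand-ins for the evaluated algorithms, the encryption stage is left out because the Python standard library ships no cipher, and `hash_and_compress` is a hypothetical name.

```python
import hashlib
import io
import zlib


def hash_and_compress(src, dst, chunk_size=64 * 1024):
    """Read src in chunks, write zlib-compressed data to dst,
    and return the SHA-256 hex digest of the uncompressed input."""
    hasher = hashlib.sha256()
    compressor = zlib.compressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)                    # hash stage
        dst.write(compressor.compress(chunk))   # compress stage
    dst.write(compressor.flush())               # emit any buffered output
    return hasher.hexdigest()
```

Each chunk is touched by both stages while it is still in memory, which avoids the extra copies between processes that a shell pipeline of separate tools incurs.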
Sorted deduplication: How to process thousands of backup streams
The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…