Search results for "computer.software_genre"
Showing 10 of 3858 documents
GekkoFS - A Temporary Distributed File System for HPC Applications
2018
We present GekkoFS, a temporary, highly scalable burst buffer file system specifically optimized for the new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, offering only those features actually required by most (not all) applications. It provides scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems. The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of Priority Programme 1648. It is also support…
Sorted deduplication: How to process thousands of backup streams
2016
The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams concurrently, while today's systems must handle hundreds to thousands. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed to process thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…
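The core idea the abstract describes — giving up per-stream locality in favor of index locality — can be sketched as follows. This is an illustrative simplification, not the paper's implementation: an in-memory dict stands in for the on-disk fingerprint index, and `dedup_batch` is a hypothetical helper name.

```python
import hashlib

def fingerprint(chunk: bytes) -> bytes:
    """Content-defined fingerprint of a chunk (SHA-256 here for illustration)."""
    return hashlib.sha256(chunk).digest()

def dedup_batch(chunks, index):
    """Deduplicate a batch of chunks from many streams against one shared index.
    Fingerprints are sorted before probing, so probes against a sorted on-disk
    index become sequential instead of scattered per-stream accesses."""
    batch = {fingerprint(c): c for c in chunks}   # also dedups within the batch
    new_fingerprints = []
    for fp, chunk in sorted(batch.items()):       # sorted order -> sequential index access
        if fp not in index:
            index[fp] = chunk
            new_fingerprints.append(fp)
    return new_fingerprints
```

The sorting step is what makes sharing one fingerprint index across thousands of streams feasible: the cost of an index scan is amortized over the whole batch rather than paid per stream.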
DelveFS - An Event-Driven Semantic File System for Object Stores
2020
Data-driven applications are becoming increasingly important in numerous industrial and scientific fields, driving the need for scalable data storage such as object storage. Yet, many data-driven applications cannot use object interfaces directly and often have to rely on third-party file system connectors that support only a basic representation of objects as files in a flat namespace. With sometimes millions of objects per bucket, this simple organization is insufficient for users and applications who are usually interested in only a small subset of objects. These huge buckets not only lack basic semantic properties and structure, but they are also challenging to manage from a tec…
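The "small subset of objects" problem can be illustrated with a filtered view over a flat bucket. This is only a sketch of the concept: DelveFS itself builds such views event-driven from the object store, and `semantic_view` is a hypothetical helper, not its API.

```python
def semantic_view(objects, predicate):
    """Build a filtered 'view' of a flat object bucket: only objects whose
    key/metadata satisfy a user-defined predicate appear in the virtual
    namespace presented to the application."""
    return {key: meta for key, meta in objects.items() if predicate(key, meta)}

# Example: expose only the log files from a bucket of mixed experiment output.
bucket = {
    "run1/log.txt":  {"size": 10},
    "run1/data.bin": {"size": 99},
    "run2/log.txt":  {"size": 12},
}
logs_only = semantic_view(bucket, lambda key, meta: key.endswith(".txt"))
```

A file system connector can then mount `logs_only` as a directory tree, hiding the millions of irrelevant objects in the underlying bucket.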
LPCC
2019
Most high-performance computing (HPC) clusters use a global parallel file system to enable high data throughput. The parallel file system is typically centralized, and its storage media are physically separated from the compute cluster. Compute nodes, as clients of the parallel file system, are often additionally equipped with SSDs. These node-internal storage media are rarely well integrated into the I/O and compute workflows. How to make full and flexible use of these storage media is therefore a valuable research question. In this paper, we propose LPCC, a hierarchical Persistent Client Caching mechanism for the Lustre file system. LPCC provides two modes: RW-PCC builds a read-write cache on…
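The read-through behavior of a client-side cache like the one the abstract describes can be sketched in a few lines. This is a loose illustration of the idea, not LPCC's kernel-level mechanism; `CACHE_DIR` and `cached_open` are hypothetical names, with an ordinary local directory standing in for the node-internal SSD.

```python
import os
import shutil

CACHE_DIR = "/tmp/pcc-cache"   # hypothetical node-local SSD mount point

def cached_open(shared_path):
    """Read-through cache in the spirit of RW-PCC: serve the node-local SSD
    copy if present, otherwise copy the file in from the shared file system
    first, then serve all subsequent reads locally."""
    local = os.path.join(CACHE_DIR, shared_path.lstrip("/").replace("/", "_"))
    if not os.path.exists(local):
        os.makedirs(CACHE_DIR, exist_ok=True)
        shutil.copyfile(shared_path, local)    # one fetch from the shared FS
    return open(local, "rb")                   # all later I/O hits the SSD
```

The interesting systems questions (write-back, coherence with other clients, eviction) are exactly what distinguishes the RW-PCC and RO-PCC modes in the paper; this sketch only shows the cache-hit fast path.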
A configurable rule-based classful token bucket filter network request scheduler for the Lustre file system
2017
HPC file systems today work in a best-effort manner where individual applications can flood the file system with requests, effectively leading to a denial of service for all other tasks. This paper presents a classful Token Bucket Filter (TBF) policy for the Lustre file system. The TBF enforces Remote Procedure Call (RPC) rate limitations based on (potentially complex) Quality of Service (QoS) rules. The QoS rules are enforced in Lustre's Object Storage Servers, where each request is assigned to an automatically created QoS class. The proposed QoS implementation for Lustre enables various features for each class, including support for high-priority and real-time requests even under heavy …
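The token bucket mechanism at the heart of the TBF policy is standard and can be sketched briefly. This is a minimal user-space illustration, not Lustre's in-kernel scheduler; the class names in the example are invented.

```python
import time

class TokenBucket:
    """Tokens accrue at `rate` per second up to a depth of `burst`;
    a request is admitted only if it can spend the tokens it costs."""
    def __init__(self, rate, burst):
        self.rate = rate              # sustained allowance (e.g. RPCs/s)
        self.burst = burst            # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost=1):
        """Refill by elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True               # admit the RPC
        return False                  # defer/queue the RPC

# "Classful" means each QoS class gets its own bucket (hypothetical classes):
buckets = {
    "interactive": TokenBucket(rate=100, burst=20),
    "batch":       TokenBucket(rate=1000, burst=200),
}
```

In the Lustre design, rules map incoming RPCs to such classes on the Object Storage Servers, so a flooding batch job drains only its own bucket and cannot starve interactive users.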
Challenges and Solutions for Tracing Storage Systems
2018
IBM Spectrum Scale’s parallel file system General Parallel File System (GPFS) has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool to understand the behavior and the anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In…
Direct lookup and hash-based metadata placement for local file systems
2013
The continuously growing number of files in file systems imposes new challenges on metadata performance. The total amount of metadata can become too big to be cached, potentially leading to multiple storage device accesses for a single metadata lookup operation. This paper examines the limitations of traditional file system designs and discusses an alternative metadata handling approach, using hash-based concepts already established for metadata and data placement in distributed storage systems. Furthermore, a POSIX-compliant prototype implementation based on these concepts is introduced and benchmarked. A variety of file system metadata and data operati…
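The "direct lookup" idea — locating metadata from the path alone instead of walking the directory tree — reduces to hashing the full path to a fixed table slot. A minimal sketch, with an assumed table size and a hypothetical `metadata_bucket` helper:

```python
import hashlib

NUM_BUCKETS = 4096  # hypothetical size of the on-disk metadata table

def metadata_bucket(path: str) -> int:
    """Map a full path deterministically to one metadata bucket, so a lookup
    of /a/b/c needs a single table access instead of resolving /a, then /a/b,
    then /a/b/c component by component."""
    digest = hashlib.sha1(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS
```

The trade-off, which the paper discusses, is that operations relying on directory structure (renames of large subtrees, readdir) become harder, since a rename changes the hash of every path beneath it.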
MERCURY: A Transparent Guided I/O Framework for High Performance I/O Stacks
2017
The performance gap between processors and I/O represents a serious scalability limitation for applications running on computing clusters. Parallel file systems often provide mechanisms that allow programmers to disclose their I/O pattern knowledge to the lower layers of the I/O stack through a hints API. This information can be used by the file system to boost application performance. Unfortunately, programmers rarely make use of these features, missing the opportunity to exploit the full potential of the storage system. In this paper, we propose MERCURY, a transparent guided I/O framework able to optimize file I/O patterns in scientific applications, allowing users to control the I/O b…
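The kind of hint the abstract refers to can be shown with the standard POSIX hints interface, `posix_fadvise`. This is an example of the manual disclosure MERCURY aims to automate, not MERCURY's own API; `advise_sequential` is an invented helper name.

```python
import os

def advise_sequential(path):
    """Tell the kernel the whole file will be read sequentially, then read
    the first block. The hint lets lower layers prefetch aggressively."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 means "the entire file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        return os.read(fd, 4096)
    finally:
        os.close(fd)
```

The usability problem is visible even in this tiny example: the hint must be placed by hand at every relevant `open`, which is precisely why a transparent, externally guided mechanism is attractive.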
ESB: Ext2 Split Block Device
2012
Solid State Disks (SSDs) are starting to replace rotating media (hard disks, HDDs) in many areas, but their cost per capacity is still too high to replace HDDs completely. One approach that exploits their superior performance is to use them as a cache for magnetic disks to speed up overall storage operations. In this paper, we present and evaluate a file-system-level optimization based on ext2: we split metadata and data, storing the metadata on an SSD while the data remains on a common HDD. We evaluate our system with filebench under file server, web server, and web proxy scenarios and compare the results with flashcache. We find that many of the scenarios do not contain enough meta…
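The split policy itself is a simple dispatch on block type. A sketch under stated assumptions: the block-type names below are illustrative categories, not ext2's actual on-disk taxonomy, and `route_block` is a hypothetical function.

```python
# Illustrative metadata block categories (not ext2's exact on-disk types).
METADATA_TYPES = {"superblock", "group_descriptor", "inode", "bitmap", "directory"}

def route_block(block_type: str) -> str:
    """ESB-style placement: small, hot, randomly accessed metadata goes to
    the SSD; bulk file data stays on the cheaper, larger HDD."""
    return "ssd" if block_type in METADATA_TYPES else "hdd"
```

The abstract's negative result also follows from this picture: if a workload's accesses are dominated by data blocks rather than metadata, almost everything routes to the HDD and the SSD contributes little.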
Criteria for the Selection of Electronic Computing Systems for Biomedical Research Institutes
1979
Computers are now a recognized tool in biomedical research. They are used on the one hand for the evaluation of data and on the other for data acquisition and the control of experiments. Based on our experience, we make some suggestions concerning the structure of a mini-computer system suitable for a research laboratory. Corresponding to the two major classes of application, two sets of requirements arise. We argue that it is effective to use this system for data reduction and evaluation because a large percentage of tasks require program development or at least specific input-data handling. Therefore, we call for a multi-user time-sharing system which should be equipped with a set of commands t…