
AUTHOR

Marc-André Vef

ORCID: 0000-0001-7398-3034

Showing 8 related works from this author.

GekkoFS - A Temporary Distributed File System for HPC Applications

2018

We present GekkoFS, a temporary, highly scalable burst buffer file system that has been specifically optimized for the new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, offering only those features that most (though not all) applications actually require. It provides scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems. The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also support…
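As a concrete illustration of the relaxed-POSIX usage model the abstract describes, the following is a minimal sketch, not code from the paper: an application keeps issuing ordinary POSIX calls, and an ad hoc file system such as GekkoFS can serve them from node-local storage, typically by interposing on those calls through a preloaded client library. The mount path and file name are hypothetical.

```c
/* Minimal sketch, not taken from the paper: plain POSIX I/O against a
 * hypothetical ad hoc file system mount point. If a client interposition
 * library is preloaded (e.g. via LD_PRELOAD), these calls can be served
 * from node-local storage instead of the backend parallel file system. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical mount point of the temporary file system. */
    const char *path = "/tmp/adhocfs/checkpoint.dat";
    char buf[64] = {0};

    /* Write a small piece of intermediate data. */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open for write"); return 1; }
    if (write(fd, "intermediate results", 20) != 20) perror("write");
    close(fd);

    /* Read it back through the same POSIX interface. */
    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open for read"); return 1; }
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    close(fd);

    printf("read %zd bytes: %s\n", n, buf);
    return 0;
}
```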

Keywords: File system; Distributed computing; Burst buffers; Parallel processing (Electronic computers); Computer science; Buffer storage (Computer science); Data structure; Distributed file systems; Metadata; Parallel processing (DSP implementation); POSIX; Server; Scalability; HPC; Operating system; High performance computing
Published in: 2018 IEEE International Conference on Cluster Computing (CLUSTER)

Streamlining distributed Deep Learning I/O with ad hoc file systems

2021

With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suitable for the I/O requirements of DL workflows. Therefore, users typically copy the entire training data set to the worker nodes or distribute partitions of it. Because DL depends on randomized input data, prior work stated that partitioning impacts DL accuracy. Their solutions focused mainly on training I/O performance over a high-speed network but did not cover the data stage-in pro…

Keywords: Data set; Workflow; Distributed database; Process (engineering); Computer science; Deep learning; Distributed computing; Computer data storage; Data deduplication; Artificial intelligence; Global namespace
Published in: 2021 IEEE International Conference on Cluster Computing (CLUSTER)

Pure Functions in C: A Small Keyword for Automatic Parallelization

2017

The need for parallel task execution has been growing steadily in recent years, since manufacturers mainly improve processor performance by increasing the number of installed cores instead of scaling the processor’s frequency. To make use of this potential, an essential technique for increasing the parallelism of a program is to parallelize loops. Several automatic loop nest parallelizers have been developed in the past, such as PluTo. The main restriction of these tools is that the loops must be statically analyzable, which, among other things, disallows function calls within the loops. In this article, we present a seemingly simple extension to the C programming language which marks fun…
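A minimal sketch of the underlying idea, not the paper's actual keyword or toolchain: here GCC's existing const function attribute stands in for the proposed annotation, and an OpenMP pragma stands in for the loop parallelization that a polyhedral tool could apply automatically once the called function is known to be free of side effects.

```c
/* Illustrative sketch only: a side-effect-free function marked with a
 * GCC attribute, called from a loop whose iterations are therefore
 * independent and can legally run in parallel. The paper's proposed
 * keyword and automatic toolchain are not reproduced here. */
#include <stdio.h>

/* Result depends only on the argument; no side effects. */
static double square(double x) __attribute__((const));

static double square(double x) {
    return x * x;
}

int main(void) {
    enum { N = 1000000 };
    static double out[N];

    /* Because square() has no side effects, the iterations do not
     * interfere with each other and the loop may be parallelized
     * (compile with -fopenmp to enable the pragma). */
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        out[i] = square((double)i);
    }

    printf("out[42] = %f\n", out[42]);
    return 0;
}
```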

Keywords: Computer science; Parallel computing; Toolchain; Theoretical computer science; Task (computing); Automatic parallelization; Side effect (computer science); Parallel processing (DSP implementation); Theory of computation; Polytope model; Compiler; Software; Information systems
Published in: 2017 IEEE International Conference on Cluster Computing (CLUSTER)

Using On-Demand File Systems in HPC Environments

2019

In modern HPC systems, parallel (distributed) file systems are used to allow fast access to and from the storage infrastructure. However, I/O performance in large-scale HPC systems has failed to keep up with the increase in computational power. As a result, the I/O subsystem, which also has to cope with a large number of demanding metadata operations, is often the bottleneck of the entire HPC system. In some cases, even a single badly behaving application can be responsible for slowing down the entire HPC system, disrupting other applications that use the same I/O subsystem. These kinds of situations are likely to become more frequent in the future with larger and more powerful HPC systems…

Keywords: Metadata; File system; Computer science; On demand; Distributed computing; Data processing & computer science; Lustre (file system); Global file system; Bottleneck; BeeGFS

Challenges and Solutions for Tracing Storage Systems

2018

IBM Spectrum Scale’s parallel file system, the General Parallel File System (GPFS), has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and the anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Furthermore, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In…
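To illustrate why tracing overhead matters and one common way to keep it low, here is a minimal sketch, not GPFS's or Spectrum Scale's actual tracing mechanism: trace records are appended to a fixed-size in-memory ring buffer with cheap stores instead of formatted I/O on every event, and the buffer is only formatted and dumped on demand. The record layout and event identifiers are invented for the example.

```c
/* Illustrative sketch only: a fixed-size in-memory ring buffer of trace
 * records. Recording an event is a timestamp read plus a few stores;
 * expensive formatting happens only when the buffer is dumped. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define TRACE_CAPACITY 1024

struct trace_record {
    uint64_t timestamp_ns;   /* monotonic clock at event time   */
    uint32_t event_id;       /* caller-defined event identifier */
    uint32_t payload;        /* small event-specific value      */
};

static struct trace_record trace_buf[TRACE_CAPACITY];
static size_t trace_next;    /* next slot; wraps around */

static void trace_event(uint32_t event_id, uint32_t payload) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    struct trace_record *r = &trace_buf[trace_next % TRACE_CAPACITY];
    r->timestamp_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    r->event_id = event_id;
    r->payload = payload;
    trace_next++;
}

static void trace_dump(FILE *out) {
    size_t count = trace_next < TRACE_CAPACITY ? trace_next : TRACE_CAPACITY;
    for (size_t i = 0; i < count; ++i)
        fprintf(out, "%llu event=%u payload=%u\n",
                (unsigned long long)trace_buf[i].timestamp_ns,
                trace_buf[i].event_id, trace_buf[i].payload);
}

int main(void) {
    for (uint32_t i = 0; i < 10; ++i)
        trace_event(1 /* hypothetical "rpc_sent" event */, i);
    trace_dump(stdout);
    return 0;
}
```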

Keywords: File system; Computer science; Interface (computing); Distributed computing; Tracing; Software; Hardware and architecture; POSIX; Scalability; Overhead (computing)
Published in: ACM Transactions on Storage

DelveFS - An Event-Driven Semantic File System for Object Stores

2020

Data-driven applications are becoming increasingly important in numerous industrial and scientific fields, increasing the need for scalable data storage, such as object storage. Yet, many data-driven applications cannot use object interfaces directly and often have to rely on third-party file system connectors that support only a basic representation of objects as files in a flat namespace. With sometimes millions of objects per bucket, this simple organization is insufficient for users and applications who are usually interested in only a small subset of objects. These huge buckets not only lack basic semantic properties and structure, but they are also challenging to manage from a tec…

Keywords: File system; Distributed computing; Database; Event (computing); Computer science; Representation (systemics); Directory; Object (computer science); Object storage; Computer data storage; Scalability
Published in: 2020 IEEE International Conference on Cluster Computing (CLUSTER)

GekkoFS — A Temporary Burst Buffer File System for HPC Applications

2020

Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data, while storage systems in today’s HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, whereas general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters, and they offer a peak bandwidth higher than that of the backend parallel f…

Keywords: Information storage and retrieval systems; POSIX; File system; Burst buffers; Computer science; Process (computing); Distributed file systems; Computer science applications; Theoretical computer science; Metadata; Computational theory and mathematics; Hardware and architecture; HPC; Scalability; Operating system; Bandwidth (computing); High performance computing; Isolation (database systems); Software
Published in: Journal of Computer Science and Technology

ADA-FS—Advanced Data Placement via Ad hoc File Systems at Extreme Scales

2020

Today’s High-Performance Computing (HPC) environments increasingly have to manage relatively new access patterns (e.g., large numbers of metadata operations) for which general-purpose parallel file systems (PFS) were not optimized. Burst-buffer file systems aim to solve that challenge by spanning an ad hoc file system across node-local flash storage at the compute nodes to relieve the PFS of such access patterns. However, existing burst-buffer file systems still support many of the traditional file system features, which are often not required in HPC applications, at the cost of file system performance.

Keywords: Metadata; File system; Computer science; Operating system; Flash storage; Data placement