AUTHOR

Reza Salkhordeh

Related works by this author (8)

Online Management of Hybrid DRAM-NVMM Memory for HPC

2019

Non-volatile main memories (NVMMs) offer performance comparable to DRAM while requiring lower static power and enabling higher densities. NVMMs can therefore improve both the energy efficiency and the cost of main memory. Previous hybrid main memory management approaches for HPC either ignore the unique characteristics of NVMMs, incur high profiling costs, or require source code modifications. In this paper, we investigate the behavior of HPC applications when NVMM forms part of main memory. Based on several key observations from a comprehensive study of HPC applications, we propose an online hybrid memory architecture for …

Keywords: Non-volatile memory; Memory management; Memory architecture; DRAM; Profiling; Source code; Embedded systems
Venue: 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)

Improving checkpointing intervals by considering individual job failure probabilities

2021

Checkpointing is a popular resilience method in HPC, and its efficiency depends heavily on the choice of checkpoint interval. Standard analytical approaches optimize intervals for big, long-running jobs that fail with high probability, but they cannot minimize checkpointing overheads for jobs with a low or medium probability of failure. However, our analysis of batch traces from four HPC systems shows that such jobs are extremely common. We therefore propose an iterative checkpointing algorithm to compute efficient intervals for jobs with a medium risk of failure. The method also supports big and long-running jobs by converging to the results of various traditional methods for…

Keywords: Checkpointing; Resilience; Batch processing; Systems simulation; Reliability engineering
Venue: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
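
For context, a classic example of a "standard analytical approach" is Young's first-order formula, which sets the checkpoint interval to √(2·C·M) for checkpoint cost C and mean time between failures M. The Python sketch below is that textbook baseline, not the paper's iterative algorithm. It assumes exponentially distributed failures and a hypothetical job (2 h runtime, 60 s per checkpoint, 24 h MTBF), and the function names are illustrative only.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal interval: sqrt(2 * C * M)."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def failure_probability(runtime_s: float, mtbf_s: float) -> float:
    """P(at least one failure during the run), assuming exponential inter-failure times."""
    return 1.0 - math.exp(-runtime_s / mtbf_s)


def failure_free_overhead(runtime_s: float, interval_s: float, checkpoint_cost_s: float) -> float:
    """Checkpoint time spent if the job never fails (no checkpoint is taken at job end)."""
    n_checkpoints = max(math.ceil(runtime_s / interval_s) - 1, 0)
    return n_checkpoints * checkpoint_cost_s


# Hypothetical job: 2 h runtime, 60 s per checkpoint, 24 h MTBF.
tau = young_interval(60.0, 24 * 3600.0)                # about 3220 s
p_fail = failure_probability(2 * 3600.0, 24 * 3600.0)  # about 0.08
overhead = failure_free_overhead(2 * 3600.0, tau, 60.0)
print(f"interval={tau:.0f}s  P(fail)={p_fail:.3f}  overhead={overhead:.0f}s")
```

Under these assumptions the job fails with only about 8% probability, yet in the common failure-free case it still pays for two checkpoints at the Young interval. This is the kind of low- or medium-risk job for which, according to the abstract, standard formulas do not minimize overhead.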

Constant Time Garbage Collection in SSDs

2021

Keywords: Parallel computing; Garbage collection
Venue: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)

Streamlining distributed Deep Learning I/O with ad hoc file systems

2021

With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suitable for the I/O requirements of DL workflows. Therefore, users typically copy the whole training data to the worker nodes or distribute partitions. Because DL depends on randomized input data, prior work stated that partitioning impacts DL accuracy. Their solutions focused mainly on training I/O performance on a high-speed network but did not cover the data stage-in pro…

Keywords: Deep learning; Data set; Workflow; Distributed computing; Distributed database; Computer data storage; Data deduplication; Global namespace
Venue: 2021 IEEE International Conference on Cluster Computing (CLUSTER)
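
One way to see why partitioning interacts with randomized input is to look at which sample indices each worker reads. The Python sketch below contrasts static partitioning, where a worker only ever sees its own slice, with per-epoch global shuffling, where all workers derive the same permutation from a shared seed and each takes a strided share. Global shuffling assumes every worker can read any sample, for example through a shared namespace; the function names, the seeding scheme, and the 100-sample/4-worker setup are hypothetical and not taken from the paper.

```python
import random
from typing import List


def partitioned_indices(n_samples: int, n_workers: int, rank: int) -> List[int]:
    """Static partitioning: worker `rank` always reads the same contiguous slice."""
    per_worker = n_samples // n_workers
    start = rank * per_worker
    return list(range(start, start + per_worker))


def globally_shuffled_indices(n_samples: int, n_workers: int, rank: int,
                              epoch: int, seed: int = 0) -> List[int]:
    """Global shuffling: all workers build the same permutation for this epoch
    (same seed), then each takes every n_workers-th entry starting at `rank`."""
    rng = random.Random(seed + epoch)  # simple illustrative seeding scheme
    perm = list(range(n_samples))
    rng.shuffle(perm)
    usable = (n_samples // n_workers) * n_workers  # drop the remainder so shards are equal
    return perm[rank:usable:n_workers]


# Hypothetical setup: 100 samples, 4 workers, 10 epochs; track what rank 0 reads.
seen_partitioned, seen_global = set(), set()
for epoch in range(10):
    seen_partitioned.update(partitioned_indices(100, 4, rank=0))
    seen_global.update(globally_shuffled_indices(100, 4, rank=0, epoch=epoch))
print(len(seen_partitioned), len(seen_global))
```

With static partitioning, rank 0 revisits the same 25 samples every epoch; with global shuffling it draws a fresh random quarter of the data each epoch, which is only possible if every node can reach the full data set.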

Simurgh

2021

The availability of non-volatile main memory (NVMM) has started a new era for storage systems, and NVMM-specific file systems can support the extremely high data and metadata rates required by many HPC and data-intensive applications. Nevertheless, scaling metadata performance within NVMM file systems is often restricted by the Linux kernel storage stack, while simply moving metadata management to user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updat…

Keywords: Metadata; Metadata management; File system; Consistency; Shared memory; Scalability; User space; Operating system; Linux kernel
Venue: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

DelveFS - An Event-Driven Semantic File System for Object Stores

2020

Data-driven applications are becoming increasingly important in numerous industrial and scientific fields, increasing the need for scalable data storage such as object storage. Yet many data-driven applications cannot use object interfaces directly and often have to rely on third-party file system connectors that support only a basic representation of objects as files in a flat namespace. With sometimes millions of objects per bucket, this simple organization is insufficient for users and applications, which are usually interested in only a small subset of objects. These huge buckets not only lack basic semantic properties and structure, but they are also challenging to manage from a tec…

Keywords: File system; Object storage; Directory; Event (computing); Distributed computing; Database; Computer data storage; Scalability; Networking and telecommunications
Venue: 2020 IEEE International Conference on Cluster Computing (CLUSTER)
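
The abstract contrasts flat object listings with semantically organized views, and the title describes the system as event-driven. As a rough illustration only, the Python sketch below keeps hypothetical "virtual directories" defined by metadata predicates and updates them from object-store notifications ("put" or "delete") instead of rescanning a bucket. The event names, the `SemanticIndex` class, its methods, and the example keys are assumptions made for this sketch; they are not DelveFS's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Metadata = Dict[str, str]


@dataclass
class VirtualDirectory:
    """Hypothetical view: contains every object whose metadata satisfy `predicate`."""
    predicate: Callable[[Metadata], bool]
    members: Set[str] = field(default_factory=set)


class SemanticIndex:
    """Hypothetical index that maintains views incrementally from object-store events."""

    def __init__(self) -> None:
        self.views: Dict[str, VirtualDirectory] = {}

    def add_view(self, path: str, predicate: Callable[[Metadata], bool]) -> None:
        self.views[path] = VirtualDirectory(predicate)

    def on_event(self, kind: str, key: str, metadata: Metadata) -> None:
        """Apply one object-store notification; `kind` is "put" or "delete"."""
        for view in self.views.values():
            if kind == "put" and view.predicate(metadata):
                view.members.add(key)
            else:
                # Deleted objects, or overwritten ones that no longer match, leave the view.
                view.members.discard(key)

    def listdir(self, path: str) -> List[str]:
        return sorted(self.views[path].members)


index = SemanticIndex()
index.add_view("/by-instrument/lidar", lambda m: m.get("instrument") == "lidar")
index.on_event("put", "scan-001.bin", {"instrument": "lidar"})
index.on_event("put", "img-042.png", {"instrument": "camera"})
print(index.listdir("/by-instrument/lidar"))
```

Each event touches only the affected object, so listing a view costs time proportional to its size rather than to the bucket's. A view created after objects already exist would still need an initial scan to backfill; the sketch omits that.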

Persistent software transactional memory in Haskell

2021

Emerging persistent memory in commodity hardware allows byte-granular access to persistent state at memory speeds. However, to prevent inconsistent state in persistent memory after unexpected system failures, different write semantics are required than for volatile memory. Transaction-based library solutions for persistent memory facilitate the atomic modification of persistent data in languages where memory is explicitly managed by the programmer, such as C/C++. For languages that provide extended capabilities like automatic memory management, a more native integration into the language is needed to maintain the high level of memory abstraction. It is shown in this paper how persiste…

Keywords: Software transactional memory; Haskell; Persistent data structure; Runtime system; Memory management; Garbage collection; Volatile memory; Software portability
Venue: Proceedings of the ACM on Programming Languages

AnyOLAP

2021

The volume of data processed and produced by modern data-intensive applications is constantly increasing. Along with the volume, interest in analyzing and interpreting this data increases as well. As a consequence, more and more DBMSs and processing frameworks specialize in the efficient execution of long-running, read-only analytical queries. Unfortunately, to enable analysis, the data first has to be moved from the source application to the analytics tool via a lengthy ETL process, which increases the runtime and complexity of the analysis pipeline. In this work, we advocate skipping ETL altogether. With AnyOLAP, we can perform online analysis of dat…

Keywords: Computational science
Venue: Proceedings of the VLDB Endowment