
AUTHOR

Daniel Jünger

showing 6 related works from this author

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures that are fundamental to a wide range of applications, including bioinformatics, data compression, and information retrieval. Consequently, various algorithms for (parallel) suffix array construction on both CPUs and GPUs have been proposed over the years. Although they provide significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …
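The data structure at the heart of this paper can be illustrated with a minimal, single-threaded sketch. This is only the textbook definition of a suffix array (a naive O(n² log n) sort of all suffixes), not the paper's multi-GPU construction algorithm:

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: the start positions of all suffixes of `text`,
    sorted so that the corresponding suffixes are in lexicographic order.
    Illustration only -- real constructions (and the multi-GPU approach
    described above) use far more efficient algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

# "banana" has suffixes a, ana, anana, banana, na, nana (in sorted order),
# starting at positions 5, 3, 1, 0, 4, 2 respectively.
print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Once built, the sorted order lets substring queries be answered by binary search over the suffix array, which is why it underpins tasks such as full-text indexing and data compression.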

Keywords: Multi-core processor, Speedup, Computer science, Suffix array, Parallel computing, Data structure, CUDA, Shared memory, Suffix, Data compression
Venue: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

Gossip

2019

Nowadays, a growing number of servers and workstations feature an increasing number of GPUs. However, slow communication among GPUs can lead to poor application performance. Thus, there is a latent demand for efficient multi-GPU communication primitives on such systems. This paper focuses on the gather, scatter and all-to-all collectives, which are important operations for various algorithms including parallel sorting and distributed hashing. We present two distinct communication strategies (ring-based and flow-oriented) to generate transfer plans for their topology-aware implementation on NVLink-connected multi-GPU systems. We achieve a throughput of up to 526 GB/s for all-to-all and 148 G…
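The notion of a round-based all-to-all transfer plan can be sketched in a few lines. The round structure below (in round r, GPU i sends its partition destined for GPU (i + r) mod n) is a generic pairwise-exchange scheme for illustration, not gossip's actual topology-aware planner:

```python
def all_to_all_plan(n: int) -> list[list[tuple[int, int, int]]]:
    """Build a round-based all-to-all plan for n GPUs.

    Each round is a list of (src, dst, partition) transfers in which every
    GPU has exactly one outgoing and one incoming transfer, so all n
    transfers of a round can run concurrently over the interconnect.
    Hypothetical sketch -- gossip derives its plans from the actual
    NVLink topology.
    """
    plan = []
    for r in range(1, n):
        plan.append([(i, (i + r) % n, (i + r) % n) for i in range(n)])
    return plan

# 4 GPUs: 3 rounds of 4 concurrent transfers cover all 12 src != dst pairs.
for rnd in all_to_all_plan(4):
    print(rnd)
```

Minimizing the number of rounds while respecting link bandwidths is exactly where a topology-aware planner earns its throughput advantage.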

Keywords: CUDA, Computer science, Gossip, Distributed computing, Transfer (computing), Server, Hash function, Overhead (computing), Throughput (business)
Venue: Proceedings of the 48th International Conference on Parallel Processing

WarpCore: A Library for fast Hash Tables on GPUs

2020

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore -- a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines ent…
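The amortized constant-time insert/query behavior that makes hash tables so versatile can be shown with a minimal open-addressing table. This CPU sketch uses plain linear probing; WarpCore's warp-cooperative GPU probing scheme is considerably more involved:

```python
class LinearProbingMap:
    """Minimal open-addressing hash map with linear probing.

    Illustrates the amortized O(1) insert/query property discussed above;
    it is not WarpCore's implementation.
    """
    EMPTY = object()  # sentinel distinguishing free slots from real keys

    def __init__(self, capacity: int = 16):
        self.cap = capacity
        self.keys = [self.EMPTY] * capacity
        self.vals = [None] * capacity
        self.size = 0

    def _slot(self, key) -> int:
        i = hash(key) % self.cap
        while self.keys[i] is not self.EMPTY and self.keys[i] != key:
            i = (i + 1) % self.cap  # probe the next slot
        return i

    def insert(self, key, val) -> None:
        if self.size * 2 >= self.cap:  # keep load factor below 0.5
            self._grow()
        i = self._slot(key)
        if self.keys[i] is self.EMPTY:
            self.size += 1
        self.keys[i] = key
        self.vals[i] = val

    def get(self, key, default=None):
        i = self._slot(key)
        return self.vals[i] if self.keys[i] == key else default

    def _grow(self) -> None:
        old = [(k, v) for k, v in zip(self.keys, self.vals)
               if k is not self.EMPTY]
        self.cap *= 2
        self.keys = [self.EMPTY] * self.cap
        self.vals = [None] * self.cap
        self.size = 0
        for k, v in old:
            self.insert(k, v)
```

On a GPU, the probing loop above becomes the bottleneck: its memory accesses are irregular, which is why a scheme that coalesces the probes of a whole warp improves global memory access patterns.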

Keywords: Amortized analysis, Computer science, Hash function, Parallel computing, Data structure, Hash table, CUDA, Distributed, Parallel, and Cluster Computing (cs.DC), Server, Throughput (business)
Venue: 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)

MetaCache-GPU: Ultra-Fast Metagenomic Classification

2021

The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their k-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growi…
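The k-mer histograms mentioned above are simple to state: slide a window of length k over a read and count each substring. A minimal sketch (the classification step that compares these histograms against a reference database is, of course, the hard part):

```python
from collections import Counter

def kmer_histogram(read: str, k: int = 3) -> Counter:
    """Count all overlapping k-mers of a read.

    Metagenomic classifiers compare such histograms against an indexed
    reference genome database to assign taxonomic labels; this sketch
    only shows the histogram itself.
    """
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

# "ACGTACG" contains ACG twice and CGT, GTA, TAC once each.
print(kmer_histogram("ACGTACG", k=3))
```

Note that a read of length n yields n - k + 1 overlapping k-mers, so histogram construction is linear in the read length.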

Keywords: Genomics (q-bio.GN), Source code, Computer science, Hash function, Context (language use), MinHash, Data structure, Hash table, Distributed, Parallel, and Cluster Computing (cs.DC), Preprocessor, Data mining, Reference genome
Venue: 50th International Conference on Parallel Processing

WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes

2018

Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, associated memory access patterns during the probing phase are highly irregular resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory featuring almost one TB/s bandwidth in comparison to main memory modules of state-of-the-art CPUs with less than 100 GB/s. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of …
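Scaling a hash map beyond a single GPU's memory starts from a simple idea: assign each key to a device by hashing, so every GPU owns a disjoint shard. A hypothetical sketch of that partitioning step (WarpDrive's actual multi-GPU scheme involves much more, e.g. batched transfers and device-side probing):

```python
def shard(key, num_gpus: int) -> int:
    """Assign a key to one of num_gpus shards by hashing.
    Hypothetical sketch of hash-based partitioning, not WarpDrive's code.
    """
    return hash(key) % num_gpus

def partition(keys, num_gpus: int) -> list[list]:
    """Group keys into per-GPU buckets; each bucket would then be
    transferred to and inserted on its owning device."""
    shards = [[] for _ in range(num_gpus)]
    for k in keys:
        shards[shard(k, num_gpus)].append(k)
    return shards
```

Because every key deterministically maps to one shard, queries can be routed the same way, and the aggregate table size grows with the number of GPUs rather than being capped by a single device's memory.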

Keywords: Distributed computing, Computer science, Hash function, Parallel computing, Data structure, Hash table, Memory management, Scalability, Massively parallel, Time complexity
Venue: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs

2017

Detecting higher-order epistatic interactions in Genome-Wide Association Studies (GWAS) remains a challenging task in the fields of genetic epidemiology and computer science. A number of algorithms have recently been proposed for epistasis discovery. However, they suffer from a high computational cost since statistical measures have to be evaluated for each possible combination of markers. Hence, many algorithms use additional filtering stages discarding potentially non-interacting markers in order to reduce the overall number of combinations to be examined. Among others, Mutual Information Clustering (MIC) is a common pre-processing filter for grouping markers into partitions using K-Means…
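The statistical measure underlying the MIC filter, mutual information, can be computed from paired marker samples with a short estimator. This is a generic plug-in estimate of I(X;Y), shown for illustration; the paper's pipeline applies it at scale across all marker pairs before clustering:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples:
    I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) ).
    """
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Perfectly dependent binary markers share 1 bit; independent ones share 0.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Markers with near-zero mutual information against the phenotype are the ones a filtering stage can discard, shrinking the combinatorial search space for higher-order interactions.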

Keywords: Theoretical computer science, Computer science, Contrast (statistics), Genome-wide association study, Mutual information, Machine learning, Reduction (complexity), Genetic epidemiology, Epistasis, Artificial intelligence, Cluster analysis, Genetic association