0000000001234269

AUTHOR

André Müller

showing 14 related works from this author

Accelerating metagenomic read classification on CUDA-enabled GPUs.

2016

Metagenomic sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification; i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (…

0301 basic medicineTheoretical computer scienceWorkstationGPUsComputer scienceContext (language use)CUDAParallel computingBiochemistryGenomelaw.invention03 medical and health sciencesCUDAUser-Computer Interface0302 clinical medicineStructural BiologylawTaxonomic assignmentHumansMicrobiomeMolecular BiologyInternetXeonApplied MathematicsHigh-Throughput Nucleotide SequencingSequence Analysis DNAExact k-mer matchingComputer Science Applications030104 developmental biologyTitan (supercomputer)Metagenomics030220 oncology & carcinogenesisMetagenomicsDNA microarraySoftwareBMC bioinformatics
researchProduct

Efficient Parallel Sort on AVX-512-Based Multi-Core and Many-Core Architectures

2019

Sorting kernels are a fundamental part of numerous applications. The performance of sorting implementations is usually limited by a variety of factors such as computing power, memory bandwidth, and branch mispredictions. In this paper we propose an efficient hybrid sorting method which takes advantage of wide vector registers and the high bandwidth memory of modern AVX-512-based multi-core and many-core processors. Our approach employs a combination of vectorized bitonic sorting and load-balanced multi-threaded merging. Thread-level and data-level parallelism are used to exploit both compute power and memory bandwidth. Our single-threaded implementation is ~30x faster than qsort in the C st…

020203 distributed computingBitonic sorterSpeedupComputer scienceRadix sortSortingMemory bandwidth02 engineering and technologyParallel computingBitonic sorting020202 computer hardware & architecture0202 electrical engineering electronic engineering information engineeringsortqsortMerge sortBranch mispredictionXeon Phi2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
researchProduct

MetaCache: context-aware classification of metagenomic reads using minhashing.

2017

Abstract Motivation Metagenomic shotgun sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results We introduce MetaCache—a novel software for read classification using the big data technique minhashing. Our…

0301 basic medicineStatistics and ProbabilityComputer scienceSequence analysisContext (language use)BiochemistryGenome03 medical and health scienceschemistry.chemical_compound0302 clinical medicineRefSeqHumansMolecular BiologyInformation retrievalShotgun sequencingHigh-Throughput Nucleotide SequencingSequence Analysis DNAComputer Science ApplicationsComputational Mathematics030104 developmental biologyComputational Theory and MathematicschemistryMetagenomicsMetagenomics030217 neurology & neurosurgeryDNAAlgorithmsSoftwareReference genomeBioinformatics (Oxford, England)
researchProduct

WarpCore: A Library for fast Hash Tables on GPUs

2020

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore -- a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines ent…

FOS: Computer and information sciencesScheme (programming language)Amortized analysisComputer scienceHash functionParallel computingData structureHash tableCUDAComputer Science - Distributed Parallel and Cluster ComputingServerDistributed Parallel and Cluster Computing (cs.DC)Throughput (business)computercomputer.programming_language2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)
researchProduct

Le site néolithique final de La Fare (Forcalquier, Alpes-de-Haute-Provence). Résultats 1995-1999 et révision chronoculturelle

2004

Publié initialement : LEMERCIER O., CAULIEZ J., FURESTIER R., MULLER A., BOUVILLE C., CONVERTINI F., GILABERT C., JORDA M., KHEDHAIER R., LAZARD N., LOIRAT D., PELLISSIER M., PROVENZANO N., VERDIN P. (2004) – Le site Néolithique final de La Fare (Forcalquier, Alpes-de-Haute-Provence) résultats 1995-1999 et révision chronoculturelle, in : DARTEVELLE H. (Dir.) : Rencontres Méridionales de Préhistoire Récente, 5e session, Clermont-Ferrand, 2002, Archéologie du sud-ouest, 2004, p. 445-455.; Le site de La Fare est un établissement perché sur un grand éperon de la région de Forcalquier (Alpes-de-Haute-Provence). Occupé de la fin de la Préhistoire jusqu'à l'époque contemporaine, il a livré les ves…

Néolithique finalProvence-Alpes-Côte d'Azur[SHS.ARCHEO] Humanities and Social Sciences/Archaeology and PrehistorychronologieoutillagesCampaniforme[SHS.ARCHEO]Humanities and Social Sciences/Archaeology and PrehistorycéramiquesSud-EstNéolithiquesépultureHabitat[ SHS.ARCHEO ] Humanities and Social Sciences/Archaeology and PrehistoryAlpes-de-Haute-ProvenceFranceComputingMilieux_MISCELLANEOUS
researchProduct

MetaCache-GPU: Ultra-Fast Metagenomic Classification

2021

The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their $k$-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growi…

Genomics (q-bio.GN)FOS: Computer and information sciencesSource codeComputer sciencemedia_common.quotation_subjectHash functionContext (language use)MinHashcomputer.software_genreData structureHash tableComputer Science - Distributed Parallel and Cluster ComputingFOS: Biological sciencesPreprocessorQuantitative Biology - GenomicsDistributed Parallel and Cluster Computing (cs.DC)Data miningcomputermedia_commonReference genome50th International Conference on Parallel Processing
researchProduct

Ultrametricity property of energy landscapes of multidisperse packing problems

2009

We consider the problem of finding the densest closed packing of hard disks with proposed different radii in a circular environment, such that the radius of the circumcircle is minimal. The subspace of the quasioptimum configurations of this problem exhibits the property of ultrametricity.

PhysicsPacking problemsProperty (philosophy)Mathematical analysisMathematics::Metric GeometryGeometryAstrophysics::Earth and Planetary AstrophysicsRadiusCircumscribed circleHeat capacityAstrophysics::Galaxy AstrophysicsEnergy (signal processing)Subspace topologyPhysical Review E
researchProduct

FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight

2022

Abstract One of the most important application in bioinformatics is read mapping. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper – a highly scalable read mapper on the TaihuLight supercomputer optimized for its fourth-generation ShenWei many-core architecture (SW26010). In order to fully exploit the computational power of the SW26010, we employ dynamic scheduling of tasks, asynchronous I/O and data transfers and implement a vectorized version of the banded Myers algorithm tailored to the 256 bit vector registers of the SW26010. Our perf…

256-bitSpeedupXeonComputer Networks and CommunicationsComputer scienceHash functionParallel computingSW26010SupercomputerTheoretical Computer ScienceArtificial IntelligenceHardware and ArchitectureScalabilitySoftwareSunway TaihuLightJournal of Parallel and Distributed Computing
researchProduct

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

2020

Sequence alignments are fundamental to bioinformatics which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq - a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that ar…

FOS: Computer and information sciences0301 basic medicineScheme (programming language)Computer Science - PerformanceComputer science0206 medical engineeringSequence alignment02 engineering and technologyParallel computingcomputer.software_genreMetaprogrammingDNA sequencingPartial evaluationPerformance (cs.PF)03 medical and health sciences030104 developmental biologyComputer Science - Distributed Parallel and Cluster ComputingFunction composition (computer science)MultithreadingDistributed Parallel and Cluster Computing (cs.DC)Compilercomputer020602 bioinformaticscomputer.programming_languageCodebase
researchProduct

RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing

2021

The alignment of reads to a transcriptome is an important initial step in a variety of bioinformatics RNA-seq pipelines. As traditional alignment-based tools suffer from high runtimes, alternative, alignment-free methods have recently gained increasing importance. We present a novel approach to the detection of local similarities between transcriptomes and RNA-seq reads based on context-aware minhashing. We introduce RNACache, a three-step processing pipeline consisting of minhashing of k-mers, match-based (online) filtering, and coverage-based filtering in order to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces transcriptomic mappings …

WorkstationComputer sciencebusiness.industryHash functionBig dataRNA-SeqComputational biologyPipeline (software)Fast mappinglaw.inventionTranscriptomelawScalabilitybusiness
researchProduct

AnyDSL: a partial evaluation framework for programming high-performance libraries

2023

This paper advocates programming high-performance code using partial evaluation. We present a clean-slate programming system with a simple, annotation-based, online partial evaluator that operates on a CPS-style intermediate representation. Our system exposes code generation for accelerators (vectorization/parallelization for CPUs and GPUs) via compiler-known higher-order functions that can be subjected to partial evaluation. This way, generic implementations can be instantiated with target-specific code at compile time. In our experimental evaluation we present three extensive case studies from image processing, ray tracing, and genome sequence alignment. We demonstrate that using partial …

Intermediate languageComputer science020207 software engineeringImage processing02 engineering and technologyParallel computingPartial evaluation004020204 information systems0202 electrical engineering electronic engineering information engineeringCode generationRay tracing (graphics)General-purpose computing on graphics processing unitsSafety Risk Reliability and QualityImplementationSoftwareCompile time
researchProduct

cuBool: Bit-Parallel Boolean Matrix Factorization on CUDA-Enabled Accelerators

2018

Boolean Matrix Factorization (BMF) is a commonly used technique in the field of unsupervised data analytics. The goal is to decompose a ground truth matrix C into a product of two matrices A and $B$ being either an exact or approximate rank k factorization of C. Both exact and approximate factorization are time-consuming tasks due to their combinatorial complexity. In this paper, we introduce a massively parallel implementation of BMF - namely cuBool - in order to significantly speed up factorization of huge Boolean matrices. Our approach is based on alternately adjusting rows and columns of A and B using thousands of lightweight CUDA threads. The massively parallel manipulation of entries …

SpeedupRank (linear algebra)Computer science02 engineering and technologyParallel computingMatrix decompositionCUDAMatrix (mathematics)Factorization020204 information systemsSingular value decomposition0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingMassively parallelInteger (computer science)2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
researchProduct

Dépiquage au tribulum au Néolithique final dans le sud-est de la France

2002

Néolithique finalProvence-Alpes-Côte d'Azur[SHS.ARCHEO] Humanities and Social Sciences/Archaeology and Prehistory[SHS.ARCHEO]Humanities and Social Sciences/Archaeology and PrehistorybattagePréhistoireLa FaretribulumpaléobotaniqueEurope du Sudindustrie lithiquecéréale[ SHS.ARCHEO ] Humanities and Social Sciences/Archaeology and PrehistoryAlpes-de-Haute-ProvencetracéologiephytolitheFranceagricultureForcalquier
researchProduct

Packing a multidisperse system of hard disks in a circular environment.

2008

We consider the problem of finding the densest closed packing of hard disks with proposed different radii in a circular environment, such that the radius of the circumcircle is minimal. With our approach, we are able to find denser packings for various problem instances than known from the literature. Both for the dynamics of the simulation and for the optimum values of the radii of the circumcircles, we find various scaling laws.

Scaling lawMathematics::Metric GeometryGeometryAstrophysics::Earth and Planetary AstrophysicsRadiusCircumscribed circleHeat capacityAstrophysics::Galaxy AstrophysicsMathematicsPhysical review. E, Statistical, nonlinear, and soft matter physics
researchProduct