Search results for "Parallel computing"

Showing 10 of 189 documents

SAUCE: A web application for interactive teaching and learning of parallel programming

2017

Prevalent hardware trends towards parallel architectures and algorithms create a growing demand for graduate students familiar with programming concurrent software. However, learning parallel programming is challenging due to complex communication and memory access patterns, as well as the need to avoid common pitfalls such as deadlocks and race conditions. Hence, the learning process has to be supported by adequate software solutions in order to enable future computer scientists and engineers to write robust and efficient code. This paper discusses a selection of well-known parallel algorithms based on C++11 threads, OpenMP, MPI, and CUDA that can be interactively embedded i…

Keywords: Computer Networks and Communications; Computer science; Programming language; White-box testing; Parallel algorithm; Process (computing); 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; Thread (computing); Theoretical Computer Science; CUDA; Software; Artificial Intelligence; Hardware and Architecture; 0202 electrical engineering electronic engineering information engineering; Code (cryptography); Web application; 020201 artificial intelligence & image processing
Venue: Journal of Parallel and Distributed Computing
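
The following is a minimal sketch of the race-condition pitfall the abstract mentions. It is written in Python for brevity rather than in the C++11 threads, OpenMP, MPI, or CUDA that SAUCE covers, and `parallel_count` is a hypothetical name, not part of SAUCE. Several threads increment one shared counter. The lock turns each read-modify-write into a critical section, so no update is lost.

```python
import threading


def parallel_count(num_threads: int, increments_per_thread: int) -> int:
    """Increment a shared counter from several threads without losing updates."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:  # critical section: read, add, and write back as one step
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter


if __name__ == "__main__":
    print(parallel_count(4, 10_000))  # 40000
```

With a single lock there is no lock-ordering cycle, so deadlock cannot arise here. Code that takes several locks usually avoids deadlock by always acquiring them in the same global order.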

Domain-Knowledge Optimized Simulated Annealing for Network-on-Chip Application Mapping

2013

Network-on-Chip architectures are scalable on-chip interconnection networks. They replace inefficient shared buses and are suitable for multicore and manycore systems. This paper presents an Optimized Simulated Annealing (OSA) algorithm for the Network-on-Chip application mapping problem. With OSA, the cores are implicitly and dynamically clustered using knowledge about communication demands. We show that OSA is a more feasible Simulated Annealing approach to NoC application mapping by comparing it with both a general Simulated Annealing algorithm and a Branch and Bound algorithm. Using real applications, we show that OSA is significantly faster than a general Simulated Annealing, withou…

Keywords: Computer Science::Hardware Architecture; Interconnection; Multi-core processor; Network on a chip; Branch and bound; Computer science; Scalability; Simulated annealing; Computer Science::Networking and Internet Architecture; Parallel computing; Adaptive simulated annealing; Cluster analysis
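
The visible abstract does not spell out OSA's domain-knowledge clustering. The sketch below is therefore only the kind of general Simulated Annealing baseline the abstract compares against, and every detail is an assumption:

- one task per core on a width x width mesh;
- hop counts under XY routing;
- a bandwidth-weighted cost;
- swap moves;
- geometric cooling.

The function names and parameters (`anneal`, `mapping_cost`, `t0`, `alpha`) are hypothetical.

```python
import math
import random


def mesh_hops(a: int, b: int, width: int) -> int:
    """Hop count between cores a and b on a width x width 2D mesh (XY routing)."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def mapping_cost(mapping, traffic, width):
    """Sum of bandwidth * hop distance over all communicating task pairs."""
    return sum(bw * mesh_hops(mapping[s], mapping[d], width)
               for (s, d), bw in traffic.items())


def anneal(num_tasks, traffic, width, t0=10.0, alpha=0.95,
           steps_per_temp=100, t_min=1e-3, seed=0):
    """General simulated annealing over task-to-core permutations.

    Assumes num_tasks == width * width (one task per core) and num_tasks >= 2.
    mapping[task] = core index; traffic maps (src_task, dst_task) -> bandwidth.
    """
    rng = random.Random(seed)
    mapping = list(range(num_tasks))
    cost = mapping_cost(mapping, traffic, width)
    best, best_cost = mapping[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            i, j = rng.sample(range(num_tasks), 2)
            mapping[i], mapping[j] = mapping[j], mapping[i]  # propose: swap two tasks' cores
            new_cost = mapping_cost(mapping, traffic, width)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, accept worse moves with prob e^(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = mapping[:], cost
            else:
                mapping[i], mapping[j] = mapping[j], mapping[i]  # reject: undo the swap
        t *= alpha  # geometric cooling
    return best, best_cost
```

As a usage example, take a 2x2 mesh with traffic `{(0, 1): 10, (2, 3): 10, (0, 3): 1}`. All three communicating pairs can sit on adjacent cores, so the optimum cost is 21, which this search should reach.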

Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems

2015

This is a post-peer-review, pre-copyedit version of an article published in IEEE/ACM Transactions on Computational Biology and Bioinformatics. The final authenticated version is available online at: http://dx.doi.org/10.1109/TCBB.2015.2389958

High-throughput genotyping technologies (such as SNP arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively lon…

Keywords: Computer science; Bioinformatics; DNA Mutational Analysis; Genome-wide association study; Parallel computing; Polymorphism, Single Nucleotide; Sensitivity and Specificity; Computational biology; Computer Graphics; Genetics; Computer architecture; Field-programmable gate array; Random access memory; Applied Mathematics; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Field programmable gate arrays; Epistasis, Genetic; Signal Processing, Computer-Assisted; Equipment Design; Computing systems; Reconfigurable computing; Equipment Failure Analysis; Task (computing); Epistasis; Host (network); Graphics processing units; Biotechnology
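
Exhaustive 2-SNP epistasis detection runs one statistical test per marker pair, that is n(n-1)/2 tests for n markers. For n = 1,000,000 that is 499,999,500,000 pairs.

The sketch below only builds the per-pair case/control contingency tables such a test would consume. It assumes genotypes coded 0/1/2 and a binary phenotype. It does not show the statistic itself or the paper's FPGA/GPU kernels, and the function names are hypothetical. The outer loop over pairs is the embarrassingly parallel part an accelerator would distribute.

```python
from itertools import combinations


def num_pairs(num_snps: int) -> int:
    """Number of unordered SNP pairs that must each be tested."""
    return num_snps * (num_snps - 1) // 2


def pair_tables(genotypes, phenotype):
    """Build a 3x3x2 contingency table for every SNP pair.

    genotypes: one list per individual, each SNP genotype coded 0, 1, or 2.
    phenotype: 0 (control) or 1 (case), one entry per individual.
    Returns {(i, j): table} with table[g_i][g_j][status] = count.
    """
    num_snps = len(genotypes[0])
    tables = {}
    for i, j in combinations(range(num_snps), 2):  # n*(n-1)/2 independent pairs
        table = [[[0, 0] for _ in range(3)] for _ in range(3)]
        for person, status in zip(genotypes, phenotype):
            table[person[i]][person[j]][status] += 1
        tables[(i, j)] = table
    return tables
```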

High Precision Conservative Surface Mesh Generation for Swept Volumes

2015

We present a novel, efficient, and flexible scheme to generate a high-quality mesh that approximates the outer boundary of a swept volume. Our approach comes with two guarantees. First, the approximation is conservative, i.e., the swept volume is enclosed by the generated mesh. Second, the one-sided Hausdorff distance of the generated mesh to the swept volume is upper bounded by a user-defined tolerance. Exploiting this tolerance, the algorithm generates a mesh that is adapted to the local complexity of the swept volume boundary, keeping the overall output complexity remarkably low. The algorithm has two phases: the actual sweep and the mesh generation. In the sweeping phase, we introduce a g…

Keywords: Computer science; Boundary (topology); Parallel computing; Upper and lower bounds; Computational science; CUDA; Hausdorff distance; Engine displacement; Control and Systems Engineering; Mesh generation; Bounded function; Electrical and Electronic Engineering; Ruppert's algorithm; ComputingMethodologies_COMPUTERGRAPHICS
Venue: IEEE Transactions on Automation Science and Engineering
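
The one-sided Hausdorff distance from a set A to a set B is h(A, B) = max over a in A of (min over b in B of |a - b|). It is not symmetric. The guarantee above bounds h(mesh, swept volume) by the tolerance, so every mesh point lies close to the volume.

The brute-force sketch below evaluates this on finite point samples. It assumes points are sampled from both surfaces, which only approximates the continuous distance the paper bounds. It does not check the separate conservativeness condition, and the function names are hypothetical.

```python
import math


def one_sided_hausdorff(a_points, b_points):
    """max over a in A of (min over b in B of |a - b|), computed on point samples."""
    return max(min(math.dist(a, b) for b in b_points) for a in a_points)


def within_tolerance(mesh_samples, volume_samples, tol):
    """True if every sampled mesh point is within tol of some sampled volume point."""
    return one_sided_hausdorff(mesh_samples, volume_samples) <= tol
```

The pairwise search costs O(|A| |B|). Practical implementations replace the inner minimum with a spatial index or an exact point-to-surface distance.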

SoC-Based Implementation of the Backpropagation Algorithm for MLP

2008

The backpropagation algorithm used for training multilayer perceptrons (MLPs) has a high degree of parallelism and is therefore well-suited for hardware implementation on an ASIC or FPGA. However, most implementations lack generality of application, either by limiting the range of trainable network topologies or by resorting to fixed-point arithmetic to increase processing speed. We propose a parallel backpropagation implementation on a multiprocessor system-on-chip (SoC) with a large number of independent floating-point processing units, controlled by software running on embedded processors in order to allow flexibility in the selection of the network topology to be traine…

Keywords: Computer science; Degree of parallelism; Overhead (computing); Multiprocessing; Parallel computing; Fixed-point arithmetic; Perceptron; Network topology; Field-programmable gate array; Backpropagation
Venue: 2008 Eighth International Conference on Hybrid Intelligent Systems
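
To show where backpropagation's parallelism comes from, here is a sketch of one textbook online training step for a one-hidden-layer MLP in floating point. It assumes sigmoid activations, squared error, and no bias terms. It is not the paper's SoC design, and `train_step` is a hypothetical name. Every neuron's activation, every delta, and every weight update within a layer is independent of the others in that layer. This is what lets them map onto separate floating-point units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One online backpropagation step for a 1-hidden-layer MLP (no biases).

    w_hidden[j][i]: weight from input i to hidden neuron j.
    w_out[k][j]: weight from hidden neuron j to output neuron k.
    Updates the weights in place and returns the squared error before the update.
    """
    # Forward pass: each neuron in a layer can be evaluated independently.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    # Output deltas for squared error with sigmoid activation.
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
    # Hidden deltas: propagate output deltas back through the (not yet updated) w_out.
    d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j in range(len(h))]
    # Gradient-descent weight updates, again independent per weight.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] -= lr * d_out[k] * h[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * d_hid[j] * x[i]
    return sum((yk - tk) ** 2 for yk, tk in zip(y, target)) / 2
```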

Concurrent Molecular Dynamics Simulation of ST2 Water on a Transputer Array

1988

A concurrent implementation of a Molecular Dynamics program for ST2 water molecules is presented, which exploits the considerable potential of Transputer arrays for statistical mechanical calculations. High load-balance efficiency is obtained using a new task decomposition algorithm that evenly distributes particles and interaction calculations among the processors. This approach can also help to efficiently solve the more general problem of task distribution in the parallel computation of symmetric pairwise system properties.

Keywords: Computer science; General Chemical Engineering; General problem; Transputer; General Chemistry; Parallel computing; Condensed Matter Physics; Processor array; Molecular dynamics; MIMD; Task (computing); Modeling and Simulation; Decomposition (computer science); General Materials Science; Pairwise comparison; Information Systems
Venue: Molecular Simulation
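
The abstract does not describe the paper's new decomposition algorithm. The sketch below is a generic cyclic scheme, not the authors' method, for spreading symmetric pairwise work evenly: each unordered pair is computed exactly once, and each particle owns almost the same number of pairs.

The scheme works as follows:

- Particle i takes the next floor((n-1)/2) particles in cyclic order.
- For even n, the diametrically opposite partner is assigned only to the lower half.
- Particles are then dealt round-robin to processors.

The function names are hypothetical.

```python
def cyclic_partners(i, n):
    """Partners j of particle i such that every unordered pair is owned exactly once."""
    partners = [(i + d) % n for d in range(1, (n - 1) // 2 + 1)]
    if n % 2 == 0 and i < n // 2:
        partners.append(i + n // 2)  # opposite particle: owned by the lower half only
    return partners


def distribute(n, num_procs):
    """Assign particles round-robin and return the list of pairs each processor computes."""
    work = [[] for _ in range(num_procs)]
    for i in range(n):
        work[i % num_procs].extend((i, j) for j in cyclic_partners(i, n))
    return work
```

For n = 8 particles on 4 processors, each processor receives 7 of the 28 pairs. When n is not a multiple of the processor count, the imbalance is at most one particle's share.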

Optimizing PolyACO Training with GPU-Based Parallelization

2016

A central part of Ant Colony Optimisation (ACO) is the function calculating the quality and cost of solutions, such as the distance of a potential ant route. This cost function is used to deposit an appropriate amount of pheromone to achieve suitable convergence, and in an active ACO implementation a significant part of the runtime is spent in this part of the code. In some cases, the cost function accounts for up to 94% of the runtime, making it a performance bottleneck.

Keywords: Computer science; MathematicsofComputing_NUMERICALANALYSIS; Significant part; 02 engineering and technology; Parallel computing; Function (mathematics); Ant colony; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Bottle neck; 030218 nuclear medicine & medical imaging; 03 medical and health sciences; Automatic parallelization; 0302 clinical medicine; Convergence (routing); 0202 electrical engineering electronic engineering information engineering; Code (cryptography); 020201 artificial intelligence & image processing
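
Here is a sketch of the cost-function-plus-deposit step described above. It uses a generic Ant System style route cost: closed Euclidean tours on a symmetric problem, with each ant depositing q / L on its edges. It is not PolyACO's actual cost function, evaporation is omitted, and the names are hypothetical. The per-ant `tour_length` calls do not depend on each other. That independence is what a GPU port could exploit, for example with one thread or block per ant.

```python
import math


def tour_length(tour, coords):
    """Cost of a closed ant route: sum of Euclidean edge lengths, returning to the start."""
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def deposit(pheromone, tours, coords, q=1.0):
    """Ant-System-style deposit: each ant adds q / L to every edge of its tour.

    Returns the list of tour lengths. Assumes a symmetric problem.
    """
    lengths = [tour_length(t, coords) for t in tours]  # independent per ant: parallelizable
    for tour, length in zip(tours, lengths):
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            pheromone[a][b] += q / length
            pheromone[b][a] += q / length  # symmetric pheromone matrix
    return lengths
```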

Constant Time Garbage Collection in SSDs

2021

Keywords: Computer science; Parallel computing; Constant (mathematics); Garbage collection
Venue: 2021 IEEE International Conference on Networking, Architecture and Storage (NAS)

The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs

2012

Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse matrices. We compared SCOO performance with existing formats in the NVIDIA Cusp library. Our results on a Fermi GPU show that SCOO outperforms the COO and CSR formats for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, comparison to a Sandy-Bridge CPU sho…

Keywords: Computer science; Sparse matrix-vector multiplication; CUDA; Parallel computing; Matrix (mathematics); Factor (programming language); SpMV; General Earth and Planetary Sciences; Multiplication; Fermi; General Environmental Science; Sparse matrix
Venue: Procedia Computer Science
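
The visible abstract does not describe SCOO's slicing. For reference, here is a sketch of the two baseline formats it is compared against, COO and CSR, written as plain sequential SpMV. The function names are hypothetical, and none of this is the Cusp library's API.

The two formats parallelize differently:

- **COO** stores one (row, col, value) triple per nonzero. Parallelizing over nonzeros means that concurrent updates to the same output entry must be combined, for example with atomic adds or a segmented reduction.
- **CSR** groups nonzeros by row, so each row's dot product is independent.

```python
def coo_spmv(num_rows, rows, cols, vals, x):
    """y = A @ x for A in COO format: the k-th nonzero is A[rows[k], cols[k]] = vals[k]."""
    y = [0.0] * num_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]  # in parallel, threads sharing the same r would conflict here
    return y


def coo_to_csr(num_rows, rows, cols, vals):
    """Convert COO triples to CSR arrays (row_ptr, col_idx, values)."""
    order = sorted(range(len(vals)), key=lambda k: rows[k])  # stable sort by row
    row_ptr = [0] * (num_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1  # count nonzeros per row
    for i in range(num_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives each row's start offset
    col_idx = [cols[k] for k in order]
    values = [vals[k] for k in order]
    return row_ptr, col_idx, values


def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x in CSR: each row's dot product is independent (e.g. one thread per row)."""
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]
```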

Improving LSM‐trie performance by parallel search

2020

Keywords: Computer science; Trie; Parallel computing; Software; Parallel search
Venue: Software: Practice and Experience
