Search results for "parallel computing"

Showing 10 of 189 documents

VLBI-resolution radio-map algorithms: Performance analysis of different levels of data-sharing on multi-socket, multi-core architectures

2012

Abstract: A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate the radio-maps that would be observed for a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments sh…

Multi-core processor; Memory hierarchy; Nuclear & particle physics; Computer science; General Physics and Astronomy; Parallel computing; Scheduling (computing); Data sharing; Computer engineering; Hardware and Architecture; Very-long-baseline interferometry; Scalability; Cache; Astronomy & astrophysics; Scaling
Computer Physics Communications, CPC, 1937-1946 (2012)
researchProduct

Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of Multicore Processors

2010

Two strategies for distributing computations can be used to implement parallel solvers for dense linear algebra problems on Heterogeneous Computational Clusters of Multicore Processors (HCoMs): the Heterogeneous Process Distribution Strategy (HPS) and the Heterogeneous Data Distribution Strategy (HDS). Neither is novel, and both have been researched thoroughly. However, the advent of multicores necessitates enhancements to them. In this paper, we present these enhancements. Our study is based on experiments using six applications to perform Parallel Matrix-matrix Multiplication (PMM) on an HCoM employing the two distribution strategies.

Multi-core processor; Parallel processing (DSP implementation); Computer science; Computation; Linear algebra; Parallel algorithm; Concurrent computing; Multiplication; Parallel computing; Matrix multiplication
2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
researchProduct
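The core idea behind a heterogeneous data distribution such as HDS is to give each node a share of the matrix proportional to its measured speed. A minimal sketch of that partitioning step, assuming a simple relative-speed model (the `partition_rows` helper is illustrative, not the paper's implementation):

```python
# Hypothetical helper: assign matrix rows to nodes in proportion to their
# relative speeds, so faster nodes receive more work.
def partition_rows(n_rows, speeds):
    total = sum(speeds)
    # Ideal fractional share per node, rounded down.
    counts = [int(n_rows * s / total) for s in speeds]
    remainder = n_rows - sum(counts)
    # Hand leftover rows to the fastest nodes first.
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:remainder]:
        counts[i] += 1
    return counts

print(partition_rows(100, [1.0, 2.0, 1.0]))  # → [25, 50, 25]
```

A real HDS implementation would partition two-dimensional blocks and rebalance as speeds are re-measured, but the proportional-share computation above is the common starting point.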

Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures

2013

The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations both to enhance crowd rendering and to simulate continuum crowds. However, improving the scalability of crowd simulation systems by exploiting the inherent parallelism of these architectures is still an open issue. In this paper, we propose different parallelization strategies for the collision check procedure that takes place in agent-based simulations. These strategies are designed to exploit the parallelism of both multi-core and many-core architectures such as graphics processing units (GPUs). As for the many-core implementations, we analyse the bottlenecks of a previous G…

Multi-core processor; Speedup; Computer science; Parallel computing; Collision; Theoretical Computer Science; Rendering (computer graphics); Crowds; Hardware and Architecture; Scalability; Collision detection; Crowd simulation; General-purpose computing on graphics processing units; Software
The International Journal of High Performance Computing Applications
researchProduct
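Collision checks of this kind typically avoid the quadratic all-pairs test by binning agents into a uniform spatial grid, so that only agents in the same or neighbouring cells are compared; each cell can then be processed by an independent thread or GPU block. A minimal sequential sketch of that structure, assuming 2D point agents and a fixed interaction radius (names are illustrative, not the paper's implementation):

```python
from collections import defaultdict
from itertools import combinations

def colliding_pairs(agents, radius):
    # Bin each agent into a grid cell sized to the interaction radius.
    cell = lambda p: (int(p[0] // radius), int(p[1] // radius))
    grid = defaultdict(list)
    for idx, pos in enumerate(agents):
        grid[cell(pos)].append(idx)
    pairs = set()
    for (cx, cy), members in grid.items():
        # Candidates: this cell plus its 8 neighbours.
        cand = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cand.extend(grid.get((cx + dx, cy + dy), []))
        for i, j in combinations(sorted(set(cand)), 2):
            ax, ay = agents[i]; bx, by = agents[j]
            if (ax - bx) ** 2 + (ay - by) ** 2 <= radius ** 2:
                pairs.add((i, j))
    return pairs

print(colliding_pairs([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)], 1.0))  # → {(0, 1)}
```

The per-cell loop is what parallelizes naturally: on a GPU, one thread (or block) per cell, with the grid built by a parallel counting sort.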

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures, fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction on both CPUs and GPUs have been proposed over the years. Although providing significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …

Multi-core processor; Speedup; Computer science; Suffix array; Computer and information sciences; Engineering and technology; Parallel computing; Data structure; CUDA; Shared memory; Computation theory & mathematics; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Suffix; Data compression
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
researchProduct
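The standard parallel-friendly scheme behind such work is prefix doubling: suffixes are repeatedly ranked by pairs of ranks at doubling offsets, so each round reduces to a sort plus a rank update, both of which parallelize well on GPUs. A plain sequential Python sketch of the idea (a baseline, not the paper's multi-GPU algorithm):

```python
# Prefix-doubling suffix array construction: O(n log n) rounds of
# sorting suffixes by (rank[i], rank[i+k]) with k doubling each round.
def suffix_array(s):
    n = len(s)
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        # Key: rank of suffix i paired with rank k positions ahead.
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        # Re-rank; equal keys share a rank, distinct keys increment it.
        new_rank = [0] * n
        for a, b in zip(sa, sa[1:]):
            new_rank[b] = new_rank[a] + (key(a) < key(b))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: done
            break
        k *= 2
    return sa

print(suffix_array("banana"))  # → [5, 3, 1, 0, 4, 2]
```

On a GPU the sort becomes a radix sort over 64-bit packed rank pairs, and in the multi-GPU setting the sort and rank updates are what generate the inter-GPU communication the paper targets.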

A dynamic load-balancing algorithm for molecular dynamics simulation on multi-processor systems

1991

Abstract: A new algorithm for dynamic load-balancing on multi-processor systems and its application to the molecular dynamics simulation of spinodal phase separation are presented. The load-balancer is distributed among the processors and embedded in the application itself. Tests performed on a transputer network show that the load-balancer behaves almost ideally in this application. The same approach can easily be extended to different multi-processor topologies or applications.

Numerical Analysis; Interconnection; Spinodal; Physics and Astronomy (miscellaneous); Computer science; Applied Mathematics; Control reconfiguration; Multiprocessing; Topology (electrical circuits); Parallel computing; Network topology; Computer Science Applications; Dynamic simulation; Computational Mathematics; Molecular dynamics; Modeling and Simulation
Journal of Computational Physics
researchProduct
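A distributed load-balancer embedded in the application often takes a diffusion-like form: each processor compares its load only with its neighbours and shifts a fraction of the difference, so no global coordinator is needed. A toy sketch under that assumption, using a ring topology (the topology and the `alpha` damping factor are illustrative, not the paper's transputer scheme):

```python
# Diffusion-style balancing on a ring: each step, every processor moves
# a fraction of the load difference toward each of its two neighbours.
def diffuse(loads, alpha=0.5, steps=20):
    n = len(loads)
    loads = list(loads)
    for _ in range(steps):
        new = loads[:]
        for i in range(n):
            for j in ((i - 1) % n, (i + 1) % n):  # ring neighbours
                new[i] += alpha * (loads[j] - loads[i]) / 2
        loads = new
    return loads

balanced = diffuse([8.0, 0.0, 0.0, 0.0])
print([round(x, 2) for x in balanced])  # converges toward 2.0 per processor
```

Total load is conserved at every step, and the imbalance decays geometrically; in a molecular dynamics setting the "load" would be the particle count per processor, with particles migrated along the same neighbour links.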

UPC++ for bioinformatics: A case study using genome-wide association studies

2014

Modern genotyping technologies are able to obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Therefore, a variety of HPC architectures have been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters, which is implemented with UPC++ and takes advantage of the features available in the Partitioned Global Address Space and Object Oriented Programming models. Our solution is base…

Object-oriented programming; Computer science; Computation; Single-core; Genome-wide association study; Partitioned global address space; Parallel computing; Bioinformatics; Supercomputer
2014 IEEE International Conference on Cluster Computing (CLUSTER)
researchProduct
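The parallel structure of such a scan is straightforward: the test statistic for each SNP pair is independent, so the pair list can be split across workers (cores or UPC++ ranks) with no communication until the final reduction. A minimal Python sketch of that decomposition; the `score` statistic and the round-robin split are stand-ins, not the paper's statistical test or scheduler:

```python
from itertools import combinations

def score(snp_a, snp_b, phenotype):
    # Toy statistic: count samples where both minor alleles co-occur
    # with a case phenotype (a placeholder for the real test).
    return sum(1 for a, b, p in zip(snp_a, snp_b, phenotype) if a and b and p)

def chunked_pairs(n_markers, n_workers):
    # Round-robin split of all marker pairs across workers.
    pairs = list(combinations(range(n_markers), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]

print(chunked_pairs(4, 2))
# → [[(0, 1), (0, 3), (1, 3)], [(0, 2), (1, 2), (2, 3)]]
```

With n markers there are n(n-1)/2 pairs, so for a million markers the pair count reaches ~5×10^11, which is why the abstracts above reach for clusters and accelerators rather than a single core.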

Unified Parallel C++

2018

Abstract: Although MPI is commonly used for parallel programming on distributed-memory systems, Partitioned Global Address Space (PGAS) approaches are gaining attention for programming modern multi-core CPU clusters. They feature a hybrid memory abstraction: distributed memory is viewed as a shared memory that is partitioned among nodes in order to simplify programming. In this chapter you will learn about Unified Parallel C++ (UPC++), a library-based extension of C++ that combines the advantages of both the PGAS and Object Oriented paradigms. The examples included in this chapter will help you to understand the main features of PGAS languages and how they can simplify the task of programming par…

Object-oriented programming; Source code; Computer science; Parallel computing; Shared memory; Asynchronous communication; Unified Parallel C; Distributed memory; Partitioned global address space; Abstraction (linguistics)
researchProduct

A Column Generation Approach to Scheduling of Periodic Tasks

2011

We present an algorithm based on column generation for a real-time scheduling problem in which all tasks recur regularly with a given period. Furthermore, the tasks exchange messages, which have to be transferred over a bus if the tasks involved are executed on different ECUs. Experiments show that, for large instances, our preliminary implementation is faster than the previous approach, which is based on an integer linear programming formulation solved with a state-of-the-art solver.

Job shop scheduling; Computer science; Column generation; Parallel computing; Solver; Integer linear programming formulation; Scheduling (computing)
researchProduct

Low Level Languages for the PAPIA Machine

1986

The paper presents the low-level languages implemented to date to program the PAPIA machine. The parallel assembly-level P-MAGRO package, the microcode-level instruction set and a machine-simulation environment are described.

PAPIA; Language; Architecture; SIMD processor; Parallel-C; Scalar processor; Computer science; Virtual machine; Programming language; Parallel computing; Pyramid algorithm; Low-level programming language
researchProduct

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

2014

This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-09873-9_57
Abstract: High-throughput genotyping technologies allow the collection of up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as 2-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. In this work we present EpistSearch, a parallelized tool that, following the log-linear model appr…

POSIX Threads; Multi-core processor; Bioinformatics; Computer science; Computation; CUDA; Parallel computing; Pthreads; Acceleration; Titan (supercomputer); Filter; Epistasis; GWAS
researchProduct