Search results for "Parallel"

showing 10 items of 667 documents

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation that takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation that overlaps computation and communication is described. Extensive performance comparisons of SCOO with other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU already outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing
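
As a rough illustration of the SCOO idea in the abstract above (row slices whose partial results are accumulated with atomic additions), here is a minimal sequential Python sketch. It is not the authors' CUDA kernel: the function names, the `slice_height` parameter, and the choice to sort each slice's entries by column are assumptions made for illustration, and the plain `+=` stands in for the atomic add a GPU thread would issue into a per-slice shared-memory buffer.

```python
from collections import defaultdict


def to_sliced_coo(rows, cols, vals, n_rows, slice_height):
    """Hypothetical SCOO-like layout: bucket COO triplets into row slices,
    then sort each slice's entries by column (assumed, for locality in x)."""
    buckets = defaultdict(list)
    for r, c, v in zip(rows, cols, vals):
        buckets[r // slice_height].append((r, c, v))
    n_slices = (n_rows + slice_height - 1) // slice_height
    return [sorted(buckets[s], key=lambda e: (e[1], e[0])) for s in range(n_slices)]


def sliced_coo_spmv(slices, x, n_rows, slice_height):
    """Compute y = A @ x slice by slice."""
    y = [0.0] * n_rows
    for s, entries in enumerate(slices):
        base = s * slice_height
        # Per-slice accumulator; on a GPU this would live in shared memory.
        local = [0.0] * min(slice_height, n_rows - base)
        for r, c, v in entries:
            # Sequential stand-in for atomicAdd(&local[r - base], v * x[c]).
            local[r - base] += v * x[c]
        y[base:base + len(local)] = local
    return y


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form.
rows, cols, vals = [0, 0, 1, 2, 2], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0]
slices = to_sliced_coo(rows, cols, vals, n_rows=3, slice_height=2)
print(sliced_coo_spmv(slices, [1.0, 2.0, 3.0], n_rows=3, slice_height=2))  # [7.0, 6.0, 19.0]
```

Sorting by column inside a slice is one plausible way to make neighbouring threads read nearby entries of `x`; the real format may order entries differently.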

Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations

2012

In the design process of computer systems or processor architectures, many different parameters are typically exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in …

Speedup; Computer Networks and Communications; Design space exploration; Computer science; business.industry; Parallel computing; Program optimization; Multi-objective optimization; Computer Science Applications; Theoretical Computer Science; Microarchitecture; Computational Theory and Mathematics; Scalability; Code (cryptography); Engineering design process; business; Software; Computer hardware; Concurrency and Computation: Practice and Experience
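
Finding configurations that are best "with respect to multiple objectives," as the FADSE abstract above puts it, rests on Pareto dominance. The short Python sketch below is an illustration, not FADSE code; the objective tuples are made-up placeholders, and all objectives are assumed to be minimized. It keeps only the non-dominated configurations:

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one
    (all objectives minimized, e.g. runtime and power)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points not dominated by any other point (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (runtime, power) pairs for five configurations.
configs = [(10, 5), (8, 7), (12, 4), (11, 6), (8, 8)]
print(pareto_front(configs))  # [(10, 5), (8, 7), (12, 4)]
```

Evolutionary multi-objective search builds on the same dominance test; an exhaustive filter like this one is only practical for small candidate sets, not for a full design space.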

CliffoSor: A Parallel Embedded Architecture for Geometric Algebra and Computer Graphics

2006

Geometric object representation and transformation are the two key aspects of computer graphics applications. Traditionally, compute-intensive matrix calculations are used to model and render 3D scenery. Geometric algebra (a.k.a. Clifford algebra) is attracting growing attention because it models geometric facts naturally and is also a powerful analytical tool for symbolic calculations. In this paper, the architecture of CliffoSor (Clifford Processor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support for Clifford algebra operators. A prototype implementation on an FPGA board is detailed. Initial test results show more …

Speedup; Computer science; Clifford algebra; Solid modeling; Parallel computing; Computational geometry; Application software; computer.software_genre; Computational science; Computer graphics; Geometric algebra; ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION; Representation (mathematics); computer
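
The Clifford algebra operators that CliffoSor supports in hardware can be illustrated in software. The sketch below uses the common bitmask encoding of basis blades (bit i set means basis vector e_(i+1) is present) with a Euclidean metric, in which each basis vector squares to 1. The function names are illustrative assumptions, and nothing here describes CliffoSor's actual datapath.

```python
def reordering_sign(a: int, b: int) -> int:
    """Sign picked up when the basis vectors of blade a are moved past those of blade b."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1


def blade_product(a: int, b: int) -> tuple[int, int]:
    """Geometric product of two unit basis blades: (sign, resulting blade).
    Shared vectors cancel because e_i * e_i = 1 in a Euclidean metric."""
    return reordering_sign(a, b), a ^ b


def geometric_product(x: dict[int, float], y: dict[int, float]) -> dict[int, float]:
    """Geometric product of multivectors stored as {blade bitmask: coefficient}."""
    out: dict[int, float] = {}
    for ba, ca in x.items():
        for bb, cb in y.items():
            sign, blade = blade_product(ba, bb)
            out[blade] = out.get(blade, 0.0) + sign * ca * cb
    return {blade: c for blade, c in out.items() if c != 0.0}


E1, E2 = 0b01, 0b10
print(blade_product(E1, E2))  # (1, 3): e1 e2 = e12
print(blade_product(E2, E1))  # (-1, 3): e2 e1 = -e12
print(geometric_product({0: 1.0, E1: 1.0}, {0: 1.0, E1: -1.0}))  # {}: (1 + e1)(1 - e1) = 0
```

The double loop over blade pairs is exactly the kind of independent, regular work that a parallel coprocessor can spread across functional units.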

Automatic multi-objective optimization of parameters for hardware and code optimizations

2011

Recent computer architectures can be configured in many different ways. To explore this huge design space, system simulators are typically used. Because performance is no longer the only decisive factor, and power usage or resource usage of the system also matter, it has become very hard for designers to select optimal configurations. In this article we use a multi-objective design space exploration tool called FADSE to explore the vast design space of the Grid ALU Processor (GAP) and its post-link optimizer, GAPtimize. We improved FADSE with techniques that make it more robust against failures and that speed up evaluations through parallel processing. For the GAP, we present an approximation o…

Speedup; Parallel processing (DSP implementation); Computer architecture; Computer engineering; Computer science; Design space exploration; Pareto principle; Program optimization; Grid; Multi-objective optimization; Space exploration
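
The abstract above mentions making FADSE robust against failures and speeding up evaluations through parallel processing. A generic Python sketch of that pattern follows. Each configuration is evaluated in a worker pool, and a failed simulator run is retried a few times before the configuration is marked as failed. The `simulate` callable, the retry count, and the thread pool are hypothetical stand-ins, since the abstract does not describe FADSE's own mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_with_retry(simulate, config, retries=2):
    """Run one (hypothetical) simulation, re-running it up to `retries` times on error."""
    for _ in range(retries + 1):
        try:
            return simulate(config)
        except Exception:  # e.g. the simulator process crashed; try again
            pass
    return None  # give up; the explorer can treat this configuration as failed


def evaluate_all(simulate, configs, workers=4, retries=2):
    """Evaluate all configurations concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: evaluate_with_retry(simulate, c, retries), configs))


attempts = {}


def flaky_simulator(config):
    """Toy simulator that crashes on the first attempt for every configuration."""
    attempts[config] = attempts.get(config, 0) + 1
    if attempts[config] == 1:
        raise RuntimeError("simulator crashed")
    return config * config


print(evaluate_all(flaky_simulator, [1, 2, 3]))  # [1, 4, 9]
```

Threads are adequate when each evaluation launches an external simulator; CPU-bound evaluations written in pure Python would need a process pool instead.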

cuBool: Bit-Parallel Boolean Matrix Factorization on CUDA-Enabled Accelerators

2018

Boolean Matrix Factorization (BMF) is a commonly used technique in the field of unsupervised data analytics. The goal is to decompose a ground-truth matrix C into a product of two matrices A and B that form either an exact or an approximate rank-k factorization of C. Both exact and approximate factorization are time-consuming tasks due to their combinatorial complexity. In this paper, we introduce a massively parallel implementation of BMF, namely cuBool, in order to significantly speed up the factorization of huge Boolean matrices. Our approach is based on alternately adjusting rows and columns of A and B using thousands of lightweight CUDA threads. The massively parallel manipulation of entries …

Speedup; Rank (linear algebra); Computer science; 02 engineering and technology; Parallel computing; Matrix decomposition; CUDA; Matrix (mathematics); Factorization; 020204 information systems; Singular value decomposition; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Massively parallel; Integer (computer science); 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
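
To make the bit-parallel idea behind cuBool concrete: if each row of a Boolean matrix is packed into an integer bitmask, one row of the Boolean product A∘B is the bitwise OR of the rows of B selected by that row of A, and the reconstruction error is a popcount of an XOR. The Python sketch below uses this encoding (bit j of a mask is column j) and performs one greedy pass that flips single bits of A when doing so lowers the error. The greedy acceptance rule and all names are simplifying assumptions; cuBool's actual update and acceptance scheme is not reproduced here. Updating B works the same way on the transposed problem, since the transpose of C equals the transpose of B times the transpose of A.

```python
def bool_row(a_mask: int, B_rows: list[int], k: int) -> int:
    """Row of A∘B: OR of the rows of B whose index bit is set in a_mask."""
    acc = 0
    for l in range(k):
        if (a_mask >> l) & 1:
            acc |= B_rows[l]
    return acc


def row_error(c_mask: int, a_mask: int, B_rows: list[int], k: int) -> int:
    """Number of mismatching entries (Hamming distance) in one row."""
    return bin(c_mask ^ bool_row(a_mask, B_rows, k)).count("1")


def improve_A(C_rows: list[int], A_rows: list[int], B_rows: list[int], k: int) -> list[int]:
    """One greedy sweep over A; rows are independent, so they could be processed in parallel."""
    A = list(A_rows)
    for i, c in enumerate(C_rows):
        best = row_error(c, A[i], B_rows, k)
        for l in range(k):
            candidate = A[i] ^ (1 << l)  # flip bit l of row i
            err = row_error(c, candidate, B_rows, k)
            if err < best:
                A[i], best = candidate, err
    return A


B = [0b011, 0b110]          # k = 2 factor rows over 3 columns
C = [0b011, 0b110, 0b111]   # equals A∘B for A = [0b01, 0b10, 0b11]
print(improve_A(C, [0, 0, 0], B, k=2))  # [1, 2, 3]
```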

Reconfigurable Accelerator for the Word-Matching Stage of BLASTN

2013

BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find similar regions of biological significance between two sequences. However, because the size of genomic databases is growing rapidly, the computation time of BLAST for a complete genomic database search keeps increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the…

Speedup; Sequence database; Hardware and Architecture; Computer science; Sequence analysis; Genomics; Parallel computing; Electrical and Electronic Engineering; Data structure; Genomic databases; Software; Reconfigurable computing; Word (computer architecture); IEEE Transactions on Very Large Scale Integration (VLSI) Systems
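
The word-matching stage that the accelerator above targets can be described in a few lines of software: index every length-w word (w-mer) of the query, then scan the database sequence for exact word hits, which later BLAST stages extend. The sketch below is a plain Python reference for that step, not the paper's FPGA architecture; w = 11 is the classic BLASTN default word size, and the function names are illustrative.

```python
from collections import defaultdict


def build_word_index(query: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word of the query to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def word_hits(query: str, subject: str, w: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where identical w-mers occur."""
    index = build_word_index(query, w)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTACGT", "TTACGTAA", w=4))  # [(3, 1), (0, 2), (4, 2), (1, 3)]
```

Every database position is looked up independently, which is what makes this stage a natural target for parallel hardware.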

Performance potential for simulating spin models on GPU

2012

Graphics processing units (GPUs) are increasingly being used for general computational purposes. This development is motivated by their theoretical peak performance, which significantly exceeds that of broadly available CPUs. For practical purposes, however, it is far from clear how much of this theoretical performance can be realized in actual scientific applications. As discussed here for the case of studying classical spin models of statistical mechanics by Monte Carlo simulations, only explicitly tailoring the algorithms involved to the specific architecture under consideration allows one to harvest the computational power of GPU systems. A number of examples, ran…

Spin glass; Physics and Astronomy (miscellaneous); Computer science; Monte Carlo method; FOS: Physical sciences; Computational science; CUDA; High Energy Physics - Lattice; Statistical physics; Graphics; Condensed Matter - Statistical Mechanics; Numerical Analysis; Statistical Mechanics (cond-mat.stat-mech); Applied Mathematics; High Energy Physics - Lattice (hep-lat); Ranging; Statistical mechanics; Disordered Systems and Neural Networks (cond-mat.dis-nn); Condensed Matter - Disordered Systems and Neural Networks; Computational Physics (physics.comp-ph); Computer Science Applications; Computational Mathematics; Modeling and Simulation; Ising model; Parallel tempering; Physics - Computational Physics
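
One standard example of tailoring a spin-model Monte Carlo algorithm to massively parallel hardware is the checkerboard decomposition of the single-spin-flip Metropolis update for the 2D Ising model. All nearest neighbours of a site of one colour lie on the other colour, so a whole sublattice can be updated at once without conflicts. The NumPy sketch below uses array operations where a GPU code would use one thread per site. It illustrates the technique in general and is not taken from the paper; it assumes an even lattice size L, periodic boundaries, and coupling J = 1.

```python
import numpy as np


def checkerboard_sweep(spins: np.ndarray, beta: float, rng: np.random.Generator,
                       J: float = 1.0) -> np.ndarray:
    """One Metropolis sweep of a periodic L x L Ising lattice (L even), one sublattice at a time."""
    L = spins.shape[0]
    assert L % 2 == 0 and spins.shape == (L, L), "checkerboard needs an even square lattice"
    i, j = np.indices((L, L))
    for parity in (0, 1):
        sublattice = (i + j) % 2 == parity
        # Sum of the four nearest neighbours, with periodic wrap-around.
        neighbours = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
                      + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
        delta_e = 2.0 * J * spins * neighbours  # energy change if each spin flipped
        accept = rng.random((L, L)) < np.exp(-beta * delta_e)
        spins = np.where(sublattice & accept, -spins, spins)
    return spins


rng = np.random.default_rng(0)
lattice = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    lattice = checkerboard_sweep(lattice, beta=0.6, rng=rng)
print("magnetization per spin:", lattice.mean())
```

Because the neighbour sums for one colour only read spins of the other colour, every site in a sublattice can be processed simultaneously, which is the property a GPU kernel exploits.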

Measurement of the single-top-quark production cross section at CDF

2008

We report a measurement of the single-top-quark production cross section in 2.2 fb⁻¹ of pp̄ collision data collected by the Collider Detector at Fermilab at √s = 1.96 TeV. Candidate events are classified as signal-like by three parallel analyses that use likelihood, matrix-element, and neural-network discriminants. These results are combined to improve the sensitivity. We observe a signal consistent with the standard model prediction but inconsistent with the background-only model by 3.7 standard deviations, with a median expected sensitivity of 4.9 standard deviations. We measure a cross section of 2.2 +0.7/-0.6 (stat+sys) pb and extract the CKM matrix element value |V_tb| = 0…

Standards; Top quark; Particle physics; FOS: Physical sciences; General Physics and Astronomy; ddc:500.2; Astrophysics::Cosmology and Extragalactic Astrophysics; 114 Physical sciences; 01 natural sciences; Standard Model; High Energy Physics - Experiment; Nuclear physics; High Energy Physics - Experiment (hep-ex); Tellurium compounds; Matrix elements; Cross section (physics); Colliding beam accelerators; Standard deviations; 0103 physical sciences; [PHYS.HEXP]Physics [physics]/High Energy Physics - Experiment [hep-ex]; Sensitivity (control systems); 010306 general physics; Standard models; 14.65.Ha 13.85Qk 12.15Hh 12.15.Ji; Physics; hep-ex; 010308 nuclear & particles physics; Cabibbo–Kobayashi–Maskawa matrix; Statistics; High Energy Physics::Phenomenology; Order (ring theory); Collider Detector at Fermilab; Cross sections; Parallel analysis; Production (computer science); High Energy Physics::Experiment; Neural networks; Quark productions
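
The abstract above does not show how |V_tb| follows from the measured cross section. Under the usual assumptions that the single-top-quark production rate scales with |V_tb|² and that |V_tb| is much larger than |V_ts| and |V_td|, the extraction follows the relation below. Here σ_SM denotes the theoretical cross section evaluated for |V_tb| = 1, the uncertainty line covers the measurement uncertainty only, and no numerical values are filled in.

```latex
\sigma_{\mathrm{meas}} = |V_{tb}|^{2}\,\sigma_{\mathrm{SM}}
\quad\Longrightarrow\quad
|V_{tb}| = \sqrt{\sigma_{\mathrm{meas}} / \sigma_{\mathrm{SM}}},
\qquad
\frac{\delta|V_{tb}|}{|V_{tb}|} \approx \frac{1}{2}\,\frac{\delta\sigma_{\mathrm{meas}}}{\sigma_{\mathrm{meas}}}
```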

Simulating spin models on GPU

2010

Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of GPU architectures compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way that exploits the inherent parallelism and the hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating…

Statistical Mechanics (cond-mat.stat-mech); Computer science; High Energy Physics - Lattice (hep-lat); Monte Carlo method; FOS: Physical sciences; General Physics and Astronomy; Parallel computing; Computational Physics (physics.comp-ph); Power (physics); CUDA; High Energy Physics - Lattice; Parallel processing (DSP implementation); Hardware and Architecture; Parallelism (grammar); Ising model; Graphics; Physics - Computational Physics; Video game; Condensed Matter - Statistical Mechanics; Computer Physics Communications
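
A common way to exploit the hierarchical memory structure mentioned above in GPU spin-model codes is to have each thread block copy a tile of the lattice, plus a halo of neighbouring sites, into fast on-chip shared memory. The NumPy sketch below emulates only that data-movement pattern on the CPU. The name `tile_with_halo`, the tile size, and the one-site halo (enough for nearest-neighbour couplings) are illustrative assumptions, not details from the paper.

```python
import numpy as np


def tile_with_halo(lattice: np.ndarray, r0: int, c0: int, tile: int) -> np.ndarray:
    """Copy the tile x tile block starting at (r0, c0) plus a one-site halo on every side,
    wrapping around periodic boundaries; mimics a thread block's load into shared memory."""
    rows = np.arange(r0 - 1, r0 + tile + 1) % lattice.shape[0]
    cols = np.arange(c0 - 1, c0 + tile + 1) % lattice.shape[1]
    return lattice[np.ix_(rows, cols)]


lattice = np.arange(16).reshape(4, 4)
block = tile_with_halo(lattice, 0, 0, tile=2)
print(block)
```

Once a tile is in fast memory, every interior site can compute its local field from the copy alone, so each lattice value is fetched from slow global memory roughly once per tile instead of once for every neighbour that needs it.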

Sub-threshold signal processing in arrays of non-identical nanostructures

2011

Weak input signals are routinely processed by molecular-scale biological networks composed of non-identical units that operate correctly in a noisy environment. To show that artificial nanostructures can mimic this behavior, we theoretically explore noise-assisted signal processing in arrays of metallic nanoparticles functionalized with organic ligands that act as tunneling junctions connecting the nanoparticle to the external electrodes. Electron transfer through the nanostructure is based on Coulomb blockade and tunneling effects. Because of fabrication uncertainties, these nanostructures are expected to show a high variability in their physical characteristics and…

Statistical ensemble; Signal processing; Materials science; Mechanical Engineering; Thermal fluctuations; Coulomb blockade; Signal Processing, Computer-Assisted; Bioengineering; Nanotechnology; Electrochemical Techniques; Equipment Design; General Chemistry; Nanostructures; Models, Chemical; Mechanics of Materials; General Materials Science; Kinetic Monte Carlo; Electrical and Electronic Engineering; Biological system; Electrodes; Parallel array; Electronic circuit; Voltage
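
The noise-assisted processing of sub-threshold signals described above is closely related to stochastic resonance in arrays of threshold elements. The NumPy sketch below is only a generic toy model of that effect, not the paper's model, which treats Coulomb-blockade tunneling junctions. Each of N non-identical units fires when the weak input plus its own independent noise exceeds its threshold, and the array output is the number of firing units. The thresholds, amplitude, and noise levels are arbitrary illustrative values.

```python
import numpy as np


def array_response(signal: np.ndarray, thresholds: np.ndarray, noise_std: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Number of threshold units firing at each time step; every unit sees its own Gaussian noise."""
    noise = rng.normal(0.0, noise_std, size=(thresholds.size, signal.size))
    return ((signal[None, :] + noise) > thresholds[:, None]).sum(axis=0)


rng = np.random.default_rng(1)
t = np.linspace(0.0, 20 * np.pi, 2000)
signal = 0.5 * np.sin(t)                # peak 0.5: below every threshold
thresholds = np.linspace(0.9, 1.1, 50)  # 50 non-identical units

for sigma in (0.0, 0.1, 0.5, 2.0):
    out = array_response(signal, thresholds, sigma, rng)
    corr = np.corrcoef(signal, out)[0, 1] if out.std() > 0 else 0.0
    print(f"noise std {sigma}: correlation with input = {corr:.2f}")
```

Without noise the array never fires, so the weak input is invisible at the output; with noise, the firing count tracks the input, and the spread of thresholds means different units respond at different input levels.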
