Search results for "Speed"

Showing 10 of 876 documents

Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions

2022

Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring for neighbor list building, bond order computation, as well as valence angles and torsion angles computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly 100× faster than the management processing…

Speedup; Computational Theory and Mathematics; Xeon; Hardware and Architecture; Computer science; Computation; Signal Processing; Vectorization (mathematics); Node (circuits); Parallel computing; Supercomputer; Force field (chemistry); Sunway TaihuLight; IEEE Transactions on Parallel and Distributed Systems

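The ReaxFF entry above mentions refactoring neighbor-list building. As a rough illustration of what such a kernel computes, here is a minimal cell-list neighbor search in Python, assuming a non-periodic box and a single cutoff. The function name `build_neighbor_list` and its parameters are hypothetical, and this is the generic cell-binning technique, not the refactored LAMMPS/ReaxFF code.

```python
import itertools
import math
from collections import defaultdict


def build_neighbor_list(positions, cutoff):
    """Return sorted (i, j) pairs with i < j whose distance is below `cutoff`.

    Atoms are binned into cubic cells of edge `cutoff`, so each atom is only
    compared with atoms in its own cell and the 26 surrounding cells.
    Assumes a non-periodic system (no minimum-image wrapping).
    """
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in pos)
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    pairs = []
    for cell, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbor_cell = tuple(c + o for c, o in zip(cell, offset))
            # .get() does not insert empty cells while we iterate over `cells`
            others = cells.get(neighbor_cell, ())
            for i in members:
                for j in others:
                    if i < j:  # count each pair exactly once
                        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if d2 < cutoff_sq:
                            pairs.append((i, j))
    return sorted(pairs)


points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(build_neighbor_list(points, 1.0))  # [(0, 1)]
```

A brute-force search compares all N(N-1)/2 pairs; binning by cutoff keeps the work roughly proportional to N at a fixed density.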

Reducing complexity in H.264/AVC motion estimation by using a GPU

2011

H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…

Speedup; Computational complexity theory; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Algorithmic efficiency; 0202 electrical engineering electronic engineering information engineering; Worst-case complexity; 020201 artificial intelligence & image processing; Context-adaptive binary arithmetic coding; Data compression; Context-adaptive variable-length coding

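Motion estimation in the H.264/AVC entry above rests on block matching. A minimal full-search sketch with the sum of absolute differences (SAD) as cost, assuming frames stored as Python lists of luma values, shows why the search parallelizes well: every candidate motion vector is scored independently. The names `sad` and `full_search`, the 4×4 block size, and the search radius are illustrative assumptions, not the paper's CUDA design.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, radius=2):
    """Test every displacement within `radius`; return (best vector, its SAD)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + n <= width and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)  # independent of all other candidates
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
# current frame: the reference content shifted one pixel to the left
cur = [[ref[y][x + 1] if x + 1 < 8 else 0 for x in range(8)] for y in range(8)]
print(full_search(cur, ref, 2, 2))  # ((1, 0), 0)
```

On a GPU, one would typically assign candidates (or blocks) to threads and reduce to the minimum cost afterwards.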

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing

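The SCOO entry above builds on the coordinate (COO) format, where each nonzero is stored as a (row, column, value) triple. The sketch below, written in Python for readability, shows plain COO SpMV and a row-sliced variant that accumulates each slice into a small local buffer. Comments mark where a CUDA kernel would need `atomicAdd`. The fixed-height slicing is only a loose approximation of the idea, and the function names are hypothetical; this is not the paper's SCOO layout or partitioning method.

```python
def coo_spmv(rows, cols, vals, x, n_rows):
    """y = A @ x for A stored in coordinate (COO) format."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        # A CUDA kernel with one thread per nonzero would need atomicAdd(&y[r], v * x[c])
        # here, because several threads may update the same row concurrently.
        y[r] += v * x[c]
    return y


def sliced_coo_spmv(rows, cols, vals, x, n_rows, slice_height=2):
    """COO SpMV with nonzeros grouped into fixed-height row slices.

    Each slice accumulates into a small local buffer (standing in for GPU shared
    memory, where atomics are cheaper) and then writes its rows of y once.
    """
    slices = {}
    for k, r in enumerate(rows):
        slices.setdefault(r // slice_height, []).append(k)

    y = [0.0] * n_rows
    for s, nz_indices in sorted(slices.items()):
        start = s * slice_height
        local = [0.0] * min(slice_height, n_rows - start)
        for k in nz_indices:
            local[rows[k] - start] += vals[k] * x[cols[k]]
        y[start:start + len(local)] = local
    return y


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form
rows, cols, vals = [0, 0, 1, 2, 2], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0]
print(coo_spmv(rows, cols, vals, [1.0, 2.0, 3.0], 3))  # [7.0, 6.0, 19.0]
```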

Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations

2012

In the design process of computer systems or processor architectures, many different parameters are typically exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in …

Speedup; Computer Networks and Communications; Design space exploration; Computer science; business.industry; Parallel computing; Program optimization; Multi-objective optimization; Computer Science Applications; Theoretical Computer Science; Microarchitecture; Computational Theory and Mathematics; Scalability; Code (cryptography); Engineering design process; business; Software; Computer hardware; Concurrency and Computation: Practice and Experience

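Multi-objective design space exploration, as in the FADSE entry above, keeps the configurations that no other configuration dominates (the Pareto front). A minimal dominance filter follows, assuming every objective is minimized. The configuration names and (cycles, power) values are made-up examples, and FADSE's actual search algorithms are not reproduced here.

```python
def dominates(a, b):
    """True if objective vector `a` is no worse than `b` everywhere and strictly better somewhere.

    All objectives are assumed to be minimized (e.g. cycles and power).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(configs):
    """Names of the configurations that no other configuration dominates."""
    return sorted(
        name
        for name, objectives in configs.items()
        if not any(dominates(other, objectives) for other in configs.values())
    )


# Hypothetical design points: name -> (execution cycles, power in W)
configs = {"A": (100, 5.0), "B": (80, 7.0), "C": (120, 6.0), "D": (80, 6.5)}
print(pareto_front(configs))  # ['A', 'D']
```

Here C is dominated by A and B by D, so only A and D remain as trade-offs between speed and power.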

CliffoSor: A Parallel Embedded Architecture for Geometric Algebra and Computer Graphics

2006

Geometric object representation and transformation are the two key aspects of computer graphics applications. Traditionally, modeling and rendering 3D scenery involves compute-intensive matrix calculations. Geometric algebra (a.k.a. Clifford algebra) is gaining growing attention because it models geometric facts naturally and is also a powerful analytical tool for symbolic calculations. In this paper, the architecture of CliffoSor (Clifford Processor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support for Clifford algebra operators. A prototype implementation on an FPGA board is detailed. Initial test results show more …

Speedup; Computer science; Clifford algebra; Solid modeling; Parallel computing; Computational geometry; Application software; computer.software_genre; Computational science; Computer graphics; Geometric algebra; ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION; Representation (mathematics); computer

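CliffoSor provides hardware support for Clifford algebra operators. To show what one such operator computes, here is the geometric product in the two-dimensional algebra G(2,0), with a multivector stored as (scalar, e1, e2, e12) coefficients. The tuple representation and function names are assumptions for illustration, and the excerpt does not state which algebra dimension CliffoSor targets. The rotor example rotates e1 by 90° to obtain e2.

```python
import math


def geometric_product(a, b):
    """Geometric product in G(2,0); multivectors are (scalar, e1, e2, e12) tuples.

    Uses e1*e1 = e2*e2 = 1 and e1*e2 = -e2*e1 = e12, hence e12*e12 = -1.
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,
    )


def reverse(m):
    """Reversion flips the sign of the bivector part (e1*e2 -> e2*e1 = -e12)."""
    return (m[0], m[1], m[2], -m[3])


def rotate(v, theta):
    """Rotate vector v by angle theta using the sandwich product R v ~R."""
    rotor = (math.cos(theta / 2), 0.0, 0.0, -math.sin(theta / 2))
    return geometric_product(geometric_product(rotor, v), reverse(rotor))


e1 = (0.0, 1.0, 0.0, 0.0)
print(rotate(e1, math.pi / 2))  # approximately (0, 0, 1, 0), i.e. e2
```

A single operator like this replaces a matrix-based rotation with a few multiply-adds, which is the kind of operation a dedicated coprocessor can pipeline.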

Circuits and excitations to enable Brownian token-based computing with skyrmions

2021

Brownian computing exploits the thermal motion of discrete signal carriers (tokens) for computation. In this paper we address two major challenges that hinder competitive realizations of circuits and the application of Brownian token-based computing in actual devices, for instance devices based on magnetic skyrmions. To overcome the problem that crossings create for circuit fabrication, we design a crossing-free layout for a composite half-adder module. This layout greatly simplifies experimental implementations, as wire crossings are effectively avoided. Additionally, our design is shorter than conventional designs, which speeds up computation. To address the key issue of slow computation base…

Speedup; Condensed Matter - Mesoscale and Nanoscale Physics; Physics and Astronomy (miscellaneous); Computer science; 530 Physics; Computation; FOS: Physical sciences; Topology; Security token; Power (physics); Discrete-time signal; Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Torque; Brownian motion; Electronic circuit


First Experiences on an Accurate SPH Method on GPUs

2017

It is well known that the standard formulation of Smoothed Particle Hydrodynamics (SPH) is usually inaccurate when the data are scattered or when the approximation is evaluated near the boundary. Moreover, the method is computationally demanding when a high number of data sites and evaluation points are employed. In this paper, an enhanced version of the method is proposed that improves both accuracy and efficiency by using an HPC environment. Our implementation exploits the processing power of GPUs to solve the basic computational kernel. The performance gain shows that the method is accurate and suitable for large data sets.

Speedup; Exploit; GPUs; Computer science; Computer Networks and Communications; GPU; Smoothed Particle Hydrodynamics method; 010103 numerical & computational mathematics; 01 natural sciences; Computational science; Smoothed-particle hydrodynamics; Instruction set; Sector MAT/08 - Numerical Analysis; Artificial Intelligence; Accuracy; Approximation; Kernel function; Speed-Up; 1707; Signal Processing; 0101 mathematics; Random access memory; Linear system; 010101 applied mathematics; Kernel (statistics)

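The SPH entry above notes that the standard formulation is inaccurate near boundaries. A 1D sketch makes this concrete: with a Gaussian kernel and evenly spaced particles, the standard estimate of the constant function f = 1 drops to roughly 0.56 at the domain edge, because half of the kernel support lies outside the domain. Dividing by the kernel sum (a Shepard normalization) restores exact reproduction of constants. The kernel, particle spacing, and smoothing length h are assumptions, and Shepard normalization is a standard, simpler correction shown as a stand-in, not the paper's enhanced method.

```python
import math


def gaussian_kernel(r, h):
    """1D Gaussian smoothing kernel, normalized to integrate to 1 over the real line."""
    return math.exp(-((r / h) ** 2)) / (h * math.sqrt(math.pi))


def sph_estimate(x, nodes, values, volume, h, shepard=False):
    """SPH approximation f(x) ~ sum_j f_j W(x - x_j, h) V_j.

    With shepard=True the sum is divided by sum_j W(x - x_j, h) V_j, which
    compensates for the part of the kernel support outside the domain.
    """
    weights = [gaussian_kernel(x - xj, h) * volume for xj in nodes]
    estimate = sum(w * f for w, f in zip(weights, values))
    if shepard:
        estimate /= sum(weights)
    return estimate


nodes = [i / 100 for i in range(101)]  # particles on [0, 1], spacing 0.01
ones = [1.0] * len(nodes)              # samples of the constant function f = 1
print(sph_estimate(0.5, nodes, ones, 0.01, 0.05))                # close to 1
print(sph_estimate(0.0, nodes, ones, 0.01, 0.05))                # about 0.56
print(sph_estimate(0.0, nodes, ones, 0.01, 0.05, shepard=True))  # 1.0
```

Each evaluation point is independent of the others, which is what makes the kernel sums a natural fit for one GPU thread per evaluation point.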

Improved SOM Learning using Simulated Annealing

2007

The Self-Organizing Map (SOM) algorithm has been used extensively for analysis and classification problems. For these problems, datasets keep growing larger, so it is necessary to speed up SOM learning. In this paper we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal is to obtain fast learning and better performance in terms of matching the input data and the regularity of the resulting map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparis…

Speedup; Matching (graph theory); Wake-sleep algorithm; Computer science; business.industry; Pattern recognition; computer.software_genre; Adaptive simulated annealing; Generalization error; ComputingMethodologies_PATTERNRECOGNITION; Simulated annealing; SOM simulated Annealing Training; Data mining; Artificial intelligence; business; computer

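For reference, the basic SOM learning loop that the simulated-annealing entry above builds on can be sketched for a 1D map with scalar weights: find the best-matching unit (BMU), then pull it and its grid neighbors toward the input with a decaying learning rate and a shrinking Gaussian neighborhood. The schedules, map size, and function names are assumptions; the paper's simulated-annealing modification is not described in the excerpt and is not shown here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose (scalar) weight is closest to input x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))


def som_step(weights, x, lr, sigma):
    """One SOM update: move each unit toward x, scaled by a Gaussian of its grid distance to the BMU."""
    bmu = best_matching_unit(weights, x)
    return [
        w + lr * math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2)) * (x - w)
        for i, w in enumerate(weights)
    ]


def train_som(data, n_units=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Basic online SOM training with linearly decaying learning rate and neighborhood width."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit inputs in a random order each epoch
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)
            sigma = max(sigma0 * (1.0 - progress), 0.5)  # keep a minimal neighborhood
            weights = som_step(weights, x, lr, sigma)
            step += 1
    return weights


data = [i / 20 for i in range(21)]  # toy 1D inputs on [0, 1]
print(train_som(data))
```

Every epoch touches every input and every unit, so training cost grows with dataset size, which is the motivation the abstract gives for speeding up learning.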

Versatile optimization-based speed-up method for autofocusing in digital holographic microscopy

2021

We propose a speed-up method for the in-focus plane detection in digital holographic microscopy that can be applied to a broad class of autofocusing algorithms that involve repetitive propagation of an object wave to various axial locations to decide the in-focus position. The classical autofocusing algorithms apply a uniform search strategy, i.e., they probe multiple, uniformly distributed axial locations, which leads to heavy computational overhead. Our method substantially reduces the computational load, without sacrificing the accuracy, by skillfully selecting the next location to investigate, which results in a decreased total number of probed propagation distances. This is achieved by…

Speedup; Optimization problem; Computer science; Plane (geometry); business.industry; Image and Video Processing (eess.IV); FOS: Physical sciences; Electrical Engineering and Systems Science - Image and Video Processing; Quantitative Biology - Quantitative Methods; Atomic and Molecular Physics and Optics; Three dimensional imaging; Optics; Position (vector); FOS: Biological sciences; Object wave; FOS: Electrical engineering electronic engineering information engineering; Digital holographic microscopy; Successive parabolic interpolation; business; Algorithm; Quantitative Methods (q-bio.QM); Physics - Optics; Optics (physics.optics)

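The autofocusing entry above replaces a uniform sweep over propagation distances with a smarter choice of the next distance to probe, and its keywords list successive parabolic interpolation. A minimal sketch of that idea follows, assuming a hypothetical smooth focus metric that peaks at the in-focus distance: fit a parabola through three probed distances, evaluate the metric at its vertex, and keep the three best points. In a real pipeline each `metric(z)` call would be a numerical propagation plus a sharpness measure. The Gaussian-shaped metric, bracket, and tolerance are assumptions, and this is not necessarily the paper's exact procedure.

```python
import math


def parabolic_vertex(a, fa, b, fb, c, fc):
    """Abscissa of the vertex of the parabola through (a, fa), (b, fb), (c, fc); None if degenerate."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0:
        return None
    return b - 0.5 * num / den


def autofocus_spi(metric, lo, hi, tol=1e-4, max_iter=50):
    """Maximize a focus metric over propagation distance z in [lo, hi].

    Returns (best z, number of metric evaluations). Each iteration probes the
    vertex of the parabola through the three best points seen so far.
    """
    points = [(z, metric(z)) for z in (lo, 0.5 * (lo + hi), hi)]
    evaluations = 3
    for _ in range(max_iter):
        (a, fa), (b, fb), (c, fc) = points
        z = parabolic_vertex(a, fa, b, fb, c, fc)
        if z is None:
            break
        z = min(max(z, lo), hi)  # stay inside the search bracket
        previous_best = max(points, key=lambda p: p[1])[0]
        points.append((z, metric(z)))
        evaluations += 1
        # keep the three points with the highest metric values
        points = sorted(points, key=lambda p: p[1], reverse=True)[:3]
        if abs(z - previous_best) < tol:
            break
    best_z = max(points, key=lambda p: p[1])[0]
    return best_z, evaluations


# Hypothetical focus metric that peaks at the in-focus distance z = 3.2
def metric(z):
    return math.exp(-((z - 3.2) ** 2) / 8)


print(autofocus_spi(metric, 0.0, 10.0))
```

With this made-up metric the search needs far fewer evaluations than a uniform sweep at 0.01 steps over [0, 10], which would require 1001 propagations.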

Automatic multi-objective optimization of parameters for hardware and code optimizations

2011

Recent computer architectures can be configured in many different ways. To explore this huge design space, system simulators are typically used. Because performance is no longer the only decisive factor (power usage and the system's resource usage, for example, matter as well), it has become very hard for designers to select optimal configurations. In this article we use a multi-objective design space exploration tool called FADSE to explore the vast design space of the Grid Alu Processor (GAP) and its post-link optimizer, GAPtimize. We improved FADSE with techniques that make it more robust against failures and that speed up evaluations through parallel processing. For the GAP, we present an approximation o…

Speedup; Parallel processing (DSP implementation); Computer architecture; Computer engineering; Computer science; Design space exploration; Pareto principle; Program optimization; Grid; Multi-objective optimization; Space exploration

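The GAP entry above mentions making FADSE more robust against failures and speeding up evaluations through parallel processing. A minimal Python sketch of that pattern follows, assuming a hypothetical `simulate(config)` callable that may raise when a simulation crashes: configurations are evaluated concurrently, each is retried a fixed number of times, and persistent failures are recorded instead of aborting the whole exploration. The names and the toy cost model are illustrative, not FADSE's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_all(configs, simulate, max_workers=4, retries=1):
    """Evaluate design points in parallel, retrying failures before giving up.

    Returns (results, failed): a dict config -> objectives for successful runs and
    a sorted list of configurations that failed on every attempt.
    """

    def run_with_retries(cfg):
        last_error = None
        for _ in range(retries + 1):
            try:
                return simulate(cfg)
            except Exception as err:  # a crashed or misconfigured simulation
                last_error = err
        raise last_error

    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_with_retries, cfg): cfg for cfg in configs}
        for future in as_completed(futures):
            cfg = futures[future]
            try:
                results[cfg] = future.result()
            except Exception:
                failed.append(cfg)  # record the failure instead of aborting the exploration
    return results, sorted(failed)


# Hypothetical simulator: (issue width, cache size in KB) -> (cycles, power)
def toy_simulate(cfg):
    width, cache_kb = cfg
    if width == 0:
        raise ValueError("invalid configuration")
    return (1000 // width, 0.5 * width + 0.01 * cache_kb)


results, failed = evaluate_all([(1, 16), (2, 32), (0, 8)], toy_simulate)
print(sorted(results), failed)  # [(1, 16), (2, 32)] [(0, 8)]
```

Threads are sufficient when each evaluation launches an external simulator process; CPU-bound evaluations written in pure Python would benefit from a process pool instead.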