Search results for "GPU"
showing 10 items of 43 documents
Future Processor Hardware Architectures for the Benefit of Precise Particle Accelerator Modeling
2017
Jaunās procesoru arhitektūras, kā grafiskie procesori (GPU) un Intel Many Integrated Cores (MIC) procesori, sniedz milzīgu veiktspējas potenciālu augstas veiktspējas skaitļošanas aplikācijās. Tomēr izstrādājot programmatūru, kas spēj izmantot šīs jaunās tehnoloģijas ir jāsaskarās ar dažādiem papildus grūtībām. Programmām ir jāspēj izmantot papildus paralēlisms, ko piedāvā šīs iekārtās, tām ir jāspēj pielāgoties dažādām procesoru arhitektūrām un jāizmanto dažādas izstrādes platformas, lai aplikācija spēdu darboties uz iekārtām no dažādiem ražotājiem. Dynamic Kernel Scheduler (DKS) tika izstrādāts, lai nodrošinātu papildus programmatūras slāni starp programmu un papildus processoriem. DKS nod…
Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model
2010
A Modern Graphics Processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Chemical Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single Central Processor Unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
Optical sectioning microscopy through single-shot Lightfield protocol
2020
Optical sectioning microscopy is usually performed by means of a scanning, multi-shot procedure in combination with non-uniform illumination. In this paper, we change the paradigm and report a method that is based in the light field concept, and that provides optical sectioning for 3D microscopy images after a single-shot capture. To do this we fi rst capture multiple orthographic perspectives of the sample by means of Fourier-domain integral microscopy (FiMic). The second stage of our protocol is the application of a novel refocusing algorithm that is able to produce optical sectioning in real time, and with no resolution worsening, in the case of sparse f luorescent samples.We provide the…
A GPU-accelerated augmented Lagrangian based L1-mean curvature Image denoising algorithm implementation
2015
This paper presents a graphics processing unit (GPU) implementation of a recently published augmented Lagrangian based L1-mean curvature image denoising algorithm. The algorithm uses a particular alternating direction method of multipliers to reduce the related saddle-point problem to an iterative sequence of four simpler minimization problems. Two of these subproblems do not contain the derivatives of the unknown variables and can therefore be solved point-wise without inter-process communication. Inparticular, this facilitates the efficient solution of the subproblem that deals with the non-convex term in the original objective function by modern GPUs. The two remaining subproblems are so…
Advanced numerical treatment of an accurate SPH method
2019
The summation of Gaussian kernel functions is an expensive operation frequently encountered in scientific simulation algorithms and several methods have been already proposed to reduce its computational cost. In this work, the Improved Fast Gauss Transform (IFGT) [1] is properly applied to the Smoothed Particle Hydrodynamics (SPH) method [2] in order to speed up its efficiency. A modified version of the SPH method is considered in order to overcome the loss of accuracy of the standard formulation [3]. A suitable use of the IFGT allows us to reduce the computational effort while tuning the desired accuracy into the SPH framework. This technique, coupled with an algorithmic design for exploit…
A prospect for computing in porous materials research: Very large fluid flow simulations
2016
Abstract Properties of porous materials, abundant both in nature and industry, have broad influences on societies via, e.g. oil recovery, erosion, and propagation of pollutants. The internal structure of many porous materials involves multiple scales which hinders research on the relation between structure and transport properties: typically laboratory experiments cannot distinguish contributions from individual scales while computer simulations cannot capture multiple scales due to limited capabilities. Thus the question arises how large domain sizes can in fact be simulated with modern computers. This question is here addressed using a realistic test case; it is demonstrated that current …
GPU-laskennan optimointi
2013
Näytönohjaimet, grafiikkasuorittimet, tarjoavat rinnakkaisen laskennan alustan, jossa voidaan suorittaa ohjelmakoodia satojen ydinten toimesta. Tämä alusta mahdollistaa matemaattisesti työläiden ongelmien ratkaisemisen tehokkaasti. Grafiikkasuorittimen rinnakkainen suoritusympäristö kuitenkin eroaa suuresti tietokoneen suorittimen peräkkäisestä suoritusympäristöstä. Ongelmien ratkaisemiseksi tehokkaasti rinnakkaisympäristössä on noudettava ohjelmointimenetelmiä, jotka soveltuvat erityisesti rinnakkaisympäristöön. Tässä työssä tarkastellaan rinnakkaisen laskennan perusteita, miten erilaiset ohjelmointimenetelmät vaikuttavat ohjelman suoriutumiseen grafiikkasuorittimella sekä miten voidaan sa…
Architecture-Driven Level Set Optimization: From Clustering to Sub-pixel Image Segmentation
2016
Thanks to their effectiveness, active contour models (ACMs) are of great interest for computer vision scientists. The level set methods (LSMs) refer to the class of geometric active contours. Comparing with the other ACMs, in addition to subpixel accuracy, it has the intrinsic ability to automatically handle topological changes. Nevertheless, the LSMs are computationally expensive. A solution for their time consumption problem can be hardware acceleration using some massively parallel devices such as graphics processing units (GPUs). But the question is: which accuracy can we reach while still maintaining an adequate algorithm to massively parallel architecture? In this paper, we attempt to…
CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions
2013
Background The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. Results We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU …
Real-time Sound Source Localization on Graphics Processing Units
2013
Abstract Sound source localization is an important topic in microphone array signal processing applications, such as camera steering systems, human-machine interaction or surveillance systems. The Steered Response Power with Phase Transform (SRP- PHAT) algorithm is one of the most well-known approaches for sound source localization due to its good performance in noisy and reverberant environments. The algorithm analyzes the sound power captured by a microphone array on a grid of spatial points in a given room. While localization accuracy can be improved by using a high resolution spatial grid and a high number of microphones, performing the localization task in these circumstances requires …