Search results for "Graphics processing unit"
showing 10 items of 42 documents
GPU-accelerated integral imaging and full-parallax 3D display using stereo-plenoptic camera system
2019
In this paper, we propose a novel approach to produce integral images ready to be displayed on an integral-imaging monitor. Our main contribution is the use of a commercial plenoptic camera arranged in a stereo configuration. Our proposed set-up is able to record the radiance information, both spatial and angular, simultaneously at each stereo position. We illustrate our contribution by composing the point cloud from a pair of captured plenoptic images and generating an integral image from the properly registered 3D information. We have exploited graphics processing unit (GPU) acceleration to enhance the speed and efficiency of the integral-image computation. We…
Adapting hierarchical bidirectional inter prediction on a GPU-based platform for 2D and 3D H.264 video coding
2013
The H.264/AVC video coding standard introduces some improved tools in order to increase compression efficiency. Moreover, the multi-view extension of H.264/AVC, called H.264/MVC, adopts many of them. Among the new features, variable block-size motion estimation is one which contributes to high coding efficiency. Furthermore, it defines a different prediction structure that includes hierarchical bidirectional pictures, outperforming traditional Group of Pictures patterns in both scenarios: single-view and multi-view. However, these video coding techniques have high computational complexity. Several techniques have been proposed in the literature over the last few years which are aimed at acc…
Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data
2013
In this work, a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides a theoretical explanation, details the framework, and illustrates its application by implementing three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third is an area-based technique. The last approach is based on iterative hierarchical volume subdivision and maximization of mutual information. Moreover, a high-performance Nvidia CUDA-based implementation of the algorithm is presented. The f…
Unified computing facility design based on open source software
2012
The article describes e-infrastructure development in Latvia and migration to a national cloud as a regional partner facility (RPF) in the European Union (EU). In Latvia, many public and private computing clouds are in operation, and the problem is how to integrate these resources into one RPF and how to design one unified computing facility that can be used for many different applications. The authors offer their solution at the cloud Software as a Service and Hardware as a Service level, based on the use of open source packaged bundles.
Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model
2010
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives
2016
It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity (RI) approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimized device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly o…
Concept of virtual machine based high resolution display wall
2014
This paper presents the scalability and hardware dependency problems found in existing solutions in the high resolution display wall domain and proposes a new solution. The authors propose hosting the system that provides the visual content for the display wall inside a virtual machine. In this way, any needed configuration of displays and resolutions can be applied to the graphics processing unit simulated by the virtualization system. The frame buffer content of the virtual graphics processing unit is then split, encoded with H.264, and sent over gigabit Ethernet as an RTP stream to the display wall. The display wall is driven by Raspberry Pi embedded devices that receive the stream, decode it …
On the Use of GPU for Accelerating Communication-Aware Mapping Techniques
2015
Different communication-aware mapping techniques have been proposed in recent years for improving the performance of distributed systems based on both off-chip and on-chip networks. Some of these proposals were based on heuristic search for finding pseudo-optimal assignments of tasks to processing elements. However, improvements in technology integration have allowed a significant increase in the number of network nodes, requiring the acceleration of the heuristic search. In this paper, we present a comparative study of the local search method used in a communication-aware mapping technique when implemented on different parallel architectures. We compare the performance provided by a version…
GPU-laskennan optimointi (Optimization of GPU computing)
2013
Graphics cards, or graphics processing units, provide a parallel computing platform on which program code can be executed by hundreds of cores. This platform makes it possible to solve mathematically laborious problems efficiently. However, the parallel execution environment of a graphics processing unit differs greatly from the sequential execution environment of a computer's central processor. To solve problems efficiently in a parallel environment, one must follow programming methods that are specifically suited to it. This work examines the fundamentals of parallel computing, how different programming methods affect a program's performance on a graphics processing unit, and how one can…
SIMULATING SPIN MODELS ON GPU: A TOUR
2012
The use of graphics processing units (GPUs) in scientific computing has gathered considerable momentum in the past five years. While GPUs in general promise high performance and excellent performance-per-watt ratios, not every class of problems is equally well suited to exploiting the massively parallel architecture they provide. Lattice spin models appear to be prototypical examples of problems suited to this architecture, at least as long as local update algorithms are employed. In this review, I summarize our recent experience with the simulation of a wide range of spin models on GPU employing an equally wide range of update algorithms, ranging from Metropolis and heat bath updates,…