Search results for "Central processing unit"
Showing 10 of 15 documents
Nvidia CUDA parallel processing of large FDTD meshes in a desktop computer
2020
The Finite-Difference Time-Domain (FDTD) method is a well-known and mature numerical technique in computational electrodynamics. FDTD is commonly used in the analysis of electromagnetic structures and antennas. However, it still carries a high computational burden, which limits its use in combination with optimization algorithms. FDTD can be parallelized to run on a GPU using Matlab and CUDA tools. For instance, the simulation of a planar array, with a three-dimensional 790x276x588 FDTD mesh, for 6200 time steps, takes one day of elapsed time on the CPU of an Intel Core i3 at 2.4 GHz in a personal computer with 8 GB of RAM. This time is reduced 120 times when the calcula…
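The abstract does not include code, but the leapfrog update at the heart of FDTD is compact enough to sketch. The following is a minimal 1D illustration in normalized units with a Courant number of 0.5; the function name, grid size, and hard-source pulse are illustrative assumptions, not the paper's setup (which uses a 3D mesh).

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=500):
    """Minimal 1D FDTD (Yee scheme) sketch with a hard source, normalized units."""
    ez = np.zeros(n_cells)        # electric field on integer grid points
    hy = np.zeros(n_cells - 1)    # magnetic field, staggered half a cell
    for t in range(n_steps):
        # update H from the spatial difference of E (Courant number folded in)
        hy += 0.5 * (ez[1:] - ez[:-1])
        # update E from the spatial difference of H
        ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])
        # inject a Gaussian pulse at the center of the grid
        ez[n_cells // 2] += np.exp(-((t - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d()
```

Each time step touches every cell independently of the others, which is what makes the 3D version in the paper a natural fit for GPU parallelization.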
Scalability of GPU-Processed 3D Distance Maps for Industrial Environments
2018
This paper contains a benchmark analysis of the open-source library GPU-Voxels together with the Robot Operating System (ROS) in a large-scale industrial robotics environment. Six sensor nodes with embedded computing generate real-time point cloud data as ROS topics. The overall data from all sensor nodes is processed by a combination of CPU and GPU on a central ROS node. Experimental results demonstrate that the system is able to handle frame rates of 10 and 20 Hz with voxel sizes of 4, 6, 8 and 12 cm without saturation of the CPU or the GPU used by the GPU-Voxels library. The results in this paper show that ROS, in combination with GPU-Voxels, can be used as a viable solution for real-time …
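The core operation behind a voxel map at a given resolution is mapping each point to an integer cell index and deduplicating. A minimal sketch, assuming a simple axis-aligned grid (the function name and origin parameter are illustrative, not GPU-Voxels API):

```python
import numpy as np

def voxelize(points, voxel_size=0.04, origin=(0.0, 0.0, 0.0)):
    """Map an (N, 3) point cloud to the set of occupied voxel indices."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(np.int64)
    return np.unique(idx, axis=0)

cloud = np.array([[0.01, 0.02, 0.03],
                  [0.01, 0.03, 0.02],   # lands in the same 4 cm voxel
                  [0.50, 0.50, 0.50]])
occupied = voxelize(cloud, voxel_size=0.04)
```

Coarser voxel sizes (as in the paper's 4 to 12 cm sweep) shrink the occupied set and hence the per-frame work.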
Abstract ID: 133 Fast and accurate 3D dose distribution computations using artificial neural networks
2017
In radiation therapy, the trade-off between accuracy and speed is central to the algorithms used in Treatment Planning Systems (TPS). For photon beams, commercial solutions generally rely on analytic algorithms, biased Monte Carlo, or heavily parallelized Monte Carlo on Graphics Processing Units (GPU). Alternatively, we propose an algorithm using an Artificial Neural Network (ANN) to compute the dose distributions resulting from ionizing radiation inside a phantom [1], [2]. We present an evolution of this platform taking into account modulated field sizes and shapes, and various orientations of the beam to the phantom. Firstly, tomodensitometry-based phantoms are created to validate the d…
A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC
2012
H.264/AVC is the most recent predictive video compression standard, outperforming existing video coding standards at the cost of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. In the literature, several algorithms have been proposed to accelerate video compression, but so far there have not been many solutions that deal with video codecs using heterogeneous systems. This paper proposes an algorithm to perform H.264/AVC inter prediction. The proposed algorithm performs the motion estimation, both with full-pixel and sub-pixel accuracy, using CUDA to assist the CPU, obtaining remarkable time …
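Full-pixel motion estimation of the kind the abstract describes reduces to minimizing a sum of absolute differences (SAD) over candidate displacements. A minimal full-search sketch, assuming an 8x8 block and a small search radius (the function and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def block_match(ref, cur, bx, by, block=8, radius=4):
    """Full-search motion estimation: find the (dy, dx) displacement in `ref`
    minimizing the SAD against the block at (by, bx) in the current frame."""
    target = cur[by:by+block, bx:bx+block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y+block, x:x+block].astype(np.int32) - target).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# toy frames: a bright 8x8 patch shifted 2 pixels horizontally between frames
ref = np.zeros((32, 32)); ref[10:18, 12:20] = 255
cur = np.zeros((32, 32)); cur[10:18, 10:18] = 255
mv, sad = block_match(ref, cur, bx=10, by=10)
```

Every candidate displacement is scored independently, which is why this search is a natural candidate for offloading to CUDA as the paper proposes.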
Deep Learning-Based Methods for Prostate Segmentation in Magnetic Resonance Imaging
2021
Magnetic Resonance Imaging-based prostate segmentation is an essential task for adaptive radiotherapy and for radiomics studies whose purpose is to identify associations between imaging features and patient outcomes. Because manual delineation is a time-consuming task, we present three deep-learning (DL) approaches, namely UNet, efficient neural network (ENet), and efficient residual factorized convNet (ERFNet), whose aim is to tackle the fully-automated, real-time, and 3D delineation process of the prostate gland on T2-weighted MRI. While UNet is used in many biomedical image delineation applications, ENet and ERFNet are mainly applied in self-driving cars to compensate for limited hardwar…
Automated inventory management and security surveillance system using image processing techniques
2010
Efficient inventory management and allocation of cargo space are critical requirements in industry today. In this paper we propose an algorithm to automate a complete inventory unit with the aid of image processing and artificial intelligence algorithms. These algorithms work on the data acquired through image processing. This data is then integrated with the artificial intelligence algorithm, which in turn takes inputs from the sensors present on the robot. These sensors enable precise localization of the robot. The path planning algorithm takes input from a virtual map buffer which is present in the CPU memory. This buffer, in accordance with the sensor data and image processing data, generat…
Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model
2010
A modern Graphics Processing Unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single Central Processing Unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
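The checkerboard decomposition mentioned in the abstract splits the lattice into two sublattices so that every spin updated in parallel sees only fixed neighbours. A minimal Metropolis sketch of that idea (without the paper's multi-spin coding, and with illustrative names and parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def checkerboard_sweep(spins, beta):
    """One Metropolis sweep of the 2D Ising model, updating the two
    checkerboard sublattices in turn so each update sees fixed neighbours,
    the property that makes the algorithm data-parallel on a GPU."""
    L = spins.shape[0]
    ii, jj = np.indices((L, L))
    for parity in (0, 1):
        mask = (ii + jj) % 2 == parity
        # sum of the four nearest neighbours with periodic boundaries
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
              + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nn               # energy cost of flipping each spin
        accept = rng.random((L, L)) < np.exp(-beta * dE)
        spins = np.where(mask & accept, -spins, spins)
    return spins

lattice = rng.choice([-1, 1], size=(32, 32))
for _ in range(10):
    lattice = checkerboard_sweep(lattice, beta=0.6)
```

Multi-spin coding, which the paper adds on top of this scheme, packs many spins into one machine word to raise throughput further.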
Criteria for the Selection of Electronic Computing Systems for Biomedical Research Institutes
1979
Computers are now a recognized tool in biomedical research. They are used for the evaluation of data on the one hand, and for data acquisition and control of experiments on the other. Based on our experience, some suggestions concerning the structure of a mini-computer system suitable for a research laboratory are made. According to the two major classes of application, two sets of requirements arise. We argue that it is effective to use this system for data reduction and evaluation because a large percentage of tasks require program development or at least specific input data handling. Therefore, we call for a multi-user time-sharing system which should be equipped with a set of commands t…
Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives
2016
It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity (RI) approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimized device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly o…
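The mixed-precision strategy the abstract alludes to can be illustrated independently of OpenACC: do the bulk tensor contraction in single precision (the part that would run on the accelerator) while accumulating the final scalar in double precision. A toy sketch with illustrative names, not the paper's kernels:

```python
import numpy as np

def mixed_precision_contraction(a, b):
    """Carry out the expensive matrix product in float32, then perform the
    final reduction to a scalar in float64."""
    prod = a.astype(np.float32) @ b.astype(np.float32)   # bulk work in fp32
    return np.sum(prod, dtype=np.float64)                # reduction in fp64

rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = rng.random((64, 64))
approx = mixed_precision_contraction(a, b)
exact = float(np.sum(a @ b))                 # full double-precision reference
rel_err = abs(approx - exact) / abs(exact)
```

The relative error stays small because the error-prone bulk arithmetic is cheap per element, while the sensitive accumulation keeps full precision.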
Accelerated fluctuation analysis by graphic cards and complex pattern formation in financial markets
2009
The compute unified device architecture is an almost conventional programming approach for managing computations on a graphics processing unit (GPU) as a data-parallel computing device. With a maximum number of 240 cores in combination with a high memory bandwidth, a recent GPU offers resources for computational physics. We apply this technology to methods of fluctuation analysis, which includes determination of the scaling behavior of a stochastic process and the equilibrium autocorrelation function. Additionally, the recently introduced pattern formation conformity (Preis T et al 2008 Europhys. Lett. 82 68005), which quantifies pattern-based complex short-time correlations of a time serie…
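One of the fluctuation-analysis quantities named in the abstract, the equilibrium autocorrelation function, has a short reference implementation. A minimal sketch, demonstrated on a synthetic AR(1) series whose autocorrelation is known to decay as phi**tau (the function name and test series are illustrative, not the paper's data):

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Equilibrium autocorrelation function C(tau), normalized so C(0) = 1."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    var = x.var()
    return np.array([np.mean(x[:len(x) - tau] * x[tau:]) / var
                     for tau in range(max_lag + 1)])

# AR(1) process x_t = phi * x_{t-1} + noise has autocorrelation phi**tau
rng = np.random.default_rng(2)
phi, n = 0.9, 50_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
c = autocorrelation(x, max_lag=5)
```

Each lag is an independent reduction over the series, which is what makes this analysis amenable to the data-parallel GPU treatment the paper describes.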