Search results for "PSCR"
showing 4 items of 4 documents
On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
2018
Partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite-elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. Attained floating point perfor…
Fast Poisson solvers for graphics processing units
2013
Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using a Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3- fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on a Intel Core i7 quad-core CPU…
On GPU-accelerated fast direct solvers and their applications in image denoising
2015
A parallel radix-4 block cyclic reduction algorithm
2014
A conventional block cyclic reduction algorithm operates by halving the size of the linear system at each reduction step, that is, the algorithm is a radix-2 method. An algorithm analogous to the block cyclic reduction known as the radix-q partial solution variant of the cyclic reduction (PSCR) method allows the use of higher radix numbers and is thus more suitable for parallel architectures as it requires fever reduction steps. This paper presents an alternative and more intuitive way of deriving a radix-4 block cyclic reduction method for systems with a coefficient matrix of the form tridiag{ − I,D, − I}. This is performed by modifying an existing radix-2 block cyclic reduction method. Th…