6533b7defe1ef96bd1275b8c

RESEARCH PRODUCT

Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems

Hans AribowoHoang-vu DangBertil Schmidt

subject

Block Wiedemann algorithmComputer Networks and CommunicationsComputer scienceGraphics processing unitSparse matrix-vector multiplicationGPU clusterParallel computingGF(2)Computer Science ApplicationsTheoretical Computer ScienceGeneral number field sieveMatrix (mathematics)Computational Theory and MathematicsFactorizationLinear algebraMultiplicationComputer Science::Operating SystemsSoftwareInteger factorizationSparse matrix

description

SUMMARY The block Wiedemann (BW) algorithm is frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix–vector multiplication is the most time-consuming operation. The necessity to accelerate this step is motivated by the application of BW to very large matrices used in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation by using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single graphics processing unit (GPU) for a number of tested NFS matrices compared with an optimized multicore implementation. We further present a GPU cluster implementation of the full BW for NFS matrices. A small-sized GPU cluster is able to outperform CPU clusters of larger size for large matrices such as the one obtained from the Kilobit special NFS factorization. Copyright © 2012 John Wiley & Sons, Ltd.

https://doi.org/10.1002/cpe.2896