DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE

  • NORIYUKI FUJIMOTO
    Department of Mathematics and Information Sciences, Graduate School of Science, Osaka Prefecture University, 1-1 Gakuen-Cho, Naka-ku, Sakai-Shi, Osaka, 599-8531, Japan

Abstract

Recently, GPUs have acquired the ability to perform fast general-purpose computation by running thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experiments are conducted on a PC with a GeForce 8800GTX GPU and a 2.0 GHz Intel Xeon E5335 CPU. The results show that the proposed algorithm runs up to 11.19 times faster than NVIDIA's BLAS library CUBLAS 1.1 on the GPU and 35.15 times faster than the Intel Math Kernel Library 9.1 on a single x86 core with SSE3 SIMD instructions. The performance of Jacobi's iterative method for solving linear equations, measured including the data transfer time between CPU and GPU, shows that the proposed algorithm is practical for real applications.
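
To illustrate why dense matrix-vector multiplication matters for the Jacobi benchmark mentioned in the abstract, here is a minimal CPU-side sketch in plain Python. It is not the paper's CUDA algorithm, which the abstract does not describe. The function names `matvec` and `jacobi`, the iteration count, and the 3x3 test system are all assumptions made for illustration. The sketch only shows that each Jacobi sweep is dominated by one dense product A x, which is the operation the paper accelerates.

```python
def matvec(a, x):
    """Dense matrix-vector product y = A x, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def jacobi(a, b, iterations=50):
    """Jacobi iteration x_{k+1} = D^{-1} (b - (A - D) x_k), D = diag(A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        ax = matvec(a, x)  # one dense product per sweep: the dominant cost
        # (A - D) x = A x - D x, so reuse the full product and remove the diagonal term
        x = [(b[i] - (ax[i] - a[i][i] * x[i])) / a[i][i] for i in range(n)]
    return x


# Hypothetical diagonally dominant system whose exact solution is [1, 2, 3]
A = [[10.0, 1.0, 2.0],
     [1.0, 8.0, 1.0],
     [2.0, 1.0, 9.0]]
b = [18.0, 20.0, 31.0]
solution = jacobi(A, b)
```

The matrix is chosen to be strictly diagonally dominant so that the iteration is guaranteed to converge. On a GPU, the call to `matvec` would be replaced by a device kernel, and the data transfer between CPU and GPU that the abstract mentions would surround the iteration loop.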

