DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE

NORIYUKI FUJIMOTO

doi:10.1142/s0129626408003545

Department of Mathematics and Information Sciences, Graduate School of Science, Osaka Prefecture University, 1-1 Gakuen-Cho, Naka-ku, Sakai-Shi, Osaka, 599-8531, Japan

Abstract

Recently GPUs have acquired the ability to perform fast general purpose computation by running thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experiments are conducted on a PC with GeForce 8800GTX and 2.0 GHz Intel Xeon E5335 CPU. The results show that the proposed algorithm runs a maximum of 11.19 times faster than NVIDIA's BLAS library CUBLAS 1.1 on the GPU and 35.15 times faster than the Intel Math Kernel Library 9.1 on a single core x86 with SSE3 SIMD instructions. The performance of Jacobi's iterative method for solving linear equations, which includes the data transfer time between CPU and GPU, shows that the proposed algorithm is practical for real applications.

Published in

Parallel Processing Letters 18 (04), 511-530, 2008-12

World Scientific Pub Co Pte Lt

Details

CRID: 1361137045312994304
DOI: 10.1142/s0129626408003545
ISSN: 1793642X, 01296264
Web Site: https://www.worldscientific.com/doi/pdf/10.1142/S0129626408003545
Data source type: Crossref
