An Efficient GPU Implementation of Bulk Computation of the Eigenvalue Problem for Many Small Real Non-symmetric Matrices
-
- Tokura Hiroki
- Hiroshima University
-
- Honda Takumi
- Hiroshima University
-
- Ito Yasuaki
- Hiroshima University
-
- Nakano Koji
- Hiroshima University
-
- Nishino Mitsuya
- Hiroshima University
-
- Hirota Yushiro
- Hiroshima University
-
- Saeki Masami
- Hiroshima University
この論文をさがす
抄録
The main contribution of this paper is to present an efficient GPU implementation of bulk computation of eigenvalues for many small, non-symmetric, real matrices. This work is motivated by the necessity of such bulk computation in designing of control systems, which requires to compute the eigenvalues of hundreds of thousands non-symmetric real matrices of size up to 30x30. Several efforts have been devoted to accelerating the eigenvalue computation including computer languages, systems, environments supporting matrix manipulation offering specific libraries/function calls. Some of them are optimized for computing the eigenvalues of a very large matrix by parallel processing. However, such libraries/function calls are not aimed at accelerating the eigenvalues computation for a lot of small matrices. In our GPU implementation, we considered programming issues of the GPU architecture including warp divergence, coalesced access of the global memory, utilization of the shared memory, and so forth. In particular, we present two types of assignments of GPU threads to matrices and introduce three memory arrangements in the global memory. Furthermore, to hide CPU-GPU data transfer latency, overlapping computation on the GPU with the transfer is employed. Experimental results on NVIDIA TITAN X show that our GPU implementation attains a speed-up factor of up to 83.50 and 17.67 over the sequential CPU implementation and the parallel CPU implementation with eight threads on Intel Core i7-6700K, respectively.
収録刊行物
-
- International Journal of Networking and Computing
-
International Journal of Networking and Computing 7 (2), 227-247, 2017
IJNC編集委員会