Automatic Tuning for Parallel FFTs on Massively Parallel Platforms with Multi-Core Processors (Special Topics: Auto-Tuning for Numerical Computations (continued))
-
- Takahashi Daisuke
- Graduate School of Systems and Information Engineering, University of Tsukuba
Bibliographic Information
- Other Title
-
- マルチコア超並列環境におけるFFTの自動チューニング(<特集>数値計算のための自動チューニング(続))
- マルチコア超並列環境におけるFFTの自動チューニング
- マルチコア チョウヘイレツ カンキョウ ニ オケル FFT ノ ジドウ チューニング
Abstract
This paper presents automatic performance tuning for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm for parallel FFTs makes effective use of cache memory. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. Parallel FFTs also require intensive all-to-all communication, which affects their performance, so automatic tuning of the all-to-all communication is implemented as well. Performance results demonstrate that the proposed implementation of parallel FFTs with automatic tuning improves performance.
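The block-size tuning idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how cache misses are estimated, so this sketch substitutes empirical wall-clock timing as a proxy, and it tunes a blocked matrix transpose (a data-movement step commonly found in blocked FFT algorithms) rather than a full FFT. The function names `blocked_transpose` and `tune_block_size`, the candidate block sizes, and the repeat count are all hypothetical choices made for illustration.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n x n matrix stored row-major in a flat list, tile by tile.

    Working on block x block tiles keeps the rows being read and the rows
    being written close together in memory, which is the usual motivation
    for blocking.
    """
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                row = i * n
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[row + j]
    return out


def tune_block_size(n, candidates=(4, 8, 16, 32, 64), repeats=3):
    """Time each candidate block size and return (best_block, timings).

    Assumption: wall-clock time stands in for the cache-miss count that the
    abstract mentions. Candidates larger than n are skipped.
    """
    a = [complex(k, -k) for k in range(n * n)]
    valid = [b for b in candidates if b <= n] or [n]
    timings = {}
    for block in valid:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_transpose(a, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # best-of-repeats reduces timing noise
    best_block = min(timings, key=timings.get)
    return best_block, timings


if __name__ == "__main__":
    for size in (64, 256):
        best_block, timings = tune_block_size(size)
        print(f"n={size}: selected block size {best_block}")
```

Because the tuner is run once per problem size, the selected block size can differ between sizes, which mirrors the abstract's observation that the optimal block size may depend on the problem size. In an interpreted language the timing differences are dominated by interpreter overhead, so the sketch shows the tuning loop rather than realistic cache behavior.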
Journal
-
- Bulletin of the Japan Society for Industrial and Applied Mathematics
-
Bulletin of the Japan Society for Industrial and Applied Mathematics 20 (4), 279-286, 2010
The Japan Society for Industrial and Applied Mathematics
Details
-
- CRID
- 1390001205766079104
-
- NII Article ID
- 110008007184
-
- NII Book ID
- AN10288886
-
- ISSN
- 09172270
- 24321982
-
- NDL BIB ID
- 10954610
-
- Text Lang
- ja
-
- Data Source
-
- JaLC
- NDL
- CiNii Articles
-
- Abstract License Flag
- Disallowed