Automatic Tuning for Parallel FFTs on Massively Parallel Platforms with Multi-Core Processors (Special Topics: Auto-Tuning for Numerical Computations (continued))

Bibliographic Information

Other Title
  • マルチコア超並列環境におけるFFTの自動チューニング(<特集>数値計算のための自動チューニング(続))
  • マルチコア超並列環境におけるFFTの自動チューニング
  • マルチコア チョウヘイレツ カンキョウ ニ オケル FFT ノ ジドウ チューニング


This paper presents an automatic performance tuning method for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm lets parallel FFTs use cache memory effectively. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. Parallel FFTs also require intensive all-to-all communication, which affects their performance, so automatic tuning of the all-to-all communication is implemented as well. The performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning effectively improves performance.
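
The idea of choosing a block size per problem size can be illustrated with a minimal sketch. It is not the paper's method: the abstract describes selecting the block size that minimizes cache misses, whereas this sketch swaps in a plain empirical search that times each candidate and keeps the fastest. The kernel is a blocked matrix transpose, used here only as a stand-in for the data rearrangement steps of a blocked FFT, and the names `blocked_transpose` and `tune_block_size` are hypothetical. In pure Python the interpreter overhead largely hides cache effects, so the sketch shows the shape of the tuning loop rather than realistic timings.

```python
import time


def blocked_transpose(a, n, nb):
    """Transpose an n x n row-major matrix (flat list) in nb x nb tiles.

    Stand-in kernel (assumption): not the FFT kernel from the paper.
    """
    out = [0.0] * (n * n)
    for ii in range(0, n, nb):          # tile row origin
        for jj in range(0, n, nb):      # tile column origin
            for i in range(ii, min(ii + nb, n)):
                for j in range(jj, min(jj + nb, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size and return (best_nb, timings).

    Empirical search (assumption): the paper instead minimizes cache misses.
    """
    a = [float(k) for k in range(n * n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_transpose(a, n, nb)
            best = min(best, time.perf_counter() - t0)  # best of repeats
        timings[nb] = best
    best_nb = min(timings, key=timings.get)
    return best_nb, timings
```

Calling `tune_block_size(256, [4, 8, 16, 32, 64])` returns the fastest candidate on the current machine together with the measured times. Because the result depends on `n`, a tuner of this kind would be rerun, or its results cached, for each problem size.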


