Automatic Tuning for Parallel FFTs on Massively Parallel Platforms with Multi-Core Processors (Special Topic: Auto-Tuning for Numerical Computations (continued))

Bibliographic Information

Other Title
  • マルチコア超並列環境におけるFFTの自動チューニング(<特集>数値計算のための自動チューニング(続))
  • マルチコア超並列環境におけるFFTの自動チューニング
  • マルチコア チョウヘイレツ カンキョウ ニ オケル FFT ノ ジドウ チューニング

Description

This paper presents automatic performance tuning for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm for parallel FFTs makes effective use of cache memory. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. In addition, parallel FFTs require intensive all-to-all communication, which affects their overall performance, so automatic tuning of the all-to-all communication is also implemented. Performance results demonstrate that the proposed parallel FFT implementation with automatic performance tuning is effective in improving performance.
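The abstract does not spell out the blocking scheme or how the cache-miss-minimizing block size is derived. The following is a minimal sketch, assuming a four-step decomposition of a length `n1*n2` FFT, that shows where a block size `nb` enters: input columns are gathered and transformed `nb` at a time. It substitutes a plain empirical search (timing each candidate block size and keeping the fastest) for the paper's cache-miss-based selection. The names `fft`, `blocked_four_step_fft`, and `autotune_block_size` are hypothetical. Pure Python hides most cache effects, so the timings only illustrate the shape of the tuning loop, and the all-to-all communication tuning is not modeled.

```python
import cmath
import time


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT (len(a) must be a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def blocked_four_step_fft(x, n1, n2, nb):
    """Length n1*n2 DFT via a four-step decomposition with block size nb.

    Index maps (an assumption of this sketch): input j = j1 + n1*j2,
    output k = k2 + n2*k1.
    """
    n = n1 * n2
    assert len(x) == n and nb >= 1
    y = [None] * n1  # y[j1][k2] after the first FFTs and twiddle factors
    # Steps 1-2: n1 FFTs of length n2, processed nb columns (j1 values) at a time.
    for s in range(0, n1, nb):
        e = min(s + nb, n1)
        work = [[0j] * n2 for _ in range(e - s)]
        for j2 in range(n2):  # each row of x supplies nb contiguous elements
            base = n1 * j2
            for b in range(e - s):
                work[b][j2] = x[base + s + b]
        for b in range(e - s):
            j1 = s + b
            col = fft(work[b])
            y[j1] = [col[k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
                     for k2 in range(n2)]
    # Step 3: n2 FFTs of length n1, processed nb output columns (k2 values) at a time.
    out = [0j] * n
    for s in range(0, n2, nb):
        e = min(s + nb, n2)
        work = [[y[j1][k2] for j1 in range(n1)] for k2 in range(s, e)]
        rows = [fft(w) for w in work]
        for k1 in range(n1):  # write back nb contiguous outputs per k1
            base = n2 * k1
            for b in range(e - s):
                out[base + s + b] = rows[b][k1]
    return out


def autotune_block_size(n1, n2, candidates, repeats=3):
    """Time each candidate block size; return (best_nb, best_seconds)."""
    x = [complex(i % 7, -(i % 3)) for i in range(n1 * n2)]  # arbitrary input
    best_nb, best_t = None, float("inf")
    for nb in candidates:
        t = float("inf")
        for _ in range(repeats):  # keep the fastest of several runs
            t0 = time.perf_counter()
            blocked_four_step_fft(x, n1, n2, nb)
            t = min(t, time.perf_counter() - t0)
        if t < best_t:
            best_nb, best_t = nb, t
    return best_nb, best_t


if __name__ == "__main__":
    nb, t = autotune_block_size(32, 32, candidates=[1, 2, 4, 8, 16, 32])
    print(f"selected block size {nb} ({t * 1e3:.2f} ms per transform)")
```

In a compiled implementation, the inner gather and scatter loops read and write `nb` contiguous elements per row, which is what blocking exploits; the search loop would then be run once per problem size and the winning `nb` cached for later calls.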