Automatic Tuning for Parallel FFTs on Massively Parallel Platforms with Multi-Core Processors (<Special Topics> Auto-Tuning for Numerical Computations (continued))

Bibliographic Information

Other Title
  • マルチコア超並列環境におけるFFTの自動チューニング(<特集>数値計算のための自動チューニング(続))
  • マルチコア超並列環境におけるFFTの自動チューニング
  • マルチコア チョウヘイレツ カンキョウ ニ オケル FFT ノ ジドウ チューニング

Abstract

This paper presents an automatic performance tuning method for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm lets parallel FFTs use cache memory effectively. Because the optimal block size may depend on the problem size, we propose a method that determines the block size minimizing the number of cache misses. Parallel FFTs also require intensive all-to-all communication, which affects their performance, so automatic tuning of all-to-all communication is implemented as well. Performance results demonstrate that the proposed implementation with automatic tuning improves the performance of parallel FFTs.
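
The block-size selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how cache misses are estimated, so this sketch assumes wall-clock time as a stand-in and simply times each candidate block size on a tiled matrix transpose, a data-movement step that commonly appears in blocked FFT algorithms. The names `blocked_transpose`, `tune_block_size`, and the candidate list are hypothetical and chosen only for illustration.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n x n row-major matrix (flat list), one tile at a time."""
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # Work on one block x block tile so its data stays cache-resident.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def _time_once(a, n, block):
    """Time a single blocked transpose (assumption: time approximates cache cost)."""
    start = time.perf_counter()
    blocked_transpose(a, n, block)
    return time.perf_counter() - start


def tune_block_size(n, candidates, repeats=3):
    """Return the candidate block size with the lowest best-of-`repeats` time."""
    a = list(range(n * n))
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = min(_time_once(a, n, block) for _ in range(repeats))
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


if __name__ == "__main__":
    # The chosen block size varies with the machine and the problem size n.
    print(tune_block_size(256, [4, 8, 16, 32, 64]))
```

Because the search is repeated for each problem size, the selected block size can change with `n`, which matches the abstract's observation that the optimal block size may depend on the problem size. A similar time-and-select loop could in principle be applied to alternative all-to-all communication strategies, although the abstract does not describe how that tuning is done.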

Details

  • CRID
    1390001205766079104
  • NII Article ID
    110008007184
  • NII Book ID
    AN10288886
  • ISSN
    09172270
    24321982
  • DOI
    10.11540/bjsiam.20.4_279
  • NDL BIB ID
    10954610
  • Text Lang
    ja
  • Data Source
    • JaLC
    • NDL
    • CiNii Articles
  • Abstract License Flag
    Disallowed
