FLAT: An MPI Friendly GPGPU Programming Framework for GPU Clusters

Keigo Shima (島 圭吾)

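A program that runs on a GPU-equipped PC cluster consists of code describing the processing on the GPU and CPU code that performs communication. The GPU code executes parallelized algorithms at high speed, and the CPU is responsible for inter-node communication. MPI is commonly used for inter-node communication, but MPI calls cannot be written in GPU code; to realize the benefit of parallelization, the programmer must therefore implement the CPU code and the GPU code side by side while keeping track of data movement between the CPU and the GPU. To reduce the programming cost associated with data communication between GPUs, we propose "FLAT", a GPU programming framework in which MPI can be embedded. With FLAT, MPI functions can be written in GPU code, which makes the data transferred between GPUs explicit. This paper first describes the execution model and implementation of FLAT. It then demonstrates the effectiveness and execution performance of FLAT using two real programs: Livermore Loop18 and an optical flow computation. The experiments confirmed that, when the GPU code is coarse-grained, the performance degradation caused by using FLAT is 3% or less.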

A program for a GPU-equipped PC cluster consists of two types of code: GPU code and CPU code. The GPU code executes parallelized algorithms at high speed, supported by CPU code that communicates with other nodes. Although the MPI library is commonly used to transfer data in the CPU code, MPI functions cannot be written in the GPU code. Programmers are therefore forced to implement CPU and GPU code alternately while keeping track of data movement among nodes. To reduce software development costs, we propose a programming framework called FLAT, which enables MPI functions to be embedded in GPU code. This paper describes the execution model and implementation of FLAT and discusses its effectiveness and performance through two case studies, Livermore Loop18 and an optical flow program. The experimental results confirm that FLAT improves the readability of GPU code while keeping performance degradation tolerable: less than 3% for a coarse-grained parallel program.
