Extending the GPGPU Framework MESI-CUDA to Multi-GPU Environments

山本 怜, 大野 和彦

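In the field of GPGPU, attempts are being made to achieve higher computing performance with multi-GPU environments, that is, systems equipped with multiple GPUs. CUDA, the currently dominant development environment, supports multiple GPUs, but each GPU must be operated explicitly, which makes programs cumbersome to write. Moreover, because the number of GPUs that can be installed in a single host is limited, larger environments that use more GPUs become distributed multi-GPU environments. In that case, writing and tuning programs becomes even harder, for example because communication overhead must be considered depending on whether or not two GPUs reside on the same host. We are developing MESI-CUDA, a framework in which programs are easier to write than in CUDA. MESI-CUDA adopts a programming model in which CPU and GPU cores access a single virtual shared memory. The MESI-CUDA implementation translates programs written in this model into CUDA code by automatically generating code for allocating and freeing host and device memory and for transferring data. In this proposal, we extend this model as-is to multi-GPU environments, so that low-level operations on each GPU no longer need to be written. We also introduce a logical thread creation scheme in which the groups of threads requested by the user are automatically assigned to appropriate GPUs by a runtime scheduler. The compiler statically analyzes properties such as the data access range of each thread, and the runtime scheduler uses the results to perform automatic optimizations such as minimizing the amount of data transferred.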

Recently, GPGPU has been used for high-performance computing. Although multi-GPU systems are expected to serve as a platform for higher performance, CUDA, the current standard programming environment, requires explicit operations on the individual GPUs. Furthermore, hand-tuning is necessary to use all GPUs efficiently. Because only a few GPUs can be physically installed on a single host, a large-scale multi-GPU environment will be a cluster of hosts connected by a network. In such an environment, the user must specify inter-host and intra-host communication while accounting for the difference in overhead, which makes programming and tuning even more difficult. We are developing a new programming framework named MESI-CUDA, which enables easier GPU programming than CUDA. In this paper, we propose an extension of MESI-CUDA to support multi-GPU environments. The current MESI-CUDA provides a simple programming model in which all CPU and GPU cores access a single virtual shared memory. The compiler translates a MESI-CUDA program into a CUDA program, automatically generating memory management and data transfer code. We extend this model to support multi-GPU environments, hiding the individual GPUs from the user and eliminating low-level specifications. We introduce a new logical thread creation scheme: the user creates GPU threads without specifying the target GPU, and the runtime thread scheduler automatically invokes physical threads on the available GPUs. The MESI-CUDA compiler performs static analysis to obtain the data access range of each thread. Using the analysis result, the runtime scheduler performs automatic optimizations such as minimizing data transfer.
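
The abstract does not describe how the runtime scheduler chooses a GPU, so the following is only a minimal sketch of one way a scheduler could use per-thread data access ranges to reduce data transfer. It assumes a simple greedy heuristic: each group of logical threads goes to the GPU that would need the fewest new elements copied in, with ties broken by current load. The names `Gpu`, `transfer_cost`, `schedule`, and `max_load` are hypothetical and are not part of MESI-CUDA or CUDA.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical model of one GPU as seen by the scheduler."""
    gpu_id: int
    resident: set = field(default_factory=set)  # element indices already in device memory
    load: int = 0                               # logical threads assigned so far


def transfer_cost(gpu, access_range):
    """Count the elements of access_range that are not yet resident on gpu."""
    return sum(1 for i in access_range if i not in gpu.resident)


def schedule(chunks, gpus, max_load):
    """Greedily assign each chunk of logical threads to a GPU.

    chunks   -- list of (num_threads, access_range) pairs; access_range stands
                in for the result of the compiler's static analysis (assumed).
    max_load -- assumed per-GPU cap on assigned threads, for load balancing.
    Returns the list of chosen GPU ids, one per chunk.
    """
    assignment = []
    for num_threads, access_range in chunks:
        candidates = [g for g in gpus if g.load + num_threads <= max_load]
        if not candidates:
            raise ValueError("no GPU has capacity for this chunk")
        # Prefer the least new data to transfer, then the lightest load.
        best = min(
            candidates,
            key=lambda g: (transfer_cost(g, access_range), g.load, g.gpu_id),
        )
        best.resident.update(access_range)  # data is now on that GPU
        best.load += num_threads
        assignment.append(best.gpu_id)
    return assignment


def demo():
    """Four chunks over two data regions, scheduled onto two GPUs."""
    gpus = [Gpu(0), Gpu(1)]
    chunks = [
        (256, range(0, 1000)),
        (256, range(1000, 2000)),
        (256, range(0, 1000)),     # reuses the data of the first chunk
        (256, range(1000, 2000)),  # reuses the data of the second chunk
    ]
    return schedule(chunks, gpus, max_load=1024)
```

In `demo()`, the third and fourth chunks are placed on the GPUs that already hold their data, so they need no further transfer. A real scheduler would also have to weigh inter-host against intra-host transfer cost, which this sketch ignores.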
