長行を折畳む疎行列ベクトル積方式とGather機能付メモリによる高速化

田邊, 昇, 小郷, 絢子, 小川, 裕佳, 高田, 雅美, 城, 和貴

書誌事項

タイトル別名

チョウコウオオリタタムソギョウレツベクトルセキホウシキト Gather キノウツキメモリニヨルコウソクカ
Acceleration of Sparse Matrix-vector Product by Algorithm with Folding Long Rows and Memory with Gather Functions

この論文をさがす

抄録

本論文では，疎行列ベクトル積のベクトルがデバイスメモリに入りきらないほど大きな問題向けの並列処理方式を提案する．提案手法はGather機能を有する大容量機能メモリ（メモリアクセラレータ）が実現できると仮定し，これが整列したデータをGPUがアクセスする．長い行を適切な折り目で折り畳む提案アルゴリズム（Fold法）が負荷分散を改善し並列性を高める．これが生成した行列を転置して用いる方式はGPU向けのアクセス順序にしている．フロリダ大学の疎行列コレクションを用いて提案方式の性能評価を行った．その結果，間接アクセスの直接アクセス化により，単体性能は既存研究の最大4.1倍に向上した．GPU内キャッシュが溢れる心配もない．GPU間の1対1通信を完全に排除可能にした構成によりスケーラビリティは保証されており，機能メモリとのインタフェースのバースト転送バンド幅で制約される単体性能にノード数を乗じたものが並列実効性能となる．

In this paper, we propose a parallel processing strategy for huge scale sparse matrix-vector product whose vector cannot be held on a device memory. The strategy uses a system with GPUs accessing data aligned by functional memories named Memory Accelerator with gather function. This strategy assumes the feasibility of the Memory Accelerator. Proposed algorithm named “Fold method” improves load distribution and parallelism. Transposing matrix produced by it improves access sequence for GPU. We evaluate the performance of proposed strategy with University of Florida Sparse Matrix Collection. The result shows the 4.1 times acceleration over the existing performance record with a GPU in the maximum case. There is no risk of performance degradation by overflowing cache capacity on GPU. Because of the architecture without inter-GPU communications, scalability is guaranteed. Therefore, parallel effective performance is the product of number of nodes and single GPU performance limited by burst transfer bandwidth of interface of functional memory.

収録刊行物

情報処理学会論文誌コンピューティングシステム（ACS）

情報処理学会論文誌コンピューティングシステム（ACS） 5 (4), 112-124, 2012-08-24

東京 : 情報処理学会

詳細情報詳細情報について

CRID: 1050564287855371904

NII論文ID: 40019469450

NII書誌ID: AA11833852

ISSN: 18827829; 18827772; 03875806

NDL書誌ID: 024049719

Web Site: http://id.nii.ac.jp/1001/00083642/; http://id.ndl.go.jp/bib/024049719; https://ndlsearch.ndl.go.jp/books/R000000004-I024049719

本文言語コード: ja

資料種別: article

データソース種別

IRDB
NDL
CiNii Articles

長行を折畳む疎行列ベクトル積方式とGather機能付メモリによる高速化

書誌事項

この論文をさがす

抄録

収録刊行物

キーワード

詳細情報詳細情報について

書き出し

問題の指摘

長行を折畳む疎行列ベクトル積方式とGather機能付メモリによる高速化

書誌事項

この論文をさがす

抄録

収録刊行物

キーワード

詳細情報 詳細情報について

書き出し

問題の指摘

参加プロジェクトリスト

詳細情報詳細情報について