High performance LU factorization for non-dedicated clusters

Kenjiro Taura, Toshio Endo, Akinori Yonezawa, Kenji Kaneda

doi:10.1109/ccgrid.2004.1336698

Description

This paper describes an implementation of parallel LU factorization. The focus is on achieving high performance on non-dedicated clusters, where the number of available computing resources may be arbitrary and may even change dynamically. We accommodate joining and leaving processes by describing the algorithm in the Phoenix programming model. We achieve high performance in this setting through a combination of techniques, including latency-tolerant communication and a data partitioning scheme that achieves both load balance and small communication volume for an arbitrary and dynamically changing number of processors. We observed 130 GFlops with 128 processes on a 70-node dual 2.4 GHz Xeon cluster at matrix size 46080. This performance is comparable to that of High Performance Linpack (HPL). When cluster nodes are loaded by background processes, our implementation surpasses HPL.
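For readers unfamiliar with the underlying computation, the following is a minimal sequential sketch of LU factorization with partial pivoting. It is the textbook algorithm, not the parallel implementation from the paper, and it does not model Phoenix, the data partitioning, or the communication scheme. The function names `lu_factor` and `reconstruct` are hypothetical and chosen only for illustration.

```python
def lu_factor(a):
    """Return (perm, lu) for a square matrix given as a list of rows.

    lu packs L (unit lower triangle, stored below the diagonal) and U
    (upper triangle, including the diagonal) into one matrix. perm[i] is
    the index of the original row that ended up in position i.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Store the multipliers in column k and update the trailing block.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return perm, lu


def reconstruct(perm, lu):
    """Multiply L by U; the result should equal the row-permuted input."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out
```

Calling `perm, lu = lu_factor(A)` and then `reconstruct(perm, lu)` should reproduce the rows of `A` in the order given by `perm`, up to floating-point rounding. The inner update of the trailing block dominates the cost, and it is the part that a parallel implementation would distribute across processes.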

Journal

IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004. 678-685, 2004-11-08

IEEE

Details

CRID: 1874242817653919744
DOI: 10.1109/ccgrid.2004.1336698
Data Source: OpenAIRE
