Parallel LU-decomposition on Pentium Streaming SIMD Extensions
Description
Solving systems of linear equations is central to scientific computation. In this paper, we focus on using Intel's Pentium Streaming SIMD Extensions (SSE) for a parallel implementation of the LU-decomposition algorithm. Two implementations of LU-decomposition (non-SSE and SSE) are compared, as are two variants of the algorithm for the SSE version. Our results show that the SSE version runs on average 2.25 times faster than the non-SSE version. This exceeds the 1.74-times speedup achieved by Intel's SSE implementation. The speedup comes from heavy reuse of loaded data, achieved by organizing the SSE instructions efficiently.
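The idea of reusing loaded data can be illustrated with a minimal sketch in C. It is not the implementation evaluated in the paper: the function name `lu_decompose`, the row-major single-precision layout, the absence of pivoting, and the loop order are all assumptions made for illustration. For each pivot row, four elements are loaded into one SSE register and that register is reused for every row below the pivot, so the pivot data is loaded once per column block rather than once per row update. A scalar fallback handles leftover columns and builds without SSE support.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: in-place LU decomposition without pivoting
 * (Doolittle form) of an n x n row-major float matrix.
 * On return, the strict lower triangle holds L (unit diagonal implied)
 * and the upper triangle holds U.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_decompose(float *a, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        float pivot = a[k * n + k];
        if (pivot == 0.0f)
            return -1;

        /* Store the multipliers l_ik = a_ik / a_kk in column k. */
        for (size_t i = k + 1; i < n; ++i)
            a[i * n + k] /= pivot;

        size_t j = k + 1;
#if defined(__SSE__)
        /* Process the trailing submatrix four columns at a time. */
        for (; j + 4 <= n; j += 4) {
            /* Load four pivot-row elements once ... */
            __m128 prow = _mm_loadu_ps(&a[k * n + j]);
            /* ... and reuse them for every row below the pivot. */
            for (size_t i = k + 1; i < n; ++i) {
                __m128 l   = _mm_set1_ps(a[i * n + k]);
                __m128 row = _mm_loadu_ps(&a[i * n + j]);
                row = _mm_sub_ps(row, _mm_mul_ps(l, prow));
                _mm_storeu_ps(&a[i * n + j], row);
            }
        }
#endif
        /* Scalar path for the remaining columns (or all columns
         * when SSE is not available at compile time). */
        for (; j < n; ++j) {
            float pkj = a[k * n + j];
            for (size_t i = k + 1; i < n; ++i)
                a[i * n + j] -= a[i * n + k] * pkj;
        }
    }
    return 0;
}
```

The column-block loop order was chosen here only to make the register reuse visible. For large matrices it walks memory with a stride of `n` floats, so a tuned version would likely block over rows as well. Unaligned loads (`_mm_loadu_ps`) are used so that the sketch places no alignment requirement on the caller.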