ClPy: A NumPy-Compatible Library Accelerated with OpenCL
Description
We developed ClPy, a Python library that supports OpenCL with a simple NumPy-like interface, and an extension of the Chainer machine learning framework for OpenCL support. OpenCL emerged as a parallel computing standard with the goal of supporting a wide range of accelerators, including GPUs (NVIDIA and others), FPGAs, DSPs, and CPUs. In contrast, many machine learning frameworks, including Chainer, have been built on top of CUDA, the predominant API for programming NVIDIA GPUs. As such, they cannot leverage other devices such as non-NVIDIA GPUs and FPGAs. To facilitate the development of cross-platform machine learning frameworks, ClPy is designed with an interface compatible with CuPy (CUDA Python), which itself has a NumPy-compatible interface and is used in Chainer to support both CPUs and NVIDIA GPUs. ClPy extends Chainer to any platform supporting OpenCL and can potentially do the same for other machine learning frameworks. This paper describes the design and implementation of ClPy and demonstrates that it achieves reasonable performance on several machine learning applications. Our experiments examine the overhead of ClPy itself and show that serious performance degradation was caused by the lack of GPU-accelerated OpenCL libraries, including BLAS.
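The value of a NumPy-compatible interface can be illustrated with a minimal sketch: a routine is written once against a generic array module and the backend is chosen by the caller. The function name `softmax` and the parameter name `xp` are illustrative choices, not taken from the paper, and only NumPy is exercised here. Passing a CuPy-compatible module in place of `np` is an assumption that holds only if that module implements `max`, `exp`, and `sum` with the same signatures.

```python
import numpy as np


def softmax(xp, x):
    """Row-wise softmax written against a NumPy-like array module `xp`.

    `xp` is a hypothetical parameter name: any module exposing NumPy-style
    `max`, `exp`, and `sum` could be passed (assumption, not verified here).
    """
    # Subtract each row's maximum so that exp() cannot overflow.
    shifted = x - xp.max(x, axis=1, keepdims=True)
    e = xp.exp(shifted)
    # Normalize each row so that it sums to 1.
    return e / xp.sum(e, axis=1, keepdims=True)


# Run on the CPU with NumPy as the backend.
x = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
probs = softmax(np, x)
```

Because the function body never names a specific backend, the same source could in principle serve CPU and accelerator arrays, which is the portability argument the abstract makes for a CuPy-compatible interface.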
Published in
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 933-940, 2019-05-01
IEEE