Efficient GPU multitasking with latency minimization and cache boosting

- Kim Jiho, School of Electronic and Electrical Engineering, Hongik University
- Chu Minsung, School of Electronic and Electrical Engineering, Hongik University
- Park Yongjun, School of Electronic and Electrical Engineering, Hongik University
Abstract
GPU spatial multitasking has proven quite effective at executing different applications concurrently through SM partitioning. However, while it maximizes total throughput, latency-critical applications often miss their deadlines because of increased execution time. Furthermore, SM partitioning cannot allocate an appropriate L1 cache size per kernel. To solve these problems, this paper proposes GPU Fine-Tuner, a new application-aware resource allocation framework that assigns appropriate resources to GPU kernels. To minimize the execution time of latency-constrained applications, it assigns them additional SMs when doing so does not affect performance. It also enlarges the effective cache size for cache-sensitive kernels by borrowing cache resources from neighboring SMs running cache-insensitive kernels. Experimental results show that GPU Fine-Tuner outperforms GPU spatial multitasking with up to 15% less average latency and no performance degradation.
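The allocation idea the abstract describes can be pictured as a simple two-step policy: hand spare SMs to latency-critical kernels, and let cache-insensitive kernels lend L1 capacity to cache-sensitive neighbors. The sketch below is illustrative only; the kernel fields, the SM and L1 sizes, and the `partition` function are hypothetical stand-ins, not the paper's actual algorithm.

```python
from dataclasses import dataclass

TOTAL_SMS = 16   # hypothetical GPU with 16 SMs
BASE_L1_KB = 32  # hypothetical per-SM L1 allocation in KB

@dataclass
class Kernel:
    name: str
    min_sms: int           # SMs needed to sustain baseline throughput
    latency_critical: bool
    cache_sensitive: bool

def partition(kernels):
    """Assign SMs and L1 sizes: spare SMs go to latency-critical kernels,
    and cache-insensitive kernels donate half their L1 to sensitive ones."""
    alloc = {k.name: {"sms": k.min_sms, "l1_kb": BASE_L1_KB} for k in kernels}
    spare = TOTAL_SMS - sum(k.min_sms for k in kernels)
    critical = [k for k in kernels if k.latency_critical]
    # Step 1: distribute spare SMs round-robin among latency-critical kernels.
    if critical:
        for i in range(max(spare, 0)):
            alloc[critical[i % len(critical)].name]["sms"] += 1
    # Step 2: cache borrowing from cache-insensitive neighbors.
    donors = [k for k in kernels if not k.cache_sensitive]
    takers = [k for k in kernels if k.cache_sensitive]
    if donors and takers:
        pool = (BASE_L1_KB // 2) * len(donors)
        for k in donors:
            alloc[k.name]["l1_kb"] -= BASE_L1_KB // 2
        for k in takers:
            alloc[k.name]["l1_kb"] += pool // len(takers)
    return alloc

# With one latency-critical, cache-sensitive kernel A and one background
# kernel B, A absorbs the 8 spare SMs and half of B's L1 capacity:
alloc = partition([Kernel("A", 4, True, True), Kernel("B", 4, False, False)])
print(alloc)  # A: 12 SMs / 48 KB L1; B: 4 SMs / 16 KB L1
```

On real NVIDIA hardware, the per-kernel L1/shared-memory split can be hinted with `cudaFuncSetAttribute` and `cudaFuncAttributePreferredSharedMemoryCarveout`, though the paper's SM-level cache borrowing goes beyond what that API exposes.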
Journal

- IEICE Electronics Express 14 (7), 20161158, 2017
- The Institute of Electronics, Information and Communication Engineers
Details
- CRID: 1390282680195567488
- NII Article ID: 130005589255
- ISSN: 13492543
- Text Lang: en
- Data Source: JaLC, Crossref, CiNii Articles
- Abstract License Flag: Disallowed