-
- DEGAWA Yuya
- Graduate School of Information Science and Technology, The University of Tokyo
-
- KOIZUMI Toru
- Graduate School of Information Science and Technology, The University of Tokyo
-
- NAKAMURA Tomoki
- Graduate School of Information Science and Technology, The University of Tokyo
-
- SHIOYA Ryota
- Graduate School of Information Science and Technology, The University of Tokyo
-
- KADOMOTO Junichiro
- Graduate School of Information Science and Technology, The University of Tokyo
-
- IRIE Hidetsugu
- Graduate School of Information Science and Technology, The University of Tokyo
-
- SAKAI Shuichi
- Graduate School of Information Science and Technology, The University of Tokyo
抄録
<p>One of the performance bottlenecks of a processor is the front-end that supplies instructions. Various techniques, such as cache replacement algorithms and hardware prefetching, have been investigated to facilitate smooth instruction supply at the front-end and to improve processor performance. In these approaches, one of the most important factors has been the reduction in the number of instruction cache misses. By using the number of instruction cache misses or derived factors, previous studies have explained the performance improvements achieved by their proposed methods. However, we found that the number of instruction cache misses does not always explain performance changes well in modern processors. This is because the front-end in modern processors handles subsequent instruction cache misses in overlap with earlier ones. Based on this observation, we propose a novel factor: the number of miss regions. We define a region as a sequence of instructions from one branch misprediction to the next, while we define a miss region as a region that contains one or more instruction cache misses. At the boundary of each region, the pipeline is flushed owing to a branch misprediction. Thus, cache misses after this boundary are not handled in overlap with cache misses before the boundary. As a result, the number of miss regions is equal to the number of cache misses that are processed without overlap. In this paper, we demonstrate that the number of miss regions can well explain the variation in performance through mathematical models and simulation results. The results show that the model explains cycles per instruction with an average error of 1.0% and maximum error of 4.1% when applying an existing prefetcher to the instruction cache. The idea of miss regions highlights that instruction cache misses and branch mispredictions interact with each other in processors with a decoupled front-end. We hope that considering this interaction will motivate the development of fast performance estimation methods and new microarchitectural methods.</p>
収録刊行物
-
- IEICE Transactions on Information and Systems
-
IEICE Transactions on Information and Systems E106.D (12), 1960-1968, 2023-12-01
一般社団法人 電子情報通信学会
- Tweet
詳細情報 詳細情報について
-
- CRID
- 1390861305860372480
-
- ISSN
- 17451361
- 09168532
-
- 本文言語コード
- en
-
- データソース種別
-
- JaLC
- Crossref
-
- 抄録ライセンスフラグ
- 使用不可