FPGA‐accelerated deep convolutional neural networks for high throughput and energy efficiency

  • Yuran Qiao
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China
  • Junzhong Shen
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China
  • Tao Xiao
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China
  • Qianming Yang
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China
  • Mei Wen
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China
  • Chunyuan Zhang
    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China

Abstract

Summary

Recent breakthroughs in deep convolutional neural networks (CNNs) have led to great improvements in the accuracy of both vision and auditory systems. Characterized by their deep structures and large numbers of parameters, deep CNNs challenge the computational capabilities of today's platforms. Hardware specialization in the form of the field-programmable gate array (FPGA) offers a promising path towards major leaps in computational performance while achieving high energy efficiency.

In this paper, we focus on accelerating deep CNNs using the Xilinx Zynq-zq7045 FPGA SoC. As most of the computational workload can be converted to matrix multiplications, we adopt a matrix-multiplier-based accelerator architecture. Dedicated units are designed to eliminate the conversion overhead. We also design a customized memory system according to the memory access pattern of CNNs. To make the accelerator easily usable by application developers, it supports Caffe, a widely used software framework for deep CNNs. Different CNN models can be adopted by our accelerator with good performance portability. The experimental results show that for a typical CNN application, image classification, an average throughput of 77.8 GFLOPS is achieved, while the energy efficiency is 4.7× better than that of an Nvidia K20 GPGPU. © 2016 The Authors. *Concurrency and Computation: Practice and Experience* Published by John Wiley & Sons Ltd
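The abstract does not detail how convolutions are converted to matrix multiplications, but the standard lowering it alludes to is the im2col transform, which Caffe itself uses: each sliding window of the input feature map is unrolled into a column, so the convolution becomes a single GEMM. A minimal NumPy sketch (illustrative only; all function names are ours, not the paper's):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix
    of sliding windows (stride 1, no padding)."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                # Each (i, j) offset within the kernel contributes one row.
                cols[row] = x[c, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv2d_as_gemm(x, w):
    """Convolution via GEMM: reshape (K, C, kh, kw) kernels to a
    (K, C*kh*kw) matrix and multiply by the im2col matrix."""
    K, C, kh, kw = w.shape
    cols = im2col(x, kh, kw)
    out = w.reshape(K, -1) @ cols        # the matrix multiplication
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(K, out_h, out_w)
```

The "conversion overhead" the abstract mentions is exactly this unrolling step, which duplicates input data kh×kw times; the paper's dedicated units perform it in hardware so the matrix multiplier is fed without a software im2col pass.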

Cited by: 1
