Open Access

ARTICLE

FPGA Optimized Accelerator of DCNN with Fast Data Readout and Multiplier Sharing Strategy

Tuo Ma, Zhiwei Li, Qingjiang Li*, Haijun Liu, Zhongjin Zhao, Yinan Wang

College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China

* Corresponding Author: Qingjiang Li.

Computers, Materials & Continua 2023, 77(3), 3237-3263. https://doi.org/10.32604/cmc.2023.045948

Abstract

With the continuous development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry due to its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) offers programmability, low power consumption, parallelism, and low cost. However, the enormous computational load of DCNNs and the limited logic capacity of FPGAs restrict the energy efficiency of DCNN accelerators. The traditional sequential sliding window method can improve the throughput of the DCNN accelerator by data multiplexing, but its data multiplexing rate is low because it repeatedly reads the data between rows. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, so resources are wasted if each multiplication occupies a single DSP. A multiplier sharing strategy is therefore proposed: the multiplier of the accelerator is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, based on these two strategies, an FPGA optimized accelerator is proposed. The accelerator is implemented in Verilog and deployed on a Xilinx VCU118. When the accelerator recognizes the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, a 1.73× energy-efficiency gain over previous DCNN FPGA accelerators. When the accelerator recognizes the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, which is 1.28× to 3.14× that of other accelerators.
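The idea behind sharing one wide multiplier among several narrow signed multiplications can be illustrated with a short arithmetic model. The sketch below is not the paper's Verilog design. It shows the generic operand-packing technique, assuming that all multiplications in a group share one operand and that each product is given a lane of twice the operand width. The function name `shared_multiply` and its parameters are hypothetical and chosen only for illustration.

```python
def shared_multiply(operands, shared, bits):
    """Model several signed (bits x bits) products computed by ONE wide multiply.

    Assumption (not taken from the paper): every product shares the operand
    `shared`, and each product occupies a lane of 2*bits bits in the result.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for v in list(operands) + [shared]:
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in a signed {bits}-bit value")

    lane = 2 * bits  # a signed bits x bits product always fits in 2*bits bits

    # Pack the narrow operands into one wide operand, one lane apart.
    packed = sum(v << (i * lane) for i, v in enumerate(operands))

    # The single wide multiplication that a DSP block would perform.
    product = packed * shared

    # Unpack: sign-extend the lowest lane, subtract it, shift to the next lane.
    results = []
    for _ in operands:
        low = product & ((1 << lane) - 1)
        if low >= 1 << (lane - 1):
            low -= 1 << lane  # interpret the lane as a signed value
        results.append(low)
        product = (product - low) >> lane  # removes the borrow left by a negative lane
    return results


# Three 4-bit signed multiplications sharing the operand -6.
print(shared_multiply([-7, 5, 3], -6, bits=4))  # [42, -30, -18]
```

Subtracting the sign-extended lane before shifting is what makes the packing correct for signed operands, because a negative product in a lower lane borrows from the lane above it. In hardware, the number of lanes is limited by the multiplier's input widths, which this model does not enforce.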

Cite This Article

T. Ma, Z. Li, Q. Li, H. Liu, Z. Zhao et al., "FPGA optimized accelerator of DCNN with fast data readout and multiplier sharing strategy," Computers, Materials & Continua, vol. 77, no. 3, pp. 3237–3263, 2023.



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.