Open Access

ARTICLE

CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing Memory Accesses

Hyun-Wook Son1, Ali A. Al-Hamid1,2, Yong-Seok Na1, Dong-Yeong Lee1, Hyung-Won Kim1,*

1 Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, 28644, Korea
2 Department of Electrical Engineering, College of Engineering, Al-Azhar University, Cairo, 11651, Egypt

* Corresponding Author: Hyung-Won Kim. Email: email

Computers, Materials & Continua 2023, 76(2), 1665-1687. https://doi.org/10.32604/cmc.2023.038760

Abstract

This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). As demonstrated, the DCA significantly reduces the burden of repeated memory accesses for the feature data and weight parameters of CNN models, which maximizes the data reuse rate and improves the computation speed. Furthermore, an integrated computation architecture has been implemented for the activation function and max-pooling after the convolution calculation, reducing the hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for You Only Look Once version 2 (YOLOv2)-Tiny, which consists of 9 layers. A methodology to optimize the local buffer size with little sacrifice of inference speed is also presented. We implemented the proposed CNN accelerator using a Xilinx Zynq ZCU102 Ultrascale+ Field Programmable Gate Array (FPGA) and ISE Design Suite. The FPGA implementation uses 34,336 Look Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and an on-chip memory of only 58 KB. Using quantized 16-bit and 8-bit full-integer data manipulation, it achieves accuracies of 57.92% and 56.42% mean Average Precision at an intersection over union threshold of 0.5 (mAP@0.5), respectively, with a reported loss of only 0.68% for the 8-bit version. The corresponding computation times are 137.9 ms and 69 ms per input image at a clock speed of 200 MHz. These speeds are expected to improve about fivefold at a clock speed of 1 GHz if the design is implemented in a silicon System on Chip (SoC) using a sub-micron process.
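The clock-scaling estimate in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not part of the authors' implementation: the helper name `scaled_latency_ms` is hypothetical, and it assumes that the cycle count per image stays constant when the clock changes (that is, memory bandwidth scales with the clock), which the abstract does not state.

```python
# Illustrative sketch only: scales the per-image latencies reported in the
# abstract from 200 MHz to 1 GHz.
# ASSUMPTION (not stated in the source): the number of clock cycles per image
# is unchanged, so latency is inversely proportional to clock frequency.

def scaled_latency_ms(latency_ms: float, f_from_hz: float, f_to_hz: float) -> float:
    """Return the latency after a clock change, for a fixed cycle count."""
    return latency_ms * f_from_hz / f_to_hz


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to a frame rate."""
    return 1000.0 / latency_ms


# Per-image latencies reported in the abstract at 200 MHz.
REPORTED_MS = {"16-bit": 137.9, "8-bit": 69.0}
FPGA_CLOCK_HZ = 200e6
SOC_CLOCK_HZ = 1e9

if __name__ == "__main__":
    for name, fpga_ms in REPORTED_MS.items():
        soc_ms = scaled_latency_ms(fpga_ms, FPGA_CLOCK_HZ, SOC_CLOCK_HZ)
        print(
            f"{name}: {fpga_ms:.1f} ms at 200 MHz "
            f"({frames_per_second(fpga_ms):.2f} fps), "
            f"{soc_ms:.2f} ms at 1 GHz "
            f"({frames_per_second(soc_ms):.2f} fps)"
        )
```

Under this assumption, the 1 GHz latencies come out at about 27.6 ms and 13.8 ms, a fivefold improvement that matches the ratio of the two clock frequencies.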

Keywords


Cite This Article

APA Style
Son, H., Al-Hamid, A.A., Na, Y., Lee, D., Kim, H. (2023). CNN accelerator using proposed diagonal cyclic array for minimizing memory accesses. Computers, Materials & Continua, 76(2), 1665-1687. https://doi.org/10.32604/cmc.2023.038760
Vancouver Style
Son H, Al-Hamid AA, Na Y, Lee D, Kim H. CNN accelerator using proposed diagonal cyclic array for minimizing memory accesses. Comput Mater Contin. 2023;76(2):1665-1687. https://doi.org/10.32604/cmc.2023.038760
IEEE Style
H. Son, A.A. Al-Hamid, Y. Na, D. Lee, and H. Kim, "CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing Memory Accesses," Comput. Mater. Contin., vol. 76, no. 2, pp. 1665-1687, 2023. https://doi.org/10.32604/cmc.2023.038760



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.