Open Access


CNN Accelerator Using Proposed Diagonal Cyclic Array for Minimizing Memory Accesses

Hyun-Wook Son1, Ali A. Al-Hamid1,2, Yong-Seok Na1, Dong-Yeong Lee1, Hyung-Won Kim1,*

1 Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, 28644, Korea
2 Department of Electrical Engineering, College of Engineering, Al-Azhar University, Cairo, 11651, Egypt

* Corresponding Author: Hyung-Won Kim. Email:

Computers, Materials & Continua 2023, 76(2), 1665-1687.


This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). The DCA significantly reduces repeated memory accesses for the feature data and weight parameters of CNN models, which maximizes the data reuse rate and improves the computation speed. Furthermore, an integrated computation architecture has been implemented for the activation function and max-pooling that follow the convolution calculation, reducing the hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for You Only Look Once version 2 (YOLOv2)-Tiny, which consists of 9 layers. A methodology for optimizing the local buffer size with little sacrifice of inference speed is also presented. We implemented the proposed CNN accelerator using a Xilinx Zynq ZCU102 Ultrascale+ Field Programmable Gate Array (FPGA) and ISE Design Suite. The FPGA implementation uses 34,336 Look Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and only 58 KB of on-chip memory. Using quantized 16-bit and 8-bit full-integer computation, it achieves a mean Average Precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 57.92% and 56.42%, respectively, with a reported loss of only 0.68% for the 8-bit version, and a computation time of 137.9 ms and 69 ms per input image, respectively, at a clock speed of 200 MHz. These speeds are expected to improve about fivefold at a clock speed of 1 GHz if the design is implemented in a silicon System on Chip (SoC) using a sub-micron process.
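
The clock-scaling projection in the abstract can be checked with simple arithmetic: going from 200 MHz to 1 GHz is a 5x increase in clock frequency. The sketch below is illustrative only and is not taken from the paper. It assumes that the cycle count per image stays fixed and that memory bandwidth scales with the clock, which a real SoC may not satisfy. The function and variable names are hypothetical.

```python
# Illustrative projection only: assumes a fixed cycle count per image,
# so latency scales inversely with clock frequency.

# Per-image latency in ms at 200 MHz, as reported in the abstract.
REPORTED_MS_AT_200MHZ = {"int16": 137.9, "int8": 69.0}


def scaled_latency_ms(latency_ms: float, f_old_mhz: float, f_new_mhz: float) -> float:
    """Return the latency after a clock change, assuming unchanged cycle count."""
    return latency_ms * f_old_mhz / f_new_mhz


def project(f_old_mhz: float = 200.0, f_new_mhz: float = 1000.0) -> dict:
    """Project the reported latencies from f_old_mhz to f_new_mhz."""
    return {
        name: scaled_latency_ms(ms, f_old_mhz, f_new_mhz)
        for name, ms in REPORTED_MS_AT_200MHZ.items()
    }


if __name__ == "__main__":
    for name, ms in project().items():
        # 1000 ms divided by the per-image latency gives images per second.
        print(f"{name}: {ms:.2f} ms/image, {1000.0 / ms:.1f} images/s")
```

Under these assumptions the projected latencies are one fifth of the reported FPGA figures. Any memory or I/O bottleneck that does not scale with the clock would make the real speedup smaller.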


Cite This Article

H. Son, A. A. Al-Hamid, Y. Na, D. Lee and H. Kim, "CNN accelerator using proposed diagonal cyclic array for minimizing memory accesses," Computers, Materials & Continua, vol. 76, no. 2, pp. 1665–1687, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.