TY - EJOU
AU - Wang, Yuejiao
AU - Ma, Zhong
AU - Yang, Chaojie
AU - Yang, Yu
AU - Wei, Lu
TI - Reinforcement Learning Based Quantization Strategy Optimal Assignment Algorithm for Mixed Precision
T2 - Computers, Materials & Continua
PY - 2024
VL - 79
IS - 1
SN - 1546-2226
AB - The quantization algorithm compresses the original network by reducing the numerical bit width of the model, which improves the computation speed. Because different layers have different redundancy and sensitivity to data bit width, reducing the data bit width will result in a loss of accuracy. Therefore, it is difficult to determine the optimal bit width for different parts of the network with guaranteed accuracy. Mixed precision quantization can effectively reduce the amount of computation while keeping the model accuracy basically unchanged. In this paper, a hardware-aware mixed precision quantization strategy optimal assignment algorithm adapted to low bit width is proposed, and reinforcement learning is used to automatically predict the mixed precision that meets the constraints of hardware resources. In the state-space design, the standard deviation of weights is used to measure the distribution difference of data, the execution speed feedback of simulated neural network accelerator inference is used as the environment to limit the action space of the agent, and the accuracy of the quantization model after retraining is used as the reward function to guide the agent to carry out deep reinforcement learning training. The experimental results show that the proposed method obtains a suitable model layer-by-layer quantization strategy under the condition that the computational resources are satisfied, and the model accuracy is effectively improved. The proposed method has strong intelligence and certain universality and has strong application potential in the field of mixed precision quantization and embedded neural network model deployment.
KW - Mixed precision quantization; quantization strategy optimal assignment; reinforcement learning; neural network model deployment
DO - 10.32604/cmc.2024.047108