Open Access


Efficient Concurrent L1-Minimization Solvers on GPUs

Xinyue Chu1, Jiaquan Gao1,*, Bo Sheng2

1 Jiangsu Key Laboratory for NSLSCS, School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China
2 Department of Computer Science, University of Massachusetts Boston, MA 02125, USA

* Corresponding Author: Jiaquan Gao.

Computer Systems Science and Engineering 2021, 38(3), 305-320.


Because the concurrent L1-minimization (L1-min) problem arises in some real applications, this paper investigates how to solve it in parallel on GPUs. First, we propose a novel self-adaptive warp implementation of the matrix-vector multiplication Ax and a novel self-adaptive thread implementation of the transposed matrix-vector multiplication A^T x on the GPU. Vector-operation and inner-product decision trees are used to choose the optimal vector-operation and inner-product kernels for vectors of any size. Second, building on these kernels, we use the iterative shrinkage-thresholding algorithm to construct two concurrent L1-min solvers, one organized around streams and one around thread blocks on a GPU, and we optimize their performance with newer GPU features such as the shuffle instruction and the read-only data cache. Finally, we design a concurrent L1-min solver for multiple GPUs. The experimental results validate the high effectiveness and good performance of the proposed methods.
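For readers unfamiliar with the iterative shrinkage-thresholding algorithm named in the abstract, the following is a minimal single-problem CPU sketch in plain Python. It is not the authors' GPU implementation. It assumes the common unconstrained form, minimizing 0.5*||Ax - b||^2 + lam*||x||_1, and the names `ista`, `soft_threshold`, `matvec`, `rmatvec`, `lam`, and `step` are illustrative choices, not taken from the paper. Each iteration needs one product Ax and one product A^T x, which are the two operations the abstract targets with dedicated kernels.

```python
def soft_threshold(v, t):
    """Element-wise shrinkage: sign(v_i) * max(|v_i| - t, 0)."""
    out = []
    for x in v:
        if abs(x) > t:
            out.append((abs(x) - t) * (1.0 if x > 0 else -1.0))
        else:
            out.append(0.0)
    return out


def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose explicitly."""
    n = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]


def ista(A, b, lam, step, iters=500):
    """Assumed objective: 0.5*||Ax - b||^2 + lam*||x||_1.

    `step` should not exceed 1 / (largest eigenvalue of A^T A).
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)  # gradient of the smooth term
        moved = [xi - step * gi for xi, gi in zip(x, grad)]
        x = soft_threshold(moved, step * lam)  # shrinkage step
    return x
```

With A set to the identity matrix and a unit step, the iteration reduces to a single soft-thresholding of b, which makes a convenient sanity check.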


Cite This Article

X. Chu, J. Gao and B. Sheng, "Efficient concurrent L1-minimization solvers on GPUs," Computer Systems Science and Engineering, vol. 38, no. 3, pp. 305–320, 2021.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.