Open Access


Implementing Delay Multiply and Sum Beamformer on a Hybrid CPU-GPU Platform for Medical Ultrasound Imaging Using OpenMP and CUDA

Ke Song1,*, Paul Liu2, Dongquan Liu3
1 School of Mathematics and Information Engineering, Chongqing University of Education, Chongqing, 400065, China
2 Stork Healthcare, Ltd., Chengdu, 610041, China
3 Saset (Chengdu) Inc., Chengdu, 610041, China
* Corresponding Author: Ke Song. Email:
(This article belongs to the Special Issue: Computer Methods in Bio-mechanics and Biomedical Engineering)

Computer Modeling in Engineering & Sciences 2021, 128(3), 1133-1150.

Received 30 January 2021; Accepted 10 May 2021; Issue published 11 August 2021


Delay Multiply and Sum (DMAS), a recently proposed beamforming algorithm, excels at enhancing the resolution and contrast of ultrasound images. However, the algorithm contains nested loops, so its computational complexity is higher than that of the Delay and Sum (DAS) beamformer, which is widely used in industry. We therefore propose a simple vector-based method to lower its complexity. The key idea is to transform the nested loops into several vector operations, which can be implemented efficiently on many parallel platforms, such as Graphics Processing Units (GPUs) and multi-core Central Processing Units (CPUs). Accordingly, we implemented the algorithm on such a platform. To make the most of the available computing power, we use the GPU and the multi-core CPU together. The test platform is a low-cost Personal Computer (PC) equipped with a GPU and a multi-core CPU. The results show that the hybrid use of a CPU and a GPU yields a significant performance improvement over using either the GPU or the multi-core CPU alone: the hybrid system performs about 47%–63% better than a single GPU. When 32 elements are used in receiving, the frame rate reaches roughly 30 fps, and in the best case it rises to 40 fps.
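The abstract does not spell out the vector-based method, so the following is only a minimal sketch of one well-known way to remove the nested loop, not necessarily the authors' formulation. It assumes the per-channel samples for one pixel have already been delayed and aligned, and it relies on the identity that the sum over all channel pairs of sign(s_i s_j)·sqrt(|s_i s_j|) equals half of (the square of the sum of the signed square roots, minus the sum of their squares). The function names `dmas_nested` and `dmas_vector` are hypothetical.

```python
import math


def signed_sqrt(x):
    """Return sign(x) * sqrt(|x|), the per-channel term used by DMAS."""
    return math.copysign(math.sqrt(abs(x)), x)


def dmas_nested(samples):
    """Reference DMAS output for one pixel using the O(N^2) pairwise loop.

    `samples` holds the already-delayed samples, one per receive channel
    (an assumption of this sketch).
    """
    total = 0.0
    n = len(samples)
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = samples[i] * samples[j]
            # Signed square root keeps the dimensionality of the signal.
            total += math.copysign(math.sqrt(abs(product)), product)
    return total


def dmas_vector(samples):
    """Same output in O(N): two reductions over the signed square roots.

    Each step (element-wise signed sqrt, sum, sum of squares) is a plain
    vector operation, so it maps naturally onto a parallel reduction.
    """
    roots = [signed_sqrt(s) for s in samples]
    sum_roots = sum(roots)
    sum_squares = sum(r * r for r in roots)
    return 0.5 * (sum_roots * sum_roots - sum_squares)


if __name__ == "__main__":
    channel_samples = [0.5, -1.2, 2.0, 0.0, -0.3]  # made-up values
    print(dmas_nested(channel_samples), dmas_vector(channel_samples))
```

Both functions return the same value up to floating-point rounding. In a CUDA or OpenMP implementation, the list comprehension and the two sums would presumably become an element-wise kernel and parallel reductions, which is the kind of operation the abstract says maps well onto GPUs and multi-core CPUs.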


Beamforming; delay multiply and sum; graphics processing unit; multi-core central processing unit

Cite This Article

Song, K., Liu, P., Liu, D. (2021). Implementing Delay Multiply and Sum Beamformer on a Hybrid CPU-GPU Platform for Medical Ultrasound Imaging Using OpenMP and CUDA. CMES-Computer Modeling in Engineering & Sciences, 128(3), 1133–1150.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.