Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

A Subdomain-Based GPU Parallel Scheme for Accelerating Peridynamics Modeling with Reduced Graphics Memory

Zuokun Yang¹, Jun Li^1,2,*, Xin Lai^1,2, Lisheng Liu^1,2,*

CMES-Computer Modeling in Engineering & Sciences, Vol.146, No.1, 2026, DOI:10.32604/cmes.2026.075980 - 29 January 2026

Abstract Peridynamics (PD) demonstrates unique advantages in addressing fracture problems; however, its nonlocality and meshfree discretization result in high computational and storage costs. Moreover, in engineering applications, the computational scale of classical GPU parallel schemes is often limited by the finite graphics memory of GPU devices. In the present study, we develop an efficient particle information management strategy based on the cell-linked list method and, on this basis, propose a subdomain-based GPU parallel scheme, which exhibits outstanding acceleration performance in specific compute kernels while significantly reducing graphics memory usage. Compared to the classical parallel scheme,…
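
The abstract does not show how the cell-linked list is organized. As a rough illustration only, the Python sketch below bins 2D particles into square cells whose edge equals the PD horizon, so each particle's family (all particles within the horizon) can be found by scanning its own cell and the eight adjacent cells instead of every particle. The function names and the 2D, CPU-side setting are assumptions; the paper's GPU data layout and subdomain scheme are not reproduced.

```python
import math
from collections import defaultdict


def build_cell_list(points, cell_size):
    """Bin 2D points into square cells of edge cell_size (hypothetical helper)."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    return cells


def neighbor_families(points, horizon):
    """Return, for each point, the sorted indices of other points within the horizon.

    With cell_size == horizon, any neighbor lies in the point's own cell or one
    of the 8 adjacent cells, so only those 9 cells are scanned.
    """
    cells = build_cell_list(points, horizon)
    families = []
    for i, (x, y) in enumerate(points):
        cx, cy = math.floor(x / horizon), math.floor(y / horizon)
        family = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get does not insert empty cells into the defaultdict
                for j in cells.get((cx + dx, cy + dy), ()):
                    if j != i and math.dist(points[i], points[j]) <= horizon:
                        family.append(j)
        families.append(sorted(family))
    return families
```

On a uniform grid with spacing equal to the horizon, each interior point's family is just its four axis neighbors; the cost of the search grows with the number of particles per cell rather than with the total particle count.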

Open Access

ARTICLE

A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing

Zhaohui Xia^1,3, Baichuan Gao³, Chen Yu^2,*, Haotian Han³, Haobo Zhang³, Shuting Wang³

CMES-Computer Modeling in Engineering & Sciences, Vol.138, No.2, pp. 1103-1137, 2024, DOI:10.32604/cmes.2023.029177 - 17 November 2023

Abstract This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy…
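
The abstract does not give the balancing rule. One simple static policy, assumed here for illustration and not necessarily the authors' strategy, assigns work to each device in proportion to its measured throughput so that both finish at roughly the same time (n_cpu / r_cpu ≈ n_gpu / r_gpu). The function name and the throughput inputs are hypothetical.

```python
def split_workload(n_items, cpu_rate, gpu_rate):
    """Statically split n_items between CPU and GPU (hypothetical helper).

    cpu_rate and gpu_rate are measured throughputs in items per second.
    Assigning work in proportion to throughput makes n / rate, the
    estimated finishing time, about equal on both devices.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("throughputs must be positive")
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    cpu_items = n_items - gpu_items  # remainder goes to the CPU
    return cpu_items, gpu_items
```

For example, if the GPU processes elements three times faster than the CPU, a batch of 100 elements splits as 25 on the CPU and 75 on the GPU. A static split like this ignores transfer costs and run-time variation, which a real balancing strategy may need to account for.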


Open Access

ARTICLE

Accelerating Falcon Post-Quantum Digital Signature Algorithm on Graphic Processing Units

Seog Chung Seo¹, Sang Woo An², Dooho Choi^3,*

CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1963-1980, 2023, DOI:10.32604/cmc.2023.033910 - 06 February 2023

Abstract Since 2016, the National Institute of Standards and Technology (NIST) has been running a competition to standardize post-quantum cryptography (PQC). Although Falcon has been selected in the competition as one of the standard PQC algorithms because of its short key and signature sizes, its performance overhead is larger than that of other lattice-based cryptosystems. This study presents multiple methodologies to accelerate Falcon using graphics processing units (GPUs) for server-side use. Direct GPU porting significantly degrades performance because the Falcon reference code relies on recursive functions in its sampling process. Thus, an…

Open Access

ARTICLE

Implementing Delay Multiply and Sum Beamformer on a Hybrid CPU-GPU Platform for Medical Ultrasound Imaging Using OpenMP and CUDA

Ke Song^1,*, Paul Liu², Dongquan Liu³

CMES-Computer Modeling in Engineering & Sciences, Vol.128, No.3, pp. 1133-1150, 2021, DOI:10.32604/cmes.2021.016008 - 11 August 2021

Abstract A novel beamforming algorithm named Delay Multiply and Sum (DMAS), which excels at enhancing the resolution and contrast of ultrasonic images, has recently been proposed. However, this algorithm contains nested loops, so its computational complexity is higher than that of the Delay and Sum (DAS) beamformer widely used in industry. We therefore propose a simple vector-based method to lower its complexity. The key point is to transform the nested loops into several vector operations, which can be efficiently implemented on many parallel platforms, such as Graphics Processing Units (GPUs) and multi-core Central…
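
DMAS is commonly written as a sum over all pairs of delayed channel signals, y = Σ_{i<j} sign(s_i s_j) √|s_i s_j|, which costs O(N²) per pixel. The sketch below shows one algebraic way to remove the nested loop, offered as an assumption rather than the authors' exact vector formulation: with ŝ_i = sign(s_i) √|s_i|, each pair term equals ŝ_i ŝ_j, so the pair sum is ((Σ ŝ_i)² − Σ ŝ_i²)/2. Only the summation step for one pixel is shown; delay calculation and any post-filtering are omitted, and the function names are hypothetical.

```python
import math


def dmas_nested(s):
    """Reference DMAS sum over pairs i < j of sign(s_i s_j) * sqrt(|s_i s_j|)."""
    total = 0.0
    n = len(s)
    for i in range(n - 1):
        for j in range(i + 1, n):
            p = s[i] * s[j]
            total += math.copysign(math.sqrt(abs(p)), p)
    return total


def dmas_linear(s):
    """The same pair sum in O(N) using s_hat = sign(s) * sqrt(|s|).

    sign(s_i s_j) * sqrt(|s_i s_j|) == s_hat_i * s_hat_j, and
    sum_{i<j} a_i a_j == ((sum a)^2 - sum a^2) / 2.
    """
    s_hat = [math.copysign(math.sqrt(abs(x)), x) for x in s]
    total = sum(s_hat)
    return 0.5 * (total * total - sum(v * v for v in s_hat))
```

Both functions agree up to rounding; the linear form consists only of elementwise operations and two reductions, which is the kind of vector workload that maps well to GPUs and SIMD CPUs.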

Open Access

ARTICLE

Efficient Concurrent L1-Minimization Solvers on GPUs

Xinyue Chu¹, Jiaquan Gao^1,*, Bo Sheng²

Computer Systems Science and Engineering, Vol.38, No.3, pp. 305-320, 2021, DOI:10.32604/csse.2021.017144 - 19 May 2021

Abstract Because concurrent L1-minimization (L1-min) problems arise in a number of real applications, this paper investigates how to solve them in parallel on GPUs. First, we propose a novel self-adaptive warp implementation of the matrix-vector multiplication (Ax) and a novel self-adaptive thread implementation of the transposed matrix-vector multiplication (A^Tx) on the GPU. Vector-operation and inner-product decision trees are adopted to choose the optimal vector-operation and inner-product kernels for vectors of any size. Second, based on these kernels, the iterative shrinkage-thresholding algorithm is used to present two concurrent L1-min solvers from…
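
For reference, a minimal pure-Python sketch of the iterative shrinkage-thresholding algorithm (ISTA) named in the abstract is given below. It assumes the unconstrained formulation min ½‖Ax − b‖² + λ‖x‖₁, which may differ from the exact L1-min problem the authors solve; each iteration needs one Ax and one A^Tx product, the two kernels the paper optimizes on the GPU. The helper names are hypothetical, and the loops here run serially rather than as warp- or thread-level GPU kernels.

```python
import math


def matvec(A, x):
    """Compute Ax for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matvec_t(A, y):
    """Compute A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def soft_threshold(v, tau):
    """Elementwise shrinkage: sign(v_i) * max(|v_i| - tau, 0)."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def ista(A, b, lam, step, iters=500):
    """ISTA for min 0.5*||Ax - b||^2 + lam*||x||_1 (assumed formulation).

    Convergence requires step <= 1 / ||A||_2^2.
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # residual Ax - b
        g = matvec_t(A, r)                                  # gradient A^T(Ax - b)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], lam * step)
    return x
```

With A equal to the identity and step = 1, the solution is simply soft_threshold(b, λ), which makes a convenient sanity check.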

Open Access

ARTICLE

Fast and High-Resolution Optical Inspection System for In-Line Detection and Labeling of Surface Defects

M. Chang^1,2,3, Y. C. Chou^1,2, P. T. Lin^1,2, J. L. Gabayno^2,4

CMC-Computers, Materials & Continua, Vol.42, No.2, pp. 125-140, 2014, DOI:10.3970/cmc.2014.042.125

Abstract Automated optical inspection systems installed in production lines help ensure high throughput by speeding up the inspection of defects that are otherwise difficult to detect with the naked eye. However, depending on the size and surface properties of the products, such as micro-cracks on the glass covers of touchscreen panels, detection speed and accuracy are limited by the imaging module and lighting technique. Therefore, current inspection is still delegated to a few qualified personnel whose limited capacity has been a major constraint on high-volume production. In this study, an automated optical technology for in-line…

Open Access

ARTICLE

Local strong form meshless method on multiple Graphics Processing Units

G. Kosec^1,2, P. Zinterhof³

CMES-Computer Modeling in Engineering & Sciences, Vol.91, No.5, pp. 377-396, 2013, DOI:10.3970/cmes.2013.091.377

Abstract This paper deals with the implementation of the local meshless numerical method (LMM) on general-purpose graphics processing units (GPUs) for solving partial differential equations (PDEs). The local meshless solution procedure is formulated in a way suitable for parallel execution and has been implemented on multiple GPUs. The implementation is tested on the solution of a diffusion equation in a 2D domain. Different setups of the meshless approach regarding the selection of basis functions are tested with up to 2.5 million computational points. It is shown that monomials are a good selection of…
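
The abstract gives no formulas. As an illustrative sketch only, the Python code below shows the core step of a generic local strong-form (collocation) meshless scheme with a monomial basis: for one node and its small support domain, it computes weights w_k such that ∇²u at the node ≈ Σ w_k u_k, by requiring the rule to be exact for each basis monomial. The basis {1, x, y, x², y²}, the five-node support, and all names are assumptions, not the formulation of Kosec and Zinterhof. Each node's weights depend only on its own support, so the per-node solves are independent of one another.

```python
def solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting for small systems."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def laplacian_weights(center, support):
    """Weights w_k with Lap u(center) ~ sum_k w_k * u(support[k]) (hypothetical helper).

    Basis: monomials 1, x, y, x^2, y^2 in coordinates relative to center.
    Exactness for each basis function p_m gives
        sum_k w_k * p_m(support[k]) = Lap p_m(center) = (0, 0, 0, 2, 2)_m.
    """
    cx, cy = center
    basis = [
        lambda x, y: 1.0,
        lambda x, y: x,
        lambda x, y: y,
        lambda x, y: x * x,
        lambda x, y: y * y,
    ]
    lap_at_center = [0.0, 0.0, 0.0, 2.0, 2.0]
    rel = [(px - cx, py - cy) for px, py in support]
    M = [[p(x, y) for (x, y) in rel] for p in basis]  # row m: p_m at every node
    return solve(M, lap_at_center)
```

On a cross-shaped support with spacing h, the weights reduce to the familiar five-point finite-difference Laplacian: −4/h² at the center and 1/h² at each neighbor. Richer or scattered supports would need more nodes than basis functions and a least-squares fit instead of a square solve.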

Open Access

ARTICLE

Particle-based Fluid Flow Simulations on GPGPU Using CUDA

Kazuhiko Kakuda¹, Tsuyoki Nagashima¹, Yuki Hayashi¹, Shunsuke Obara¹, Jun Toyotani¹, Nobuya Katsurada², Shunji Higuchi², Shohei Matsuda²

CMES-Computer Modeling in Engineering & Sciences, Vol.88, No.1, pp. 17-28, 2012, DOI:10.3970/cmes.2012.088.017

Abstract An acceleration of particle-based incompressible fluid flow simulations on the GPU using CUDA is presented. The particle method is based on the MPS (Moving Particle Semi-implicit) scheme, using a logarithmic-type weighting function to stabilize spurious oscillatory solutions of the pressure field, which is governed by the Poisson equation. The standard MPS scheme is widely used as a particle strategy for free-surface flows, moving-boundary problems, multi-physics/multi-scale problems, and so forth. Numerical results for a dam-breaking flow problem demonstrate the workability and validity of the present approach.
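
In MPS, particle interactions are weighted by a kernel w(r) that vanishes beyond an effective radius r_e, and the particle number density n_i = Σ_{j≠i} w(|r_j − r_i|) enters the pressure Poisson equation. The abstract does not give the logarithmic-type kernel; the sketch below assumes w(r) = ln(r_e/r) for 0 < r < r_e and shows the commonly used MPS kernel w(r) = r_e/r − 1 for comparison. Both forms and the function names are assumptions for illustration.

```python
import math


def weight_log(r, re):
    """Assumed logarithmic-type kernel: ln(re / r) for 0 < r < re, else 0."""
    if 0.0 < r < re:
        return math.log(re / r)
    return 0.0


def weight_standard(r, re):
    """Commonly used MPS kernel: re / r - 1 for 0 < r < re, else 0."""
    if 0.0 < r < re:
        return re / r - 1.0
    return 0.0


def number_density(positions, i, re, weight=weight_log):
    """Particle number density n_i = sum over j != i of w(|r_j - r_i|)."""
    return sum(
        weight(math.dist(positions[i], p), re)
        for j, p in enumerate(positions)
        if j != i
    )
```

Both kernels are zero at r = r_e and blow up as r → 0, but ln(r_e/r) grows much more slowly than r_e/r − 1, so close particle pairs contribute less extreme weights. On a GPU, the sum for each particle i is independent and is typically assigned to one thread.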

Open Access

ARTICLE

Optimizations for Elastodynamic Simulation Analysis with FMM-DRBEM and CUDA

Yixiong Wei¹, Qifu Wang^1,2, Yingjun Wang¹, Yunbao Huang¹

CMES-Computer Modeling in Engineering & Sciences, Vol.86, No.3, pp. 241-274, 2012, DOI:10.3970/cmes.2012.086.241

Abstract In this study, we propose a novel method to accelerate elastodynamic analysis of 3D problems with the BEM (boundary element method). The DRBEM (dual reciprocity boundary element method) is applied to form new integral equations that reduce complexity; the modified FMM (fast multipole method) is introduced to simplify the computation process and save storage space by avoiding intermediate coefficient matrices. At the same time, FMM-DRBEM is reprogrammed in parallel on the GPU with CUDA (Compute Unified Device Architecture) to further improve efficiency. The main features in this paper are: (1) with respect to defects of the classical method for…

Open Access

ABSTRACT

CUDA Techniques in Computational Mechanics

Peng Wang

The International Conference on Computational & Experimental Engineering and Sciences, Vol.20, No.4, pp. 117-118, 2011, DOI:10.3970/icces.2011.020.117

Abstract Current trends in high performance computing (HPC) are moving towards placing several cores on the same chip in contemporary processors in order to achieve speed-up by extracting potential fine-grain parallelism from applications. The trend is led by GPUs, which have been developed exclusively for computational tasks as massively parallel co-processors to the CPU. During 2010, an extensive set of new HPC architectural features was introduced in the third generation of NVIDIA GPUs (Fermi), giving computational mechanics an opportunity to expand its use of GPU modelling and simulation.

This presentation will examine examples relevant…
