|Computer Modeling in Engineering & Sciences|
A Parallel Computing Schema Based on IGA
1Key Laboratory of Metallurgical Equipment and Control Technology Ministry of Education, Wuhan University of Science and Technology, Wuhan, 430081, China
2Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan, 430081, China
∗Corresponding Author: Bingquan Zuo. Email: firstname.lastname@example.org
Received: 03 December 2021; Accepted: 27 January 2022
Abstract: In this paper, a new computation scheme based on parallelization is proposed for Isogeometric analysis. The parallel computing is introduced to the whole progress of Isogeometric analysis. Firstly, with the help of the “tensor-product” and “iso-parametric” feature, all the Gaussian integral points in particular element can be mapped to a global matrix using a transformation matrix that varies from element. Then the derivatives of Gauss integral points are computed in parallel, the results of which can be stored in a global matrix. And a middle layer is constructed to assemble the final stiffness matrices in parallel. The numerical example results show that: the method presented in the paper can reduce calculation time and improve the use rate of computing resources.
Keywords: Isogeometric analysis; parallel; Gaussian integration; middle layer
FEM usually require a fine, often multidimensional, discretization, resulting into several doffs and associated unknowns. The research on FEM acceleration has been developed in many aspects. One important way is the fast mesh generation method [1–3], because the meshing process takes up 75% of the time for FEM [4,5]. In order to keep the discrete accuracy using fewer elements, the adaptive mesh generation methods [6,7] are constructed. Besides, researchers also tried to increase the speed of discretization  integration [9,10], stiffness matrix assembling  and solving [12,13]. And most of these methods used parallel technology, especially the GPU [14,15]. In particular, A parallel FEM  framework based on the matrix-free technique has drawn much more attention than before, compared with the element-by-element (EBE)  and node-by-node (NBN) [18,19].
Isogeometric analysis(IGA) is a new numerical method for solving partial differential equations(PDE) , which was proposed by Hughes et al. in 2005 . Different from finite element method (FEM) , in IGA, models from CAD do not need to be transformed into meshes by approximating to the geometric model which can be a time-consuming progress. But more computation time is still required for complex models because the size of the final matrix from both IGA and FEM grows. Obviously, an efficient computing framework is important for them. On behalf of its meshfree characteristic, IGA saved much time on mesh generation, but some efforts still have been done to accelerate the computing process. Researchers firstly take account of different splines to optimize the mesh, such as B-spline, NURBS , T-splines [22,23] and so on. And based these splines, local-refine supported splines (e.g., HB/THB [24–26], LR-Spline [27–29], AST [30,31], PHT [29,32]...) are used to control the non-essential subdivision areas. As done in FEM, the integration is also considered in [33–35]. It should be noted that: the Gaussian integral points are quite important for integration, so reducing the points number  and optimizing the points value  are two main approach in integration for IGA. And a new way of integrating in  In any event, assembly remains a critical part of acceleration, so GPU was used for matrix assembling in IGA .
As shown above, the parallel computation shows its huge potential in accelerating for both FEM and IGA. Especially, as a newcomer in numerical methods, there are fewer research about parallel computation on IGA than others. Related research is mainly focused on the theory and application of IGA [20–22], and the advantages and compare with FEM. It is only in recent years that some work has been done to improve computing efficiency, and it is still in its infancy. Reference  proposed a new level set-based topology optimization (TO) method using a parallel strategy of Graphics Processing Units (GPUs) and IGA. Reference  proposed and investigate new robust preconditioners for space-time IGA of parabolic evolution problems. An innovative family of solution schemes based on domain decomposition methods (DDM) is proposed, that significantly reduces the computational cost in . Reference  proposed a solution strategy for rIGA for parallel distributed memory machines and compare the computational costs of solving rIGA vs IGA discretizations. Since most parallel strategies are only for one or two steps in IGA, it is particularly important to propose a complete parallel framework. In this paper, the process of IGA is reconstructed by combining the parallel algorithm of finite element, so that the whole process of IGA can be processed in parallel to save the time of calculation. The presented paper structure is as follows:
The first chapter mainly introduce related research background, such as IGA, calculate acceleration.
The second chapter introduces the basic concept of IGA based on NURBS.
The third chapter is the content of the Gaussian integral point mapping, parallel derivation and stiffness matrix of parallel assembly based on the middle layer.
The fourth chapter has carried on the contrast experiment in parallel acceleration and traditional geometric analysis such as time-consuming.
The fifth chapter is the summary.
2 NURBS and Isogeometric Analysis
This section mainly introduces some concepts of Non-Uniform Rational B-Splines (NURBS) and IGA with Poison equation as an example.
2.1 NURBS Theory
B-spline is defined as non-decreasing sequence where is called the base point . The one-dimensional B-spline basis of order 0 is composed of
The point sequence may contain repeated points. Defines a specific point sequence with repeated points in the point sequence. The ith B-spline basis with order p > 0 is definedas
The 2-d B spline basis is defined as the tensor product of two 1-d B spline basis
where p and q are the orders on u and v directions. The NURBS basis is defined as
where Wij is the weight of Rij.
2.2 Isogeometric Analysis
Both FEM and IGA have the same theoretical basis, which is the weak form of partial differential equations. As a typical elliptic partial differential equation, Poisson equation is commonly used in gravitational field, electromagnetic field, and thermodynamic field.
where Ω is a two-dimensional field bounded by ∂Ω, f(X) ∈ L2(Ω) : Ω→ R is the source function, U(X) is the unknown function. Transform Eq. (6) into a weak form
In which is zero boundary partial Ω experiment function. In IGA geometry, U(X) is estimated by . is a function of the constituent space of a NURBS basis. Define the mapping F from Ω to parameter domain Ω 0
where Rij is the tensor product of NURBS bases, and is the corresponding control point, , since F is a mapping matrix, F inverse exists, defined as
Now transform Eq. (7) to the parameter field
is the Jacobian matrix and is defined as
With instead of U(X), then test letter substitute in type (10), and then get linear equations
And .Once the linear equation has been solved, substitute Qinto Eq. (9) and the numerical solution is obtained.
The elements of K and b are calculated by Eqs. (8) and (9), respectively. Transform Eqs. (8) and (9) into the parameter field, and get
where is a Nurbs parameter space,
3 Process Refactoring Based on Parallel
The traditional solution process of IGA divides the calculation of Gaussian integral points into elements (see in Fig. 1), which is not suitable for parallel calculation. There are multi-layer cycles in the calculation of elements, which has a great impact on the calculation efficiency. In this chapter, the traditional IGA process is broken, the stiffness matrix is assembled into non-zero elements concentrated in the form of diagonal to accelerate the IGA solution process.
Not all NURBS models are highly continuous, in order not to affect the accuracy of the experiment and increase the universality of the experiment, this paper does not adopt any rule for the time being. The chart of the algorithm in this paper is shown in Fig. 2.
The segmentation proposed in this paper is based on the node vector features and kernel number of each direction of the model. In order to facilitate the subsequent calculation, the amount of computation for each kernel should be roughly the same. The first thing to consider is whether the number of node vector segments (npartu, npartv, and npartw) in each direction can be divisible by the kernel number mx. If it can be divisible, the number of segment segments in each direction is the kernel number mx. If it cannot be divisible, the number of segments is considered to have a common factor with the kernel number, and the kernel number is factorized. The number of segments in each direction is the common factor obtained by factorization. If it is not divisible and has no common factors, the element traversal is carried out in the directions of u, v and w, and each unit traversal is assigned to a core, so as to ensure that the calculation amount of each core is consistent. Each processor’s assigned cell is stored in a cell called gelement.
3.1 Parallel Calculation of the Derivative of Gaussian Integral Points
At first, the number of Gaussian integral points needed to be calculated for each kernel should be allocated. The Gaussian integral points should be divided into mx (Number of cores) regions of roughly the same size based on the number of cores in the computer.
According to the properties of NURBS, it is more efficient to calculate the values and derivatives of all non-zero basis functions at a point. It can be found that the calculation process at different points only depends on the interval value of NURBS. Because the interval values of NURBS are fixed, the calculations at different points are independent. Moreover, the number of Gaussian points in each non-zero computation domain is the same, and the computation amount of each Gaussian point is the same. The traditional IGA calculation of Gaussian integral points is carried out in elements, which cannot be carried out in parallel due to the inclusion of many other parameters in element calculation. Therefore, we separate the solution of Gaussian integral points from the calculation steps of element stiffness matrix, so that the calculation can be carried out in parallel.
The set of Gaussian points obtained from the above is stored in an element called rule, when calculating the Gaussian integral derivative points need to be in the element coordinates and weight projection to the corresponding physical element domain, then be able to establish a matrix Gaussnquad×3 to store the data and the coordinates of the Gaussian integral points as the result of the Gaussian integral points. Gaussnquad×3 includes two directions of information u, v and the weight W,
The Gaussian integral point can be mapped to an element in the form of matrix. GP can be obtained by multiplying the value of [–1, 1] by the element mapping matrix R, and the weight can be updated and stored in GP. For the below element, its mapping region in the u direction uleni = ui+1-ui, v direction vlenj = vj+1-vj.
When the Gaussian integral point is mapped to the whole world, the principle of graphics can be adopted to stretch and translate the low dimension by using the high-dimensional matrix, so the mapping matrix of the original Gaussian integral point is
The same is true for 3D models,
The information of Gaussian integral points GPn*nquad×4 can be obtained by mapping Gaussian integral points to elements through mapping matrix.
After the corresponding data is stored, Gaussian integral point derivative calculation is carried out. At this time, different processors need to be allocated to calculate the values of Gaussian integral points of different blocks. Assuming that the computer has mx processors, the number of threads is m, and each processor computs a quarter of the total number of nodes, nn/mx. The scale of the derivative of each point in each direction is the product of the order in each direction,
Algorithm 2 is the flow that each processor executes.
This paper combines the derivative of Gaussian integral points with parallel processing, and uses multi-threads to carry out parallel processing by using the characteristics of multi-core computer. The derivative of Gaussian integral points obtained by each thread is a vector product, the integral is stored as follows:
3.2 Parallel of Stiffness Matrix Based on Intermediate Layer
Because model by blocking rules is divided into nuclear number block at first, each processor has stored the relevant elements of the information. Each element can be found in the location of the global and numbered, so local stiffness matrix can be assembled by the loacl index. The global stiffness matrix can also be assembled by the loacl stiffness matrix in this way.
As is shown in Fig. 6, in order to reduce the overlap degree of the boundary of each region after partitioning, we should first consider the order of each node vector. The smaller the order is, the worse the continuity is. In order to facilitate comparison, we can upgrade the three direction node vectors to the same order, and then calculate the number of specific gravity nodes. Then according to the number of segments of the node vector with multiple nodes, the number of segments in each direction is determined. To block the node vector is to find the point where the maximum degree of overlap is located on the node vector. That is to say, the nodes with high degree of overlap can be marked while the order is raised, and then regions can be partitioned according to the marked positions, so as to reduce the border overlap degree of each sub-region after partitioning to the greatest extent.
The larger the sub-region is, the closer the individual region is to the whole model; the smaller the sub-region is, the closer the individual region is to a single element. In this paper, the middle layer is adopted for block processing. Simple adjustment in the subsequent application can change the block strategy to make the block region larger or smaller, see in Fig. 7.
Since the elements are numbered at the very beginning, a region index can be obtained according to the initial number when calculating regional stiffness, and regional stiffness can be assembled according to the region index, and then the obtained regional stiffness matrix is assembled into the overall stiffness matrix.
When assembling each zone stiffness matrix, the Jacobian matrix is obtained by taking derivatives which greatly increased the workload, making computing time extended.
It can be known from Eq. (7) that the NURBS basis function is obtained by multiplying the derivative of NURBS parameter points with the coordinates of control points
and the Jacobian is
Substituting the Jacobian matrix into Eqs. (2)–(8) through the above equation, the Jacobian matrix can be written in the form of derivative and coordinate multiplication. In this way, the derivation process of the Jacobian matrix is avoided, and the calculation consumption in each traversal element is reduced, so that the calculation time is greatly reduced.
Algorithm 4 is to seek the overall stiffness matrix program.
Experiments show that the calculated stiffness matrix may exist bandwidth is larger, the overall stiffness matrix is discrete, calculating is not convenient to follow-up the right item, it will need to get the stiffness matrix of the matrix transformation, the elements, to funnel non-zero element stiffness matrix of the diagonal, enables the calculation to saves time.
Regional stiffness matrix assembly was conducted by regional index, by the middle tier block fixing the control points of each element, determine its position in the regional block, then the element stiffness matrix according to fill in the corresponding position stiffness matrix, finally will regional stiffness matrix through the global index assembly into the whole stiffness matrix.The assembly process is as follows:
4 Analysis of Experimental Results
In this chapter, the algorithm is programed in Matlab2018b, and four different tests are carried out on one PC with i7-6700HQ (4 cores) and 16 GB memory.
4.1 2D Example
It is assumed that the inner wall of the thick-walled cylinder (inner diameter Ri, outer diameter Ro) is subjected to uniform radial pressure P, and the upper and lower bottom surfaces are axially fixed. Due to its symmetry, this paper simplifies the model to 1/4.
In order to compare the results under different degrees of freedom, P subdivision and H subdivision are used to process the original model. In this chapter, only the derivation of the parallel Gaussian integral point and only parallel stiffness matrix assembly are compared with the parallelism of the whole process to show the effect brought by the parallelism of each process. The degree of freedom is controlled by inserting different number of nodes into the original node vector, 21 nodes are inserted initially.
The derivative time of the parallel Gaussian integral point is shown as Table 1.
Time of parallel stiffness matrix assembly under H subdivision is shown as Table 2.
Time of the whole process in parallel under H subdivision is shown as Table 3.
As can be seen from the above chart, the advantages of the parallel algorithm are not obvious when the degree of freedom is low, because it takes time to start the parallel algorithm. With the increase of the degree of freedom, the advantages of the parallel algorithm gradually manifest, and the acceleration ratio tends to the number of processors.
Comparison of the two algorithms under different orders is made by statistics, the number of nodes inserted into the initial node vector is 41, and the order is 3, 2 as shown in Table 4.
Time of parallel stiffness matrix assembly under P subdivision is shown as Table 5.
Time of the whole process in parallel with different degree under P subdivision is shown as Table 6.
It can be clearly seen from the figure and table that as the order increases, the degree of freedom increases and the acceleration ratio approaches the kernel number.
After increasing the degree of freedom of the model, the two algorithms were compared, and it was found that whether the number of control points was increased by H subdivision or the order was increased by P subdivision, the parallel algorithm is always less time consuming than the traditional algorithm, the two algorithms are compared. It is found that the parallel algorithm takes less time than the traditional algorithm no matter the number of control points is increased by H subdivision or the order is increased by P subdivision. The calculation efficiency of stiffness matrix can be greatly improved by using multi-thread calculation.
4.2 3D Example
Another example in this paper is the model of the shape of a horseshoe. A load is applied on one side to observe its stress and another side is fixed.
The degree of freedom of the model is enhanced by H subdivision. In this example, the dichotomy method is adopted to insert nodes, shown as Table 7.
Time of parallel stiffness matrix assembly under H subdivision is shown as Table 8.
Time of the whole process in parallel under H subdivision is shown as Table 9.
The degree of freedom of the model is enhanced by P subdivision, ξ = [0.125:0.125:0.875], shown as Table 10.
Time of parallel stiffness matrix assembly under P subdivision is shown as Table 11.
Time of the whole process in parallel with different degree under P subdivision is shown as Table 12.
It can be seen from the above results that with the increase of degrees of freedom, the advantages of the parallel algorithm become more obvious, and when the degrees of freedom are similar, the advantages of the parallel algorithm become greater at high orders. It is found that the stiffness matrix takes up a large proportion of the calculation time in IGA.
In this paper, the solution of IGA is accelerated from three aspects. In the first step, the traditional isogeometric solution process is broken, and the Gaussian integral points are mapped to the global model through matrix by using the characteristics of model parameter elements. In the second step, the parallel algorithm PARFOR is used to calculate the derivatives of Gaussian integral points when calculating the derivative of Gaussian integral point, and the derivatives obtained by each kernel are stored in a single matrix. Thirdly, in the assembly process of the matrix, the model is divided into blocks according to the number of the cores and node vectors in each direction, and then different cores are assigned to each region for parallel calculation. Finally, the regional stiffness matrix is assembled by regional index, and the regional stiffness matrix is assembled by global index.
For the future work, it is necessary to focus on the reduction of Gaussian integral points combined with the existing parallel algorithm to further reduce the solution scale in the process of IGA calculation. In the solving process, the stiffness matrix can be divided into blocks and solved in parallel to further improve the solving efficiency.
Funding Statement: This research was supported by the Natural Science Foundation of Hubei Province (CN) (Grant No. 2019CFB693), the Research Foundation of the Education Department of Hubei Province (CN) (Grant No. B2019003) and the open Foundation of the Key Laboratory of Metallurgical Equipment and Control of Education Ministry (CN) (Grant No. 2015B14). The support is gratefully acknowledged.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|