LaNets: Hybrid Lagrange Neural Networks for Solving Partial Differential Equations
1 School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
2 School of Science, Shanghai University, Shanghai, 200444, China
* Corresponding Author: Shihui Ying. Email:
(This article belongs to this Special Issue: Numerical Methods in Engineering Analysis, Data Analysis and Artificial Intelligence)
Computer Modeling in Engineering & Sciences 2023, 134(1), 657-672. https://doi.org/10.32604/cmes.2022.021277
Received 06 January 2022; Accepted 24 February 2022; Issue published 24 August 2022
Abstract: We propose new hybrid Lagrange neural networks called LaNets to predict the numerical solutions of partial differential equations. That is, we embed Lagrange interpolation and small sample learning into deep neural network frameworks. Concretely, we first perform Lagrange interpolation in front of the deep feedforward neural network. The Lagrange basis function has a neat structure and strong expressive ability, which makes it suitable as a preprocessing tool for pre-fitting and feature extraction. Second, we introduce small sample learning into training, which helps guide the model to be corrected quickly. Taking advantage of the theoretical support of traditional numerical methods and the efficient allocation of modern machine learning, LaNets achieve higher predictive accuracy than state-of-the-art work. The stability and accuracy of the proposed algorithm are demonstrated through a series of classical numerical examples, including the one-dimensional Burgers equation, one-dimensional carburizing diffusion equations, the two-dimensional Helmholtz equation and the two-dimensional Burgers equation. Experimental results validate the robustness, effectiveness and flexibility of the proposed algorithm.
1 Introduction
In this paper, we consider partial differential equations (PDEs) of the general form in Eqs. (1.1)–(1.3):
where the operator in Eq. (1.1) is a differential operator, the spatial domain is a subset of Euclidean space, and the boundary condition is imposed on its boundary. u denotes the unknown function to be solved for, and the given functions in Eqs. (1.2) and (1.3) represent the initial and boundary conditions, respectively. Different governing equations and the corresponding initial/boundary conditions can describe many physical phenomena in nature, but it is difficult in practice to find analytical solutions. Therefore, more and more scholars have tried a variety of numerical methods to solve PDEs in recent years.
At first, traditional numerical methods, including the finite element method [1], the finite volume method [2] and the finite difference method [3], were usually used to solve partial differential equations. Afterwards, with the rapid development of machine learning [4–6], the universal approximation ability of neural networks came to be regarded as helpful for obtaining approximate solutions of differential equations. Han et al. [7–9] proposed deep learning-based numerical approaches to solve variational problems, backward stochastic differential equations and high-dimensional equations. Then Chen et al. [10] extended their work to solve the Navier-Stokes and Cahn-Hilliard equations. Sirignano et al. [11] combined the Galerkin method and deep learning to solve high-dimensional free boundary PDEs. Raissi et al. [12] proposed the physics-informed neural networks (PINNs) framework, which acts as the benchmark algorithm in this field. In PINNs, physical constraints are added to restrict the space of solutions and thereby improve the accuracy. Furthermore, many scholars have carried out research based on this method. Dwivedi et al. [13] combined PINNs with the extreme learning machine to solve time-dependent linear partial differential equations. Pang et al. [14] solved space-time fractional advection-diffusion equations by extending PINNs to fractional PINNs. Raissi et al. [15] subsequently developed a physics-informed deep learning framework that is able to encode the Navier-Stokes equations into the neural networks. Kharazmi et al. [16] constructed a variational physics-informed neural network to effectively reduce the cost of network training. Yang et al. [17] proposed Bayesian physics-informed neural networks for solving forward and inverse nonlinear problems with PDEs and noisy data. Meng et al. [18] developed a parareal physics-informed neural network to significantly accelerate the long-time integration of partial differential equations. Gao et al. [19] proposed a new learning architecture, a physics-constrained convolutional neural network, to learn the solutions of parametric PDEs on irregular domains.
A neural network is a black-box model whose approximation ability depends partly on the depth and width of the network, so too many parameters cause a decrease in computational efficiency. One may use the Functional Link Artificial Neural Network (FLANN) [20] model to overcome this problem. In FLANN, the single hidden layer of the neural network is replaced by an expansion layer based on distinct polynomials. Mall et al. [21] used a Chebyshev neural network to solve elliptic partial differential equations by replacing the single hidden layer of the neural network with Chebyshev polynomials. Sun et al. [22] likewise replaced the hidden layer with Bernstein polynomials to obtain the numerical solution of PDEs. Because polynomials are used, the neural network has no actual hidden layers, and the number of parameters is greatly reduced.
On the other hand, deep learning is a type of learning that requires a lot of data. The performance of deep learning depends on large-scale, high-quality sample sets, but the cost of data acquisition is prohibitive. Moreover, sample labeling also consumes a lot of human and material resources. Therefore, a popular learning paradigm named Small Sample Learning (SSL) [23] has been used in some new fields. SSL refers to the ability to learn and generalize from a small number of samples. At present, SSL has been successfully applied in medical image analysis [24], long-tail distribution target detection [25], remote sensing scene classification [26], etc.
In this paper, we integrate Lagrange interpolation and small sample learning with deep neural network frameworks to deal with the problems in existing models. Specifically, we replace the first hidden layer of the deep neural network with a Lagrange block. Here, the Lagrange block is a preprocessing tool for preliminary fitting and feature extraction of the input data. The Lagrange basis function has a neat structure and strong expressive ability, so it is well suited to extracting detailed features of the input data for feature enhancement. The main idea of Lagrange interpolation is to estimate the function values at positions between the given nodes from the values at those nodes, which provides a pre-fitting step without adding any extra parameters. Then, the enhanced vector is fed to the subsequent hidden layers for the training of the network model. Furthermore, we add the residual of a handful of observations to the cost function to rectify the model and improve the predictive accuracy with less labeled data. This composite neural network structure is quite flexible, mainly in that it is easy to modify: the number of polynomials and hidden layers can be adjusted according to the complexity of different problems.
The structure of this paper is as follows. In Section 2, we present the introduction of Lagrange polynomials, the structure of the LaNets and the steps of algorithm. Numerical experiments for one-dimensional PDEs and two-dimensional PDEs are described in Section 3. Finally, conclusions are incorporated into Section 4.
2 LaNets: Theory, Architecture, Algorithm
In this section, we start with an illustration of Lagrange interpolation polynomials. After that, we discuss the framework of LaNets. Finally, we clarify the details of the proposed algorithm.
2.1 Lagrange Interpolation Polynomial
Lagrange interpolation is a polynomial interpolation method for numerical analysis proposed by Joseph-Louis Lagrange, a French mathematician of the 18th century [27]. Interpolation is an important method for the approximation of functions, which uses the values of a function at finitely many points to estimate its values at other points. That is, a continuous function is constructed from discrete data so that the continuous curve passes through all the given data points. Mathematically speaking, Lagrange interpolation yields a polynomial function that passes through several known points on a two-dimensional plane.
Assume that n + 1 distinct points in the complex plane are given, together with the corresponding values at these points. The interpolating polynomial of degree not exceeding n that takes these values is unique. Indeed, the difference of two such polynomials vanishes at n + 1 points while its degree is at most n, so it must be identically zero. The following polynomial clearly possesses all the necessary properties in Eqs. (2)–(3):
Here, this polynomial is called the Lagrange interpolation polynomial, and the distinct points are called the interpolation points. The Lagrange polynomial is thus fully determined by the n + 1 given point-value pairs. Once obtained from only these points, the Lagrange interpolation polynomial can stand in for the function to estimate its value at any other point. The correctness of Lagrange polynomials has been established in the literature [27,28].
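As an illustration of the interpolation idea described in this subsection, the following minimal Python sketch evaluates a Lagrange interpolation polynomial using the standard product form of the basis polynomials. The function names `lagrange_basis` and `lagrange_interpolate`, as well as the sample nodes and the test function f(x) = x^2, are illustrative assumptions and are not taken from the original work.

```python
def lagrange_basis(nodes, i, x):
    """Value at x of the i-th Lagrange basis polynomial on the given nodes."""
    value = 1.0
    for j, node in enumerate(nodes):
        if j != i:
            value *= (x - node) / (nodes[i] - node)
    return value


def lagrange_interpolate(nodes, values, x):
    """Evaluate the polynomial through the pairs (nodes[i], values[i]) at x."""
    return sum(values[i] * lagrange_basis(nodes, i, x) for i in range(len(nodes)))


# Three distinct nodes determine a polynomial of degree at most 2.
nodes = [-1.0, 0.0, 1.0]
values = [node ** 2 for node in nodes]  # samples of the illustrative f(x) = x^2

# A quadratic is reproduced (up to rounding) at a point that is not a node.
estimate = lagrange_interpolate(nodes, values, 0.5)
```

Each basis polynomial equals one at its own node and zero at the other nodes, which is why the weighted sum passes through all given points.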
2.2 The Architecture of LaNets
Fig. 1 displays the structure of LaNets, which is composed of two main parts. One is a preprocessing part based on Lagrange polynomials; the other is the training of a deep feedforward neural network. Thus, the LaNets model we designed is a joint feedforward neural network composed of an input layer, a Lagrange block, hidden layers and an output layer. As described in Section 2.1, we can also write the Lagrange interpolation polynomial as in Eq. (4):
$$L_n(x) = \sum_{i=0}^{n} y_i\, l_i(x), \tag{4}$$
where $x_i$ and $y_i$ in the above formula correspond to the position of the independent variable and the value of the function at this position, respectively. Here, we call $l_i(x)$ the Lagrange interpolation basis function, and the expression of $l_i(x)$ is as follows:
$$l_i(x) = \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}. \tag{5}$$
As shown in Fig. 1, the original input vector is first extended to a new enhanced vector by the Lagrange block and then sent to the deep feedforward neural network for training. The black Lagrange block on the right shows the Lagrange interpolation basis functions visually. Both spatial and temporal variables can be handled with Lagrange basis functions. In fact, the proposed model not only increases the reliability and stability of the single-layer polynomial neural network, but also improves the predictive accuracy of the deep feedforward neural network without adding any extra parameters.
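The expansion performed by the Lagrange block can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each input coordinate is expanded independently on a fixed set of three nodes and that the resulting basis values are concatenated. The node placement on [-1, 1] and the name `lagrange_block` are hypothetical.

```python
def lagrange_basis(nodes, i, x):
    """Value at x of the i-th Lagrange basis polynomial on the given nodes."""
    value = 1.0
    for j, node in enumerate(nodes):
        if j != i:
            value *= (x - node) / (nodes[i] - node)
    return value


def lagrange_block(inputs, nodes):
    """Expand every input coordinate into its Lagrange basis values.

    Assumption: coordinates are expanded one by one and concatenated.
    The block has no trainable parameters.
    """
    enhanced = []
    for coordinate in inputs:
        for i in range(len(nodes)):
            enhanced.append(lagrange_basis(nodes, i, coordinate))
    return enhanced


nodes = [-1.0, 0.0, 1.0]  # hypothetical node placement (three basis functions)
point = [0.3, 0.8]        # an illustrative spatio-temporal input (x, t)
features = lagrange_block(point, nodes)  # enhanced vector of length 2 * 3
```

Because the nodes are fixed, the expansion only changes the representation of the input that the subsequent hidden layers see, which is consistent with the statement that no extra parameters are added.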
The problem we aim to solve is described by Eqs. (1.1)–(1.3). Following the original work of Raissi et al. [12], the residual of the governing equation can be defined as in Eq. (6):
We then approximate the solution u with a deep neural network whose parameter set is determined by training. The model is trained by minimizing the following compound loss function in Eq. (7):
Here, the first training set holds the initial and boundary values, the second holds the small sample data in the domain, and the third specifies the collocation points at which the governing equation is enforced. The first loss term enforces the initial and boundary conditions, the second term corresponds to the residuals of the small sample data on the domain, and the third term serves as a constraint on the governing equation itself. The optimization method used here is the L-BFGS-B algorithm [29], which converges faster in calculations and has lower memory overhead.
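The three-term structure of the loss can be illustrated with a short Python sketch. It assumes that every term is a mean squared error and that the terms are summed with equal weights; the exact form and any weighting in Eq. (7) may differ. All names and numbers below are illustrative.

```python
def mean_squared(errors):
    """Mean of the squared entries of a list of errors."""
    return sum(e * e for e in errors) / len(errors)


def compound_loss(u_pred_ib, u_true_ib, u_pred_s, u_true_s, residuals):
    """Sum of three mean squared error terms (equal weights are an assumption).

    u_pred_ib / u_true_ib: predictions and targets at initial/boundary points
    u_pred_s / u_true_s:   predictions and labels at the small sample points
    residuals:             governing-equation residuals at collocation points
    """
    loss_ib = mean_squared([p - t for p, t in zip(u_pred_ib, u_true_ib)])
    loss_s = mean_squared([p - t for p, t in zip(u_pred_s, u_true_s)])
    loss_f = mean_squared(residuals)
    return loss_ib + loss_s + loss_f


# Illustrative numbers only: one wrong boundary value, a perfect sample fit,
# and two small equation residuals.
total = compound_loss(
    u_pred_ib=[0.0, 1.0], u_true_ib=[0.0, 0.5],
    u_pred_s=[1.0], u_true_s=[1.0],
    residuals=[0.1, -0.1],
)
```

In practice, the predictions and residuals would be produced by the network and its derivatives, and the scalar returned here would be passed to the optimizer.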
An overview of this work is given in Algorithm 1. In the algorithm description, we consider the spatio-temporal variables x and t. The proposed method is also applicable to time-independent partial differential equations, and related examples are presented in the following experiments.
3 Numerical Experiments
In this section, we verify the performance and accuracy of LaNets numerically through experiments with benchmark equations. In Subsection 3.1, we provide three typical one-dimensional time-dependent PDEs to validate the robustness and validity of the proposed algorithm. In Subsection 3.2, two-dimensional PDEs are shown to illustrate the reliability and stability of the method.
3.1 Numerical Results for One-Dimensional Equations
In this subsection, we demonstrate the predictive accuracy of our method on three one-dimensional time-dependent PDEs including Burgers equation, carburizing constant diffusion coefficient equation and carburizing variable diffusion coefficient equation.
3.1.1 Burgers Equation
We start with the following one-dimensional time-dependent Burgers equation in Eqs. (8.1)–(8.3):
where is the viscosity parameter. In this case, we take .
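To make the role of the governing-equation constraint concrete, the sketch below evaluates the residual of the one-dimensional viscous Burgers equation, assuming its standard form u_t + u u_x - nu u_xx = 0. Physics-informed networks normally obtain these derivatives by automatic differentiation; central finite differences are swapped in here only to keep the example self-contained. The function names, the step size `h` and the value nu = 0.01 are illustrative assumptions and not the settings of the experiment.

```python
def burgers_residual(u, x, t, nu, h=1e-3):
    """Residual u_t + u * u_x - nu * u_xx of a callable u(x, t).

    Derivatives are approximated by central finite differences with step h.
    """
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t + u(x, t) * u_x - nu * u_xx


def exact_solution(x, t):
    """u = x / (1 + t) satisfies the Burgers equation for any viscosity."""
    return x / (1.0 + t)


def non_solution(x, t):
    """u = x * t does not satisfy the equation; its residual is x + x * t**2."""
    return x * t


small = burgers_residual(exact_solution, 0.4, 0.5, nu=0.01)
large = burgers_residual(non_solution, 0.4, 0.5, nu=0.01)
```

The residual is close to zero for the exact solution and clearly nonzero for the other function, which is the signal the third loss term penalizes at the collocation points.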
Here, the LaNets model consists of one Lagrange block and 7 hidden layers with 20 neurons in each layer. Unless otherwise specified, the Lagrange block contains three Lagrange basis functions. Fig. 2a illustrates the predicted numerical result of the Burgers equation, and the relative error measured at the end is . The loss curve vs. iteration is displayed in Fig. 2b. The mean square error loss decreases steadily, which illustrates the stability of the proposed method.
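A common choice for such a relative error in PINN-type studies is the relative L2 error between the predicted and exact solution values on a set of test points. Assuming that definition (an assumption, since it is not spelled out here), a minimal Python sketch is given below; the function name and the sample values are illustrative.

```python
import math


def relative_l2_error(predicted, exact):
    """Relative L2 error ||predicted - exact||_2 / ||exact||_2."""
    numerator = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, exact)))
    denominator = math.sqrt(sum(e ** 2 for e in exact))
    return numerator / denominator


# Illustrative values only: one of two predictions is off by 0.1.
error = relative_l2_error([1.1, 2.0], [1.0, 2.0])
```

The normalization by the norm of the exact solution makes errors comparable across problems whose solutions have different magnitudes.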
To further verify the effectiveness of the proposed algorithm, we compare the predicted solution with the analytical solution provided in the literature [30] at four time snapshots, which are presented in Fig. 3. The predicted solution is almost indistinguishable from the exact solution. Moreover, the sharp gap that forms in the solution is also well captured.
A more detailed numerical result is summarized in Table 1. Note that the early work [31] serves as the benchmark. To observe the influence of the number of small sample points on the algorithm, we add 50 small sample points each time and calculate the corresponding results. From Table 1, one can see that the error of the LaNets model is one order of magnitude lower than that of PINNs. In addition, the 50 sample points used here achieve a higher predictive accuracy than the 300 sample points used in the benchmark model. This means that more accurate predictions can be obtained with less labeled data, thereby saving considerable manpower and material resources and increasing computational efficiency.
3.1.2 Carburizing Diffusion Model
We consider the one-dimensional carburizing diffusion model [32] in Eqs. (9.1)–(9.4):
where represents the diffusion coefficient and u is the concentration of carbon. Here, l and r denote the left and right boundary of the model. Diffusion is a fundamental process of carburizing, and the diffusion coefficient is related to temperature, the content of alloy elements, systems, etc. Next we consider carburizing diffusion equation with constant and variable diffusion coefficient, respectively.
1. Constant diffusion coefficient
We start with the constant diffusion coefficient according to Eq. (10):
where , Q, R and are already given. In a practical sense, represents the pre-exponential factor, Q denotes the activation energy of carbon, R describes the gas constant and is the temperature during the carburizing process (K).
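The quantities listed above (a pre-exponential factor, an activation energy, the gas constant and an absolute temperature) are those of an Arrhenius-type law. Assuming that Eq. (10) takes the usual Arrhenius form, in which the diffusion coefficient equals the pre-exponential factor times exp(-Q / (R * temperature)), the dependence on temperature can be sketched as follows. The function name and all numerical values are hypothetical and are not the parameters of the experiment.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def diffusion_coefficient(d0, activation_energy, temperature_k,
                          gas_constant=GAS_CONSTANT):
    """Arrhenius-type diffusion coefficient (assumed form of the constant case).

    d0:                pre-exponential factor
    activation_energy: activation energy Q in J/mol
    temperature_k:     absolute temperature in K
    """
    return d0 * math.exp(-activation_energy / (gas_constant * temperature_k))


# Hypothetical values: a higher temperature gives a larger coefficient.
d_low = diffusion_coefficient(d0=1.0, activation_energy=1.4e5, temperature_k=1100.0)
d_high = diffusion_coefficient(d0=1.0, activation_energy=1.4e5, temperature_k=1200.0)
```

For a fixed temperature the coefficient is a constant, which is why this case is referred to as the constant diffusion coefficient.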
In this numerical experiment, we take , , and . The corresponding exact solution is written as Eq. (11):
where we have and . The terminal time T of this model is 36000, the left boundary l is 0, and the right boundary r is 2.5.
Regarding the training set, we take . Moreover, we employ an 8-layer LaNets to represent the solution in this simulation. The LaNets model contains one Lagrange block, six hidden layers with 20 hidden neurons per layer and one output layer. Here, the relative error is measured at .
To evaluate the performance of our algorithm from several angles, we compare its simulation results with those obtained by the PINNs model and our earlier (benchmark) model. The results for the three models are shown in Fig. 4. We can clearly see that the predicted solution of the PINNs model is not quite consistent with the exact solution, and the differences become more and more obvious over time. It is here that the LaNets model fits more accurately than the benchmark model. Thus, the proposed method has obvious advantages in the long-time simulation of time-dependent partial differential equations. The error values obtained by the three algorithms are listed in Table 2, from which we find that the predicted error of the benchmark model is one order of magnitude lower than that of PINNs. Meanwhile, the predicted error of LaNets when using 300 sample points is almost one order of magnitude lower than that of the benchmark model. The decline curve of the loss function during training is shown in Fig. 5a. It can be seen that the loss declines to a small value within a few iterations.
2. Variable diffusion coefficient
In this experiment, the carburizing diffusion coefficient varies with the temperature, systems and ratio of the element. Here, we consider , and add a source term as Eq. (12):
The analytical solution corresponding to this setting is Eq. (13):
In this example, we have , the left boundary , the right boundary and the ending time . Moreover, we use an 8-layer LaNets to represent the spatio-temporal solution. The curve of the loss function during training is shown in Fig. 5b.
Further, we compare the simulation results obtained by the proposed model and the benchmark model. The detailed results are displayed in Fig. 6. While all experimental results seem to be consistent with the analytical results, the predicted solution of the LaNets model is closer to the exact solution. A more precise error evaluation is summarized in Table 3, from which we see that the prediction error of the LaNets model is always lower than that of the benchmark model when using the same number of small sample points.
3.2 Numerical Results for Two-Dimensional Equations
In this section, we consider two-dimensional problems including the time-independent Helmholtz equation and time-dependent Burgers equation to verify the effectiveness of the LaNets model. These two types of two-dimensional problems aim to demonstrate the generalization ability of our methods.
In this example, we consider a time-independent two-dimensional Helmholtz equation as Eq. (14):
with homogeneous Dirichlet boundary conditions and the source function is given by Eq. (15):
Here, we take and the analytical solution is Eq. (16):
The training set of this example is generated according to the exact solution in the above equation. The problem is solved using the 4-layer LaNets model on the domain . Each hidden layer consists of 40 hidden neurons. The relative error measured is . The training set is specified as .
The visual comparison among the LaNets, benchmark and PINNs results is displayed in Fig. 7. From Fig. 7, we find that the predicted solution of the benchmark model is not quite consistent with the exact solution. In addition, the proposed model is more accurate than the PINNs model, especially on the boundary. Detailed error values for the three models are shown in Table 4, from which we see that the predicted error of the LaNets model is always the lowest. The loss curve during the training process is shown in Fig. 8a. The loss decreases continuously and smoothly from a higher value to a lower value, which shows the stability and robustness of the proposed model.
In the last experiment, we consider a two-dimensional time-dependent Burgers equation as Eq. (17):
where u represents the predicted spatio-temporal solution. The corresponding initial and boundary conditions are given by Eq. (18):
In this example, we take and . The training set is generated by the exact solution Eq. (18), which is also used to assess the accuracy of our method. The computing domain is set to . We apply an 8-layer LaNets model in which each hidden layer consists of 20 neurons. There are 20000 residual training points, 150 initial and boundary points, and 300 small sample points.
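The three point sets with the sizes stated above can be generated, for instance, as in the following sketch. It assumes a hypothetical unit computing domain for (x, y, t) and plain uniform random sampling; neither the domain nor the sampling strategy is taken from the experiment, and the function names are illustrative.

```python
import random


def sample_interior(rng, count, bounds):
    """Uniformly sample points inside the box described by bounds."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(count)]


def sample_initial_boundary(rng, count, bounds):
    """Sample points on the initial time slice or on a spatial boundary face.

    bounds lists (low, high) for each spatial axis, followed by time.
    """
    spatial_dims = len(bounds) - 1
    points = []
    for _ in range(count):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        face = rng.randrange(2 * spatial_dims + 1)
        if face == 0:
            point[-1] = bounds[-1][0]          # initial time
        else:
            axis, side = divmod(face - 1, 2)   # one of the spatial faces
            point[axis] = bounds[axis][side]
        points.append(tuple(point))
    return points


rng = random.Random(0)                             # fixed seed for repeatability
bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]      # hypothetical (x, y, t) domain
residual_points = sample_interior(rng, 20000, bounds)
ib_points = sample_initial_boundary(rng, 150, bounds)
sample_points = sample_interior(rng, 300, bounds)  # labels would come from the exact solution
```

Only the small sample points and the initial/boundary points need labels; the residual points are used solely to evaluate the governing equation.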
The decline curve of the loss function is shown in Fig. 8b. It can be seen that the loss value drops steadily to a small value within a few iterations. Fig. 9 displays the 3D plot of the solution at , and the relative error calculated is . The experiment on the two-dimensional time-dependent Burgers equation indicates that the proposed method can effectively solve higher-dimensional time-dependent PDEs. In theory, the LaNets model can solve PDEs in arbitrary dimensions; this is left for future work. The detailed relative errors obtained by LaNets, the benchmark and PINNs are given in Table 5, which shows that the predicted error of LaNets is lower than that of the benchmark model and the PINNs model.
4 Conclusions
In this paper, we propose hybrid Lagrange neural networks called LaNets to solve partial differential equations. We first perform Lagrange interpolation through a Lagrange block in front of the deep feedforward neural network architecture for pre-fitting and feature extraction. Then we add the residuals of small sample data points in the domain to the cost function to rectify the model. Compared with the single-layer polynomial network, LaNets greatly increase the reliability and stability. Compared with a general deep feedforward neural network, the proposed model improves the predictive accuracy without adding any extra parameters. Moreover, the proposed model can obtain more accurate predictions with less labeled data, which makes it possible to save considerable manpower and material resources and to improve computational efficiency. A series of experiments demonstrates the effectiveness and robustness of the proposed method. In all cases, our model shows smaller predictive errors. The numerical results verify that the proposed method improves the predictive accuracy, robustness and generalization ability.
Acknowledgement: This research was supported by NSFC (No. 11971296), and National Key Research and Development Program of China (No. 2021YFA1003004).
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
[1] Taylor, C. A., Hughes, T. J., & Zarins, C. K. (1998). Finite element modeling of blood flow in arteries. Computer Methods in Applied Mechanics and Engineering, 158(1–2), 155-196.
[2] Eymard, R., Gallouët, T., & Herbin, R. (2000). Finite volume methods. Handbook of Numerical Analysis, 7, 713-1018.
[3] Zhang, Y. (2009). A finite difference method for fractional partial differential equation. Applied Mathematics and Computation, 215(2), 524-529.
[4] Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
[5] Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.
[6] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
[7] Han, J., Jentzen, A., & E, W. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34), 8505-8510.
[8] E, W., Han, J., & Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4), 349-380.
[9] Han, J., Zhang, L., & E, W. (2019). Solving many-electron Schrödinger equation using deep neural networks. Journal of Computational Physics, 399, 108929.
[10] Chen, Y., He, Q., Mei, M., & Shi, X. (2018). Asymptotic stability of solutions for 1-D compressible Navier–Stokes–Cahn–Hilliard system. Journal of Mathematical Analysis and Applications, 467(1), 185-206.
[11] Sirignano, J., & Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375, 1339-1364.
[12] Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
[13] Dwivedi, V., & Srinivasan, B. (2020). Physics informed extreme learning machine (PIELM)–A rapid method for the numerical solution of partial differential equations. Neurocomputing, 391, 96-118.
[14] Pang, G., Lu, L., & Karniadakis, G. E. (2019). fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41(4), A2603-A2626.
[15] Raissi, M., Yazdani, A., & Karniadakis, G. E. (2020). Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science, 367(6481), 1026-1030.
[16] Kharazmi, E., Zhang, Z., & Karniadakis, G. E. (2019). Variational physics-informed neural networks for solving partial differential equations. arXiv preprint arXiv:1912.00873.
[17] Yang, L., Meng, X., & Karniadakis, G. E. (2021). B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. Journal of Computational Physics, 425, 109913.
[18] Meng, X., Li, Z., Zhang, D., & Karniadakis, G. E. (2020). PPINN: Parareal physics-informed neural network for time-dependent PDEs. Computer Methods in Applied Mechanics and Engineering, 370, 113250.
[19] Gao, H., Sun, L., & Wang, J. X. (2021). PhyGeoNet: Physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. Journal of Computational Physics, 428, 110079.
[20] Pao, Y. H., & Phillips, S. M. (1995). The functional link net and learning optimal control. Neurocomputing, 9(2), 149-164.
[21] Mall, S., & Chakraverty, S. (2017). Single layer Chebyshev neural network model for solving elliptic partial differential equations. Neural Processing Letters, 45(3), 825-840.
[22] Sun, H., Hou, M., Yang, Y., Zhang, T., & Weng, F. (2019). Solving partial differential equation based on Bernstein neural network and extreme learning machine algorithm. Neural Processing Letters, 50(2), 1153-1172.
[23] Kitchin, R., & Lauriault, T. P. (2015). Small data in the era of big data. GeoJournal, 80(4), 463-475.
[24] Shin, H. C., Roth, H. R., Gao, M., Lu, L., & Xu, Z. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5), 1285-1298.
[25] Ouyang, W., Wang, X., Zhang, C., & Yang, X. (2016). Factors in finetuning deep model for object detection with long-tail distribution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 864–873. Las Vegas, USA.
[26] Fang, Z., Li, W., Zou, J., & Du, Q. (2016). Using CNN-based high-level features for remote sensing scene classification. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 2610–2613. Beijing, China.
[27] Meijering, E. (2002). A chronology of interpolation: From ancient astronomy to modern signal and image processing. Proceedings of the IEEE, 90(3), 319–342. DOI 10.1109/5.993400.
[28] Higham, N. J. (2004). The numerical stability of barycentric Lagrange interpolation. IMA Journal of Numerical Analysis, 24(4), 547-556.
[29] Berkani, M. S., Giurgea, S., Espanet, C., Coulomb, J. L., & Kieffer, C. (2013). Study on optimal design based on direct coupling between a FEM simulation model and L-BFGS-B algorithm. IEEE Transactions on Magnetics, 49(5), 2149-2152.
[30] Yang, X., Ge, Y., & Zhang, L. (2019). A class of high-order compact difference schemes for solving the Burgers' equations. Applied Mathematics and Computation, 358, 394-417.
[31] Li, Y., & Mei, F. (2021). Deep learning-based method coupled with small sample learning for solving partial differential equations. Multimedia Tools and Applications, 80(11), 17391-17413.
[32] Xia, C., Li, Y., & Wang, H. (2018). Local discontinuous Galerkin methods with explicit Runge-Kutta time marching for nonlinear carburizing model. Mathematical Methods in the Applied Sciences, 41(12), 4376-4390.