Journal of Renewable Materials
The Estimation of the Higher Heating Value of Biochar by Data-Driven Modeling
1Key Laboratory of Poyang Lake Environment and Resource Utilization, Ministry of Education and School of Resources, Environmental & Chemical Engineering, Nanchang University, Nanchang, China
2School of Energy Science and Engineering, Central South University, Changsha, China
3Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore
4Department of Agricultural Engineering, Cairo University, Giza, Egypt
*Corresponding Authors: Wenguang Zhou. Email: firstname.lastname@example.org; Lijian Leng. Email: email@example.com
Received: 06 August 2021; Accepted: 22 September 2021
Abstract: Biomass is a carbon-neutral renewable energy resource. Biochar produced from biomass pyrolysis exhibits favorable characteristics and potential for fossil fuel substitution. To save time and cost, it is vital to establish models that predict biochar properties. However, few studies have focused on accurately predicting the higher heating value (HHV) of biochar from the proximate and ultimate analysis results of various biochars. Therefore, multi-linear regression (MLR) and machine learning (ML) models were developed to predict the measured HHV of biochar from the experimental data of this study. In detail, 52 types of biochar were produced by pyrolysis of rice straw, pig manure, soybean straw, wood sawdust, sewage sludge, Chlorella Vulgaris, and their mixtures at temperatures ranging from 300 to 800°C. The results showed that co-pyrolysis of mixed biomass provided an alternative method to increase the biochar yield. The contents of ash, fixed carbon (FC), and C increased with increasing pyrolysis temperature for most biochars. Pearson correlation (r) and relative importance analyses between HHV and the indicators derived from the proximate and ultimate analysis were carried out, and the measured HHV was used to train and test the MLR and ML models. In addition, ML algorithms, including gradient boosted regression, random forest, and support vector machine, were employed to develop more widely applicable models for predicting the HHV of biochar from an expanded dataset (149 data points in total, including 97 collected from the published literature). HHV had strong correlations (|r| > 0.9, p < 0.05) with ash, FC, and C. The MLR correlations based on either proximate or ultimate analysis showed acceptable prediction performance, with test R2 > 0.90. The ML models performed better, with test R2 of around 0.95 (random forest) before and 0.97–0.98 after adding the extra data for model construction. Feature importance analysis of the ML models showed that ash and C were the most important inputs for predicting biochar HHV.
Keywords: Biochar; higher heating value; machine learning; prediction; proximate analysis; ultimate analysis
1 Introduction
Thermochemical conversion of biomass is one of the pathways available to address the world's energy crisis, environmental pollution, and sustainable development issues. About 13 billion tons of usable biomass resources are available on the planet each year. Significant momentum has been attained in the use of renewable biomass as an alternative to traditional fossil fuels in energy applications. However, the characteristics of raw biomass, such as high moisture content, large volume, low energy density, and low calorific value, pose significant problems for its use as fuel. Biochar is the solid product of the thermochemical conversion of biomass at temperatures below 900°C in an oxygen-limited environment. It has excellent potential for application in energy storage, and it has a lower moisture content, higher energy density, and higher stability than biomass and is easier to transport. To apply biochar in the energy field, it is necessary to know its fuel properties, in particular the higher heating value (HHV). The basic principle of HHV analysis is straightforward, and the HHV can be determined experimentally with the integrated systems offered by instrument manufacturers: the HHV of a biochar sample is obtained by measuring the enthalpy difference of the sample before and after reaction in an adiabatic oxygen bomb. However, instrumental determination of biochar properties is costly and time-consuming. Therefore, it is necessary and economical to develop HHV prediction models based on commonly measured characteristic indexes.
Ultimate and proximate analysis results have been used to predict the carbon sequestration potential (stability) and the HHV. A large number of traditional multi-linear regression (MLR) models have been built and studied to predict the HHV of municipal solid waste, coal, biomass, etc. Mateus and coworkers developed a highly accurate linear regression model for HHV prediction (R2 = 0.9997) based on the ultimate analysis of bio-oil produced by liquefaction. In addition to the widely used MLR method, many researchers have applied artificial intelligence algorithms to predict HHV. These algorithms can handle both linear and nonlinear relationships between the input and target variables. Samadi and coworkers used the gradient boosting regression (GBR) algorithm to predict the HHV of biomass under different training parameters (i.e., stochasticity, tree size, and learning rate), and the resulting model had good prediction performance (R2 = 0.93). Xing and coworkers trained biomass HHV prediction models on proximate and ultimate analysis data using empirical correlation, random forest (RF), support vector machine (SVM), and artificial neural network algorithms. Across the models based on the two types of data, RF (R2 = 0.962) and SVM (R2 = 0.953) showed satisfactory predictive performance.
However, studies on predicting the HHV of biochar from its basic properties are limited. In addition, as a solid fuel, biochar differs considerably from other natural materials, which impedes the application of existing models to biochar. In this light, the aim of this research was to develop accurate models for predicting the HHV of biochar using the GBR, RF, and SVM algorithms and the linear regression method. In this study, 52 biochar samples were obtained from our experiments to explore the relationships among biochar characteristics and to establish the HHV prediction models. The models were then optimized by adding 97 data points from published studies to the initial dataset. Finally, predictive performance measures and relative importance analysis were used to evaluate the models.
2 Materials and Methods
2.1 Materials and Sample Preparation
A total of 52 biochar samples were produced by pyrolysis under various conditions from four representative biomass categories: agricultural residues, algae, animal manure, and sludge. The rice straw (RS), pig manure (PM), soybean straw (SS), and wood sawdust (WD) were collected from local farmers in Jiangxi Province, China. The Chlorella Vulgaris (CL) and sewage sludge (SW) were provided by a biotechnology company in Shanxi Province, China, and a municipal wastewater treatment plant in Jiangxi Province, China, respectively. To form each mixture, two types of biomass were selected and mixed at a mass ratio of 1:1, and the mixture was named by combining the abbreviations of the two biomasses. All six biomasses and the mixtures were pulverized to a particle size of less than 350 microns and dried at 105°C for two hours to constant weight. During pyrolysis, the biomass was placed in oxygen-free reactors, and the temperature was raised from room temperature to the target temperature (ranging from 300 to 800°C, in 100°C intervals) at a heating rate of 10 °C/min. The residence time at the target temperature was one hour.
2.2 Analysis of Biochar
The ash, volatile matter (VM), and fixed carbon (FC) of the biochars were analyzed in an automatic proximate analyzer following the procedures of a previous study. The elemental composition (i.e., C, H, N, and S) of the biochars was determined using an ultimate analyzer system (Elementar Analysensysteme GmbH, Vario EL III), and the O content was calculated by difference (O = 100 − C − H − N − S − ash). The HHV of the biochars was determined with an automatic calorimeter (ZDHW-9000C, HB-Huanuo, China). About 1.0 g of the biochar sample was placed in an oxygen bomb calorimeter filled with excess oxygen. The HHV of the biochar was obtained by comparison with the heat capacity of the benzoic acid standard substance (GBW130035, National Institute of Metrology, China) before and after burning, with correction for additional heating such as point heating.
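The by-difference calculation of O stated above can be sketched in a few lines. The function name and the sample values are illustrative assumptions, not data from this study; all inputs are taken to be mass percentages on the same basis.

```python
def oxygen_by_difference(c, h, n, s, ash):
    """Return the O content (wt%) as 100 - C - H - N - S - ash.

    All arguments are mass percentages on the same basis.
    """
    return 100.0 - c - h - n - s - ash


# Hypothetical biochar composition (wt%), for illustration only.
o_content = oxygen_by_difference(c=60.0, h=2.5, n=1.5, s=0.0, ash=30.0)
print(f"O by difference: {o_content:.2f} wt%")
```

Because O is obtained as a remainder, it accumulates the measurement errors of all the other components, so a slightly negative result for a high-ash sample would point to inconsistent inputs rather than a real composition.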
2.3 Model Construction
The MLR equations based on the ultimate and proximate analysis data were fitted using Origin 2021b. Three ML models (GBR, RF, SVM) were developed to predict the HHV of biochar using the scikit-learn Python library. GBR is a powerful non-parametric prediction method. It trains prediction trees in sequence, with each new tree (n) learning from the errors of the previous tree (n – 1) so that the prediction error is reduced [8,12,13]. RF has statistical advantages such as a low risk of overfitting and few parameters to specify, and it can handle both nonlinear and linear relationships between variables. SVM is an ML algorithm that uses a nonlinear kernel function to map the initial training samples to a high-dimensional feature space, thus transforming the problem from nonlinear to linear and obtaining the optimal solution. All input data were normalized according to the studies of Li et al. [15,16]. The ratio of the training dataset to the test dataset was 8:2, and cross validation was carried out to avoid bias in the training process. The performance of the MLR and ML models was evaluated in terms of the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE). The calculations of R2, MAE, and RMSE are defined as follows:
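The three definitions do not appear in this copy of the text. The block below is a reconstruction that assumes the standard forms of these metrics, not the authors' typeset equations; the index i and the overbar notation for the mean are introduced here. The MAE values reported for the MLR equations are much smaller than the matching RMSE values, which suggests that a relative form (each term divided by the experimental HHV) may have been used there. That reading is an inference, not something the text states.

```latex
\[
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(HHV_{\mathrm{experimental},i} - HHV_{\mathrm{predicted},i}\right)^2}
                {\sum_{i=1}^{n}\left(HHV_{\mathrm{experimental},i} - \overline{HHV}_{\mathrm{experimental}}\right)^2} \\[4pt]
MAE &= \frac{1}{n}\sum_{i=1}^{n}\left|HHV_{\mathrm{experimental},i} - HHV_{\mathrm{predicted},i}\right| \\[4pt]
RMSE &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(HHV_{\mathrm{experimental},i} - HHV_{\mathrm{predicted},i}\right)^2}
\end{aligned}
\]
```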
where HHVexperimental and HHVpredicted are the experimental and predicted values of the HHV, respectively, the average is taken over the experimental values of the HHV, and n is the number of sample data points used for the regression analysis. MAE and RMSE estimate the error between the experimental and predicted HHV, and R2 measures the goodness of fit of the proposed correlations. A higher R2 and lower RMSE and MAE indicate better model accuracy [15,17].
3 Results and Discussion
3.1 Biochar Production and Characterization
Biochar is a pyrogenic carbonaceous material, and its formation is a process of continuous decomposition and recombination of macromolecules. As the pyrolysis temperature rose, the category of biochar gradually evolved from transition char to turbostratic char, and the stability of the biochar also improved. As shown in Fig. 1a, the biochar yield decreased with increasing pyrolysis temperature. Zhao et al. likewise reported that pyrolysis temperature had a significant (p < 0.05) effect on the biochar yield. Co-pyrolysis of the biomass mixtures improved the yield of most biochars (compared to the theoretical yield calculated by assuming no interaction between the biomasses), which offers a feasible scheme for enhancing the biochar yield. In the ultimate analysis data (Tables 1 and S1 in the supporting information), biochars derived from PM at 400°C∼ had a lower C content than the feedstock, and similar results can be found in the study of Gascó et al. Biochar derived from PM may contain aromatic carbons that were difficult to burn during the ultimate analysis, so its C content would be underestimated. C was the main component of biochar, with contents ranging from 11.38% to 85.17%. The C content of biochar did not always increase with increasing pyrolysis temperature, which differed from results reported for three lignin-like biomasses. A portion of the unstable C in biochar was converted to stable C during biochar formation, and the variation was not fixed because of differences in biomass composition [10,20]. O was the second most abundant element in most biochar samples, with a mean value of 7% (Fig. 2), which was related to the complex pyrolysis mechanism of biochar and the biochemical composition of each biomass. The loss of O and H was mainly due to the breaking of weak O bonds during biochar formation. S was not detected in any sample, so the S content of all samples was recorded as 0. The O/C of all biochars was less than 0.4 (Fig. 1b), meaning that the half-life of these biochars (O/C ≤ 0.6) can exceed 100 years. More stable biochar would have more favorable storage and transport properties when used as fuel.
In the ternary diagram (Fig. 1c) of the proximate analysis, the FC of biochar increased to close to 100% as the pyrolysis temperature rose, in agreement with our previous results. As expected, ash remained in the solid as the temperature increased despite the thermal decomposition of organic matter, and VM decreased with increasing pyrolysis temperature. According to the proximate analysis (Fig. 2), FC and ash were the two main components in the experimental data, with mean values of 43% and 42%, respectively. CL400 had the highest VM, at 33.83%. The HHV of some biochars (Table S1) produced at low temperature was higher than that of the biomass (Table 1), and this high energy density would be one of the advantages of biochar as a solid fuel. With increasing pyrolysis temperature, the HHV of biochar decreased. The average HHV of the 52 biochar samples was 16.38 MJ/kg, and more than half of the samples had an HHV greater than 16.38 MJ/kg (Fig. 2). The HHV of biochar derived from PM and SW was low, with a maximum of only 12.86 MJ/kg (PM300), which was related to their high ash content. All biochars produced by the co-pyrolysis of WD and CL had a greater HHV than the theoretical value (the average HHV of the biochars derived from the two feedstocks at the same temperature). These results may be caused by changes in the ash and indicate that WD and CL had a synergistic effect during co-pyrolysis. Because biochar derived from a biomass mixture had a higher HHV than the theoretical HHV of mixed-biomass biochar, co-pyrolysis offers a feasible scheme for improving the HHV of biochar. Pyrolysis of biomass is a process in which char gradually evolves toward pure carbon. As this process proceeds (with increasing temperature), the C–H, C–O, and O–H bonds in the biochar are gradually eliminated, and the energy structure of the biochar becomes stable through aromatic resonance and π–π stacking of graphitic sheets.
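The theoretical value used above (the average of the two single-feedstock biochars at the same temperature, for a 1:1 mass ratio with no interaction) and the resulting synergy check can be sketched as follows. The function names and the numbers are illustrative assumptions, not measurements from this study.

```python
def theoretical_mixture_value(value_a, value_b):
    """Additive (no-interaction) estimate for a 1:1 mixture: the simple mean."""
    return (value_a + value_b) / 2.0


def synergy(measured_mixture, value_a, value_b):
    """Measured minus theoretical value; positive means a synergistic effect."""
    return measured_mixture - theoretical_mixture_value(value_a, value_b)


# Hypothetical HHVs (MJ/kg) of two single-feedstock biochars and their co-pyrolysis biochar.
hhv_a, hhv_b, hhv_mix = 20.0, 10.0, 16.0
print(theoretical_mixture_value(hhv_a, hhv_b))  # additive estimate
print(synergy(hhv_mix, hhv_a, hhv_b))           # > 0 indicates synergy
```

The same comparison applies to yield, since the text defines the theoretical yield in the same no-interaction sense.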
The Pearson correlation coefficients (r) between the basic experimental data are shown in Fig. 1d. HHV had significant correlations (|r| > 0.36, p < 0.05) with the other indicators except VM and H/C. HHV correlated positively (0.44 < r < 0.94, p < 0.05) with FC, C, H, N, and O and negatively (−0.96 < r < −0.48, p < 0.05) with ash, VM/FC, and H/C. These results were generally consistent with those of previous studies [29,30]. In particular, HHV was strongly (|r| > 0.9, p < 0.05) correlated with ash (r = −0.97, p < 0.05), FC (r = 0.90, p < 0.05), and C (r = 0.94, p < 0.05). Therefore, lower ash content and higher C and FC contents correspond to higher biochar HHV.
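For readers who want to reproduce this kind of analysis, the Pearson coefficient can be computed directly from its definition. This is a minimal sketch in plain Python; the ash and HHV values are made-up illustrative numbers, not the data behind Fig. 1d, and the significance test (p-value) is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ash contents (wt%) and HHVs (MJ/kg), for illustration only.
ash = [5.0, 20.0, 40.0, 60.0, 80.0]
hhv = [30.0, 25.0, 18.0, 12.0, 6.0]
print(f"r(ash, HHV) = {pearson_r(ash, hhv):.3f}")
```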
3.2 HHV Prediction Using Data of This Study
3.2.1 HHV Prediction by MLR Equations for Biomass and Coal Developed in Previous Studies
Many models have been developed in published studies to predict the HHV of biomass and coal. Among them, the Dulong formula (HHVDulong = 0.3383 × C + 1.443 × H − 0.1804 × O + 0.0942 × S) and the Milne formula (HHVMilne = 0.3410 × C + 1.322 × H − 0.1200 × O − 0.1200 × N − 0.0153 × ash) are widely used for coal and biomass, respectively [31,32]. In this study, the HHV of biochar (test dataset, n = 11) was calculated with both formulas, and their prediction performance is shown in Figs. 3a and 3b. Most of the predicted data were within a 20% margin of error, but some points predicted by the Dulong and Milne formulas fell outside this region, indicating an error of more than 20%. Unlike the Dulong formula, the Milne formula includes ash in the HHV calculation, and its prediction performance (R2 = 0.9204, MAE = 0.1280, RMSE = 1.9053) was better than that of the Dulong formula (R2 = 0.8892, MAE = 0.1471, RMSE = 2.2483). Although the Milne formula combines the results of ultimate and proximate analysis, one of its predictions was still outside the 20% error range. Differences in proximate analysis results between test methods and differences in thermal behavior among biomass, coal, and biochar may be the two major reasons for the inaccurate predictions of these formulas. It is therefore necessary to build prediction models based on the basic property data of biochar to predict its HHV more accurately.
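The two formulas and the 20% error criterion can be written out as a short sketch. The coefficients are those quoted above; the function names, the assumption that inputs are in wt% and outputs in MJ/kg, and the sample composition are illustrative and not taken from the test dataset.

```python
def hhv_dulong(c, h, o, s):
    """Dulong formula as quoted in the text (inputs in wt%)."""
    return 0.3383 * c + 1.443 * h - 0.1804 * o + 0.0942 * s


def hhv_milne(c, h, o, n, ash):
    """Milne formula as quoted in the text (inputs in wt%)."""
    return 0.3410 * c + 1.322 * h - 0.1200 * o - 0.1200 * n - 0.0153 * ash


def within_margin(predicted, measured, margin=0.20):
    """True if the prediction lies within +/- margin (relative) of the measurement."""
    return abs(predicted - measured) <= margin * abs(measured)


# Hypothetical biochar (wt%) with a hypothetical measured HHV (MJ/kg).
sample = {"c": 60.0, "h": 2.5, "o": 6.0, "n": 1.5, "s": 0.0, "ash": 30.0}
measured_hhv = 21.0

dulong = hhv_dulong(sample["c"], sample["h"], sample["o"], sample["s"])
milne = hhv_milne(sample["c"], sample["h"], sample["o"], sample["n"], sample["ash"])
print(f"Dulong: {dulong:.2f}, within 20%: {within_margin(dulong, measured_hhv)}")
print(f"Milne:  {milne:.2f}, within 20%: {within_margin(milne, measured_hhv)}")
```

Only the Milne form carries an explicit ash term, which is the difference the text points to when comparing the two formulas.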
3.2.2 Biochar HHV Prediction by MLR Equations Constructed by This Study
As shown in Table 2, Eq. (5) had the best prediction performance (training R2 = 0.9327, training MAE = 0.0876, training RMSE = 1.8385) among the equations based on the training dataset containing ultimate and proximate compositions. All equations except Eq. (2) (test R2 > 0.9191) had good generalization ability (test R2 > 0.9204) compared with the Dulong formula (test R2 = 0.8892) and the Milne formula (test R2 = 0.9204). Compared with Eq. (1) (test R2 = 0.9449, test MAE = 0.0812, test RMSE = 1.5848), the test performance of Eq. (3) (test R2 = 0.9529, test MAE = 0.0779, test RMSE = 1.4656), which introduces H as an additional independent variable, was slightly improved. This was consistent with the Pearson correlation analysis (Fig. 1d), in which H had a weak but significant correlation (r = 0.37, p < 0.05) with HHV. The ranges of the experimental proximate analysis data used as independent variables for model development were 1.87% ≤ ash ≤ 86.11%, 3.17% ≤ VM ≤ 54.22%, and 6.00% ≤ FC ≤ 91.99% (Fig. 2), and the equations based on the proximate analysis had good training performance (training R2 > 0.9). Eqs. (7)–(9) (training R2 = 0.9215) had a slightly stronger correlation than the single-variable equation in ash (Eq. (6), training R2 = 0.9207), in agreement with the conclusion of Qian et al. Eq. (10), which was composed of ash and VM/FC, predicted the test HHV well but was slightly worse than Eq. (6), with a lower test R2 (0.9694 vs. 0.9726) and higher test MAE (0.0612 vs. 0.0587) and test RMSE (1.1816 vs. 1.1184). Eqs. (11) and (12) use ash, FC, and C, which were strongly correlated (|r| > 0.9, p < 0.05, Fig. 1d) with HHV, but their training performance was not improved compared with Eqs. (5) and (10) (Table 2).
The predicted and experimental HHV values for Eqs. (5) and (12) are compared in Figs. 3c and 3d. The data points were distributed around the Y = X line, and no point fell outside the 20% error range. Although both equations achieved the same test MAE of 0.0741, Eq. (12) had a higher test R2 and lower test RMSE than Eq. (5), indicating better prediction and generalization ability. The generalization ability differed between equations, and the change in performance between the training and test datasets also differed in magnitude.
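As an illustration of how a single-variable equation such as the ash-based one can be fitted and scored, the sketch below uses ordinary least squares in plain Python. It is not the Origin workflow used in this study; the data are hypothetical, and the MAE here is the plain absolute form, which may differ from the form used for Table 2.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r2_mae_rmse(measured, predicted):
    """Return (R2, MAE, RMSE) for paired measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical ash contents (wt%) and HHVs (MJ/kg), for illustration only.
ash = [5.0, 20.0, 40.0, 60.0, 80.0]
hhv = [30.0, 25.0, 18.0, 12.0, 6.0]

intercept, slope = fit_line(ash, hhv)
predicted = [intercept + slope * a for a in ash]
r2, mae, rmse = r2_mae_rmse(hhv, predicted)
print(f"HHV = {intercept:.3f} + ({slope:.4f}) * ash")
print(f"R2 = {r2:.4f}, MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

In practice the fit would be made on the training split and the metrics reported separately for the held-out test split, as the tables in this section do.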
3.2.3 Biochar HHV Prediction by ML Models Constructed in This Study
The performance of the ML models was validated with the test dataset, as shown in Fig. 4. The training and test data were distributed around the Y = X line, which illustrates the accuracy and reliability of the three ML models. Compared with the MLR prediction models (Eqs. (5) and (10), Table 2), the predictive performance on the training dataset was enhanced when the three ML algorithms were applied to predict the HHV of biochar. On the training dataset, the GBR model had excellent predictive ability (R2 = 1.00, MAE = 0.22, RMSE = 0.24). On the test dataset, a lower R2 (0.93) was found for the GBR model, with MAE = 1.33 and RMSE = 1.74. GBR modeling, which is based on boosting theory, is a process in which the prediction error decreases continuously. In the study of Samadi et al., the HHV prediction model based on the GBR algorithm (R2 = 0.93) performed better than models based on genetic programming (R2 = 0.90) and artificial neural networks (R2 = 0.88 and 0.89). For the RF model, the training dataset also showed good predictive performance, with R2 = 0.98, MAE = 0.68, and RMSE = 0.88. Moreover, the RF model performed better than the GBR model on the test dataset, with R2 = 0.95, MAE = 1.12, and RMSE = 1.45. The excellent prediction performance of the RF model was related to the principle of its algorithm. As an ML algorithm that adopts an ensemble learning method, RF is more robust during learning and less prone to overfitting and to the influence of noisy data than other ensemble learning models.
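The workflow described in Section 2.3 (normalization, an 8:2 split, cross validation, and the three scikit-learn regressors) could be sketched roughly as below. Everything specific here is an assumption for illustration: the data are synthetic stand-ins rather than the biochar dataset, min-max scaling stands in for the unspecified normalization, and the 5-fold setting and hyperparameters are arbitrary choices, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the paper's measurements): ash, FC, C in wt%.
rng = np.random.default_rng(0)
ash = rng.uniform(2, 85, size=150)
fc = np.clip(95 - ash - rng.uniform(3, 30, size=150), 1, None)
c = np.clip(0.9 * (100 - ash) + rng.normal(0, 3, size=150), 5, None)
X = np.column_stack([ash, fc, c])
# Made-up relationship between composition and HHV (MJ/kg) plus noise.
y = 0.34 * c - 0.02 * ash + rng.normal(0, 0.8, size=150)

# 8:2 split between training and test data, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters below are assumptions, not the authors' settings.
models = {
    "GBR": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVM": SVR(kernel="rbf", C=100.0),
}

results = {}
for name, estimator in models.items():
    # Scaling lives inside the pipeline so it is refit on each CV training fold.
    model = make_pipeline(MinMaxScaler(), estimator)
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "cv_r2": float(cv_r2),
        "test_r2": float(r2_score(y_test, pred)),
        "test_mae": float(mean_absolute_error(y_test, pred)),
        "test_rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training. After fitting, the tree-based models expose a `feature_importances_` attribute, which is one common way to obtain relative importance values of the kind discussed in this section.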
In the analysis of relative importance (RI), FC, ash, and C were the main influencing features among the studied factors (Fig. 5). FC (RI = 0.3569) and ash (RI = 0.4993) were the most important features for HHV prediction in the GBR and RF models, respectively. This result was consistent with the Pearson correlation analysis in Section 3.1. C was the main component of the ultimate composition of biochar and the main energy supplier during its combustion. FC had a strong positive correlation (r = 0.90, p < 0.05, Fig. 1d) and ash a strong negative correlation (r = −0.97, p < 0.05, Fig. 1d) with HHV. However, the relative importance of the features was not completely consistent with the Pearson correlation analysis, especially the contribution of O to HHV in the GBR (RI = 0.0013) and RF (RI = 0.0035) models. The reason may be that the Pearson coefficient captures only the linear correlation between each feature and HHV, whereas the relative importance obtained from an ML model reflects both linear and nonlinear correlations. A dataset covering a wider range would help improve the generalization ability and broaden the applicability of the ML models.
3.3 Optimized HHV Prediction by Additional Dataset
It is worth noting that ML algorithms are data-driven artificial intelligence algorithms. In regression prediction, the more input data are available, the broader the applicability of the model and the better the prediction. In addition, biochar properties measured with different instruments can differ significantly. The models in Section 3.2 showed good prediction performance, but the dataset used to develop them was small and covered narrow variation intervals, which would weaken the generalization ability of the models. Therefore, new models were built after introducing additional data points from previous studies. The statistical analysis of the merged dataset (n = 149) is shown in Fig. 6 and Table S3. Compared with the original dataset (n = 52, Fig. 2), the average HHV increased by 4.89 MJ/kg to 21.27 MJ/kg, and the variation interval expanded from 5–28 MJ/kg (Fig. 2) to 5–35 MJ/kg (Fig. 6).
The best prediction performance of the MLR models developed from the new dataset was not greatly improved compared with the equations in Table 2. The best equation was Eq. (15), with training R2 = 0.9284, training MAE = 0.0654, and training RMSE = 1.8626 (Table 3), but its applicability was reduced (test R2 = 0.8749, test MAE = 0.0857, test RMSE = 2.5070) compared with the equations in Table 2. In contrast, the GBR, RF, and SVM models all showed good prediction performance, with training R2 of 1.00, 0.99, and 0.98 (Fig. 7), respectively. The GBR model performed best, with R2 = 1.00, MAE = 0.32, and RMSE = 0.37 for the training dataset and R2 = 0.98, MAE = 0.83, and RMSE = 1.08 for the test dataset. Overall, the models built on the expanded dataset (n = 149) performed better than those built on the original dataset (n = 52), which shows that a larger dataset generally yields better prediction performance. Fig. 8 shows the relative contribution of each feature in the GBR and RF models. C was the most important feature in both the GBR and RF models, with relative importance values of 0.7087 and 0.8834, respectively. As biochar is the solid product of the thermochemical conversion of biomass, its HHV mainly came from the breaking of C–H bonds during combustion, and the contribution of O, H, and N to the HHV of biochar was limited. Ash was the second most important feature in both models, with values of 0.2343 and 0.0699, respectively. The third most important feature differed between the two models, but its relative importance was lower than 0.02 in both cases and can be considered negligible.
4 Conclusions
Biochars produced from a wide range of biomass were characterized. Compared with biochar from the pyrolysis of individual biomass, biochar with higher yield and HHV could be obtained by the pyrolysis of biomass mixtures. Both the pyrolysis temperature and the biomass mixture affected the biochar yield and properties. Moreover, MLR and ML prediction models were successfully developed to predict the HHV of biochar based on 52 experimental data points. The ML approaches showed better prediction ability for biochar HHV (training R2 ≥ 0.96) than MLR (training R2 < 0.94). With the expanded dataset (n = 149), the predictive performance of the ML models improved: the HHV of biochar in the test dataset was predicted from the ultimate and proximate analysis by the GBR model with R2 = 0.98, MAE = 0.83, and RMSE = 1.08, and the RF and SVM models performed similarly well, with R2 = 0.97, MAE = 0.93, RMSE = 1.22 and R2 = 0.97, MAE = 0.93, RMSE = 1.23, respectively. Feature importance analysis showed that ash and C had the highest relative importance for HHV prediction, while VM and FC had limited effects. The ML approaches can predict the HHV of biochar with high accuracy and can play an important role in the development of biochar fuel applications.
Funding Statement: The work was supported by the National Natural Science Foundation of China (No. 51808278) and the Science Foundation for Youths of Jiangxi Province, China (20192BAB213012). This research was also supported by the College Students’ Innovative Entrepreneurial Training Plan Program, China (No. 201910403049).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.