Computer Modeling in Engineering & Sciences
Hybridization of Differential Evolution and Adaptive-Network-Based Fuzzy Inference System in Estimation of Compression Coefficient of Plastic Clay Soil
1University of Transport and Communications, Hanoi, 100000, Vietnam
2Department of Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, Lulea, 971 87, Sweden
3Department of Watershed & Arid Zone Management, Gorgan University of Agricultural Sciences & Natural Resources, Gorgan, 4918943464, Iran
4University of Transport Technology, Hanoi, 100000, Vietnam
5DDG (R) Geological Survey of India, Gandhinagar, 382010, India
*Corresponding Authors: Nadhir Al-Ansari. Email: email@example.com; Binh Thai Pham. Email: firstname.lastname@example.org
Received: 04 May 2021; Accepted: 12 July 2021
Abstract: The compressibility of soil is one of the important geotechnical parameters required for the design of civil engineering structures. The main purpose of this study is to develop a novel hybrid Machine Learning (ML) model (ANFIS-DE), which uses the Differential Evolution (DE) algorithm to optimize the predictive capability of the Adaptive-Network-based Fuzzy Inference System (ANFIS), for estimating the soil Compression coefficient (Cc) from other geotechnical parameters, namely Water Content, Void Ratio, Specific Gravity, Liquid Limit, Plastic Limit, Clay Content, and Depth of Soil Samples. The predictive capability of the novel model was validated using three statistical indices: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Correlation Coefficient (R). In addition, two popular ML models, namely Reduced Error Pruning Trees (REPTree) and Decision Stump (Dstump), were used for comparison. The results showed that the novel ANFIS-DE model performed best (R = 0.825, MAE = 0.064, and RMSE = 0.094) compared with REPTree (R = 0.7802, MAE = 0.068, and RMSE = 0.0988) and Dstump (R = 0.7325, MAE = 0.0785, and RMSE = 0.1036). Therefore, the ANFIS-DE model is a promising tool for the quick and accurate estimation of the soil Cc, which can be employed in the design and construction of civil engineering structures.
Keywords: Compression coefficient; differential evolution; adaptive-network-based fuzzy inference system; machine learning; Vietnam
1 Introduction
Soil is a natural resource formed by the weathering of different rocks and comprises various minerals, air, water, and organic material. The determination of the geotechnical properties of soil is very important for the safe and economic construction of civil engineering structures [1–4]. Soil properties, especially compressibility or the compression coefficient (Cc), depend on the type of soil and its in-filled spaces. Fine-grained soils have a relatively lower load tolerance capacity than coarse-grained soils. The compressibility of soil, which includes compaction and consolidation, also affects agriculture and plant growth. The conventional methods and tests for determining soil compaction parameters [6–8] and consolidation are costly and time-consuming and require a great deal of precision. Therefore, various theoretical and experimental models have been developed to establish the correlation between Cc and other soil index properties using minimum data.
Recently, Artificial Intelligence (AI) or Machine Learning (ML) techniques have been used to estimate engineering parameters and to solve geotechnical problems, including compressibility, soil classification, and shear strength of soil [11–16]. Single algorithms such as Support Vector Machine (SVM) and Artificial Neural Network (ANN) have been used successfully in geotechnical engineering. The quality of the input data is also essential for improving the performance of AI or ML models. Some researchers have applied ANN to predict Cc using seven input variables, including water content, liquid limit, plastic index, specific gravity, and soil types, and found that the ANN model could perform better than empirical formulas. Another study that used the ANN model to estimate the Cc of fine-grained soils found that the estimated values were nearly equal to the experimental values. Some studies indicated that the Adaptive Neuro-Fuzzy Inference System (ANFIS) and Genetic expression programming performed better than existing empirical equations.
Recently, hybrid models using a combination of ML algorithms such as ANN, ANFIS, SVM, Multi-Layer Perceptron Neural Network (MLPNN), and Particle Swarm Optimization (PSO) have been efficiently used to predict soil parameters such as Cc [21,22]. Bui et al. indicated that the hybrid model of Particle Swarm Optimization based Multi-Layer Perceptron (PSO-MLP) gave the most accurate prediction of Cc in comparison with the single models of SVM, random forest, Gaussian process, backpropagation neural network, and radial basis function. Another comparative study between hybrid and single models also found that a hybrid model of ABC-LM-ANN (Artificial Bee Colony-Levenberg–Marquardt-Artificial Neural Network) could perform better than other benchmark approaches in predicting Cc for a housing construction project. Moayedi et al. also confirmed that a hybrid model of the League Championship optimization Algorithm (LCA) and ANFIS outperformed the single ANFIS model, and concluded that the hybrid LCA-ANFIS model could be a promising alternative to empirical methods.
The above literature review on the application of ML and AI in predicting Cc shows that both single and hybrid models can predict Cc with high accuracy but that, in general, hybrid models perform better than single models. However, the application of hybrid models in estimating Cc remains limited, and model development and improvement are a continuous process. It is therefore necessary to fill this gap in the literature by developing new hybrid models. The main objective of this work is to develop and apply, for the first time, a hybrid ML model named ANFIS-DE, which combines Differential Evolution (DE), one of the most popular optimization techniques, with ANFIS for better estimation of the soil Cc. For this purpose, a dataset of 817 soil testing samples was used. Seven soil parameters that are easily determined in the laboratory, namely Water Content, Void Ratio, Specific Gravity, Liquid Limit, Plastic Limit, Clay Content, and Depth (of Soil Samples), were used as input variables, and Cc was the output. These soil parameters were obtained from construction projects in the Red River Delta, Vietnam. The performance of the models was evaluated using three statistical indices: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Correlation Coefficient (R). In addition, REPTree (Reduced Error Pruning Trees) and DStump (Decision Stump), which are popular ML models, were used for comparison.
2 Materials and Methods
The methodology of this study is presented in Fig. 1 and includes the following main steps: (i) the data of Cc and the relevant soil parameters (factors) obtained from the test results were randomly split into training (70%) and testing (30%) datasets; (ii) the training dataset was used to construct the hybrid ANFIS-DE model, in which DE served as the optimization technique for the weights and bias of the ANFIS predictor through optimization of the main control parameters of ANFIS (γ and σ²); and (iii) the constructed ANFIS-DE model was validated using the testing dataset and several common statistical indicators, namely R, RMSE, and MAE.
A detailed description of the data used and the methods applied is presented in the following sections.
2.1 Data Used
In this study, soil data from several construction projects located in the Red River Delta, Vietnam, were used for the modeling: Thinh Long Bridge (32 samples), Nam Binh Bridge (31 samples), Bach Dang Bridge (34 samples), Ha Noi–Hai Phong National Highway (346 samples), Nam Dinh Coastal Roads (240 samples), Red River Bridge (28 samples), Tra Ly Bridge (25 samples), Van Giang Commercial Complex of Hung Yen province (36 samples), and Thang Long Cement Factory of Quang Ninh province (45 samples). The Red River Delta is formed by the Red River (Hong River) and its distributaries in Northern Vietnam. The delta runs along the Gulf of Tonkin for 120 km and extends 240 km inland. In total, data from 817 soil samples (plastic clay) were used in the present study. The soil data were randomly split in a 70:30 ratio for the training (70%) and testing/validation (30%) of the models. In the modeling, the geotechnical parameters depth of soil samples, clay content (%), water content (%), void ratio, specific gravity, liquid limit (%), and plastic limit (%) were used as input parameters, and the compression coefficient (Cc) was the output variable. A detailed description of these parameters is presented in the following sections.
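The random 70:30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_test`, the seed, the rounding rule, and the placeholder arrays are all assumptions, and only the dataset shape (817 samples, 7 inputs) and the 70:30 ratio come from the text.

```python
import numpy as np

def split_train_test(X, y, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing subsets."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))               # random ordering of row indices
    n_train = int(round(train_fraction * len(X)))  # assumed rounding rule
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same shape as the study's dataset (817 x 7);
# the values themselves are random and carry no geotechnical meaning.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.random((817, 7))
y_demo = demo_rng.random(817)
X_train, X_test, y_train, y_test = split_train_test(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same testing subset.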
2.1.1 Compression Coefficient (Cc)
The compression coefficient (Cc), or compression index, is an important soil mechanical parameter that reflects the compaction and consolidation of soil. It is an important factor in analyzing the settlement of the foundations of structures on soft soil. The Cc is computed as the slope of the compression curve in the oedometer test (Fig. 2). In this study, the compression index of the samples varies from 0.018 to 1.37 (Table 1). The data distribution of the compression coefficient is presented in Fig. 3a.
2.1.2 Depth of the Soil Samples
The depth at which soil samples are collected for the determination of the engineering properties of soil is considered one of the important factors in the assessment of soil consolidation. It is a critical input parameter in the prediction of the Cc. In the present work, samples were collected and analyzed from depths varying between 1.1 and 45.85 m (Table 1). The data distribution of the depth parameter is presented in Fig. 3b.
2.1.3 Water Content
Water content (w) is defined as the ratio of the weight of water to the weight of solids in a soil sample. It is one of the important variables that reduce the cohesive forces between soil particles and the shear strength of soils, and it can even cause the saturation of soils. Consolidation of soils occurs when water is expelled from the pore spaces. Thus, it is a critical input parameter for the prediction of the Cc [28–30]. Water content can be calculated using the following equation:
$$ w = \frac{W_w}{W_s} \times 100\% = \frac{m_w g}{m_s g} \times 100\% = \frac{m_w}{m_s} \times 100\% $$
where Ww is the weight of water in the soil sample, Ws is the weight of solids in the soil sample, mw is the mass of water in the soil sample, ms is the mass of solids in the soil sample, and g is the acceleration of gravity (g = 9.81 m/s²). In the present work, the water content of the samples ranges from 17.22% to 122.9% (Table 1). The data distribution of the water content parameter is presented in Fig. 3c.
2.1.4 Void Ratio
“Void ratio (e) is defined as the ratio of the volume of voids to the volume of solids”. It has a strong correlation with the compression index [30,32–34]. It can be calculated as follows:
$$ e = \frac{V_v}{V_s} $$
where Vv is the volume of voids and Vs is the volume of soil solids. In the present work, the void ratio of the samples ranges from 0.508 to 3.31 (Table 1). The data distribution of the void ratio parameter is shown in Fig. 3d.
2.1.5 Specific Gravity
“Specific gravity (Gs) is defined as the ratio of the unit weight of a given material to the unit weight of water”. It is a factor that affects the compression index of soils. The specific gravity of the soil is given by the following equation:
$$ G_s = \frac{\rho_s}{\rho_w} $$
where ρs is the particle density of the soil and ρw is the density of water (ρw = 1000 kg/m³). In this study, the specific gravity of the samples ranges from 2.5 to 2.78 (Table 1). The data distribution of the specific gravity parameter is shown in Fig. 3e.
2.1.6 Liquid Limit
“Liquid limit (LL) is defined as the moisture content at the point of transition from plastic to liquid state”. It is strongly correlated with the compression index of soils [37,38]. The values of this factor can be determined with Atterberg tools in the laboratory, using the following equation:
$$ LL = \frac{W_{liquid}}{W_s} \times 100\% $$
where Ws is the weight of solids in the soil sample and Wliquid is the weight of water in the soil sample at the point of transition from the plastic to the liquid state. In the present work, the liquid limit of the samples ranges from 20.7% to 127.9% (Table 1). The data distribution of the liquid limit parameter is presented in Fig. 3f.
2.1.7 Plastic Limit
Plastic Limit (PL) is the moisture content at the point of transition from the semisolid to the plastic state, which can be determined using Atterberg tools in the laboratory. It is a critical input factor in the prediction of the Cc and is calculated using the following equation:
$$ PL = \frac{W_{plastic}}{W_s} \times 100\% $$
where Wplastic is the weight of water in the soil sample at the point of transition from the semisolid to the plastic state and Ws is the weight of solids in the soil sample. In the present work, the plastic limit of the samples ranges from 13.22% to 82.8% (Table 1). The data distribution of the plastic limit parameter is shown in Fig. 3g.
2.1.8 Clay Content
Clays are classified as soil solids smaller than 0.002 mm, or between 0.002 and 0.005 mm, in size. Clay content (μ) is a factor that influences the compression index of soils and can be determined in the laboratory through grain size distribution analysis based on the following equation:
$$ \mu = \frac{m_{0.005}}{m_{sum}} \times 100\% $$
where msum is the total mass of the soil sample and m0.005 is the mass of soil passing through the 0.005 mm sieve. In the present work, the clay content of the samples ranges from 3% to 76% (Table 1). The data distribution of the clay content parameter is shown in Fig. 3h.
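The index properties defined in Sections 2.1.3 to 2.1.8 are all simple ratios of the quantities named in their "where" clauses: water over solids for the water content and the Atterberg limits, voids over solids for the void ratio, particle density over water density for the specific gravity, and passing mass over total mass for the clay content. The sketch below collects them as small helper functions. The function names and the sample values in the usage lines are hypothetical and are not measurements from the study.

```python
def percent_ratio(numerator, denominator):
    """Return numerator / denominator expressed in percent."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator

def water_content(mass_water, mass_solids):
    # w = m_w / m_s * 100; g cancels when weights are written as mass * g
    return percent_ratio(mass_water, mass_solids)

def atterberg_limit(weight_water_at_transition, weight_solids):
    # Same ratio for LL (plastic -> liquid) and PL (semisolid -> plastic)
    return percent_ratio(weight_water_at_transition, weight_solids)

def void_ratio(volume_voids, volume_solids):
    if volume_solids <= 0:
        raise ValueError("volume of solids must be positive")
    return volume_voids / volume_solids

def specific_gravity(particle_density, water_density=1000.0):
    # Densities in kg/m^3; the default is the density of water used in the text
    return particle_density / water_density

def clay_content(mass_passing_0_005mm, mass_total):
    return percent_ratio(mass_passing_0_005mm, mass_total)

# Hypothetical sample values, chosen only to exercise the functions
print(water_content(30.0, 100.0))
print(void_ratio(0.8, 1.0))
print(specific_gravity(2700.0))
print(clay_content(25.0, 100.0))
```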
2.2 Methods Used
2.2.1 Adaptive-Network-Based Fuzzy Inference System (ANFIS)
Fuzzy set theory was proposed in 1965 as a mathematical method for describing linguistic expressions. An ANN is a practical method for learning various functions, such as real-valued, discrete-valued, and vector-valued functions, and is based on the interconnection of several processing units. The fuzzy-neural model is an extended fuzzy model that uses an ANN learning algorithm to train the model. ANFIS is a hybrid fuzzy-neural network used to model complex systems. The most important reason for combining fuzzy systems with neural networks is the learning ability of the latter. The ANFIS model uses a hybrid learning algorithm that combines the least-squares error algorithm with the gradient descent algorithm.
In this model, a set of nonlinear parameters is used in the premise (antecedent) part and a set of linear parameters is used in the consequent part. The values of these parameters are usually obtained in two passes, forward and backward. In the forward pass, which runs up to the fourth layer, the nonlinear parameters are held fixed and the linear parameters are calculated using the least-squares error algorithm. In the backward pass, the linear parameters are held fixed and the nonlinear parameters are obtained using the gradient descent algorithm. The ANFIS output is calculated using the consequent parameters found in the forward pass, and the output error is used to adjust the premise parameters using the standard backpropagation algorithm. It has been shown that the hybrid algorithm is very efficient in training the ANFIS model. Several methods have been proposed to determine the structure of the model, the most common of which are grid partitioning and subtractive clustering. The main difference between the two methods lies in how the fuzzy membership functions are determined. In the grid partitioning method, the type and number of membership functions of the inputs are determined by the input information.
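The forward half of the two-step procedure can be sketched as below: with the nonlinear (membership) parameters held fixed, the linear parameters follow from a single least-squares solve. The sketch assumes a first-order Sugeno system with Gaussian membership functions and one membership function per input per rule. The text does not state the membership type or rule layout used in the study, so these choices, the function names, and the demo values are assumptions.

```python
import numpy as np

def anfis_forward(X, centers, sigmas, consequents):
    """Forward pass of a first-order Sugeno fuzzy system (ANFIS-style layers).

    X: (n, d) inputs; centers, sigmas: (r, d) Gaussian premise parameters;
    consequents: (r, d + 1) linear parameters [p_1..p_d, q] for each rule.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    consequents = np.asarray(consequents, dtype=float)
    # Layer 1: Gaussian membership grade of every input in every rule -> (n, r, d)
    diff = X[:, None, :] - centers[None, :, :]
    grades = np.exp(-0.5 * (diff / sigmas[None, :, :]) ** 2)
    # Layer 2: firing strength of each rule (product over inputs) -> (n, r)
    firing = grades.prod(axis=2)
    # Layer 3: normalised firing strengths (tiny constant avoids division by zero)
    norm = firing / (firing.sum(axis=1, keepdims=True) + 1e-12)
    # Layer 4: linear output of each rule, f_r = p_r . x + q_r -> (n, r)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    rule_out = X1 @ consequents.T
    # Layer 5: overall output as the weighted sum of the rule outputs
    return (norm * rule_out).sum(axis=1), norm

def fit_consequents(X, y, centers, sigmas):
    """Least-squares estimate of the linear parameters, premise parameters fixed."""
    X = np.asarray(X, dtype=float)
    r, d = np.asarray(centers).shape
    _, norm = anfis_forward(X, centers, sigmas, np.zeros((r, d + 1)))
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    # Each rule contributes its normalised strength times [x, 1] to the design matrix
    design = (norm[:, :, None] * X1[:, None, :]).reshape(X.shape[0], r * (d + 1))
    theta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return theta.reshape(r, d + 1)

# Demo on synthetic data generated by a known two-rule system
rng = np.random.default_rng(0)
X_demo = rng.random((60, 2))
centers = np.array([[0.25, 0.25], [0.75, 0.75]])
sigmas = np.full((2, 2), 0.3)
true_params = np.array([[1.0, -0.5, 0.2], [0.3, 0.8, -0.1]])
y_demo, _ = anfis_forward(X_demo, centers, sigmas, true_params)
fitted = fit_consequents(X_demo, y_demo, centers, sigmas)
y_fit, weights = anfis_forward(X_demo, centers, sigmas, fitted)
print(float(np.max(np.abs(y_fit - y_demo))))
```

In a hybrid scheme such as the one studied here, an optimizer would adjust `centers` and `sigmas` between such solves. That outer loop is omitted from this sketch.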
2.2.2 Differential Evolution (DE)
General-purpose evolutionary optimization algorithms are known to be able to find near-optimal solutions to mathematical and real-world problems for which classical and analytical methods cannot find the optimal solution in a reasonable computational time. One of these evolutionary algorithms is the “differential evolution algorithm”. This algorithm uses a differential operator to generate new candidate solutions, which causes an exchange of information between the members of the population. The most important features of the DE algorithm are its high speed, simplicity, and power. The method requires setting only three parameters: the population size; the mutation weight F, which multiplies the difference of two vectors before the result is added to a third vector; and the Cr parameter, which is the probability of recombination or crossover. The F parameter is usually set between 0 and 2 and the Cr parameter between 0 and 1. In general, the algorithm has three stages, namely mutation, crossover (recombination), and selection, which are described in Fig. 4.
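The three stages (mutation, crossover, selection) can be sketched in a few lines. The variant below is the common DE/rand/1/bin scheme with box bounds. The text does not say which DE variant was used, so the variant, the function name, the fixed value of F, and the test objective (a sphere function) are assumptions made for illustration only.

```python
import numpy as np

def differential_evolution(objective, bounds, pop_size=50, f_weight=0.5,
                           cr=0.5, n_iter=200, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = rng.uniform(low, high, size=(pop_size, dim))
    cost = np.array([objective(ind) for ind in pop])
    for _ in range(n_iter):
        for i in range(pop_size):
            # Mutation: three distinct members other than i; the weighted
            # difference of two of them is added to the third
            candidates = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(candidates, size=3, replace=False)]
            mutant = np.clip(a + f_weight * (b - c), low, high)
            # Crossover: take each component from the mutant with probability cr
            mask = rng.random(dim) < cr
            mask[rng.integers(dim)] = True  # keep at least one mutant component
            trial = np.where(mask, mutant, pop[i])
            # Selection: the trial replaces the parent only if it is no worse
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = int(np.argmin(cost))
    return pop[best], float(cost[best])

# Usage on a simple test objective (sum of squares, minimum at the origin)
best_x, best_cost = differential_evolution(
    lambda x: float(np.sum(x ** 2)), [(-5.0, 5.0)] * 3, pop_size=20, n_iter=100
)
print(best_x, best_cost)
```

In a hybrid predictor, `objective` would instead return a training error such as the RMSE of the model built from the candidate parameter vector.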
2.2.3 Reduced Error Pruning Trees (REPTree)
The REPTree algorithm is a fast decision tree learner. In this algorithm, splits are chosen using the information obtained from the data, and the error due to variance is reduced. In other words, the REPTree algorithm combines two methods: Reduced Error Pruning (REP) and the Decision Tree (DT). It can build both regression trees and decision trees. Notably, pruning reduces the complexity of the decision tree as well as the error due to model variance. The simple structure of decision tree algorithms is the reason they are widely considered for classification purposes. When the tree produced by this process is large, the DT algorithm uses a series of training data to facilitate the modeling process and REP to reduce the complexity of the DT structure. Overfitting is a known problem for decision trees, and the REPTree algorithm addresses it through backward pruning. After the trees are pruned, the most accurate version is selected. The efficiency of this algorithm derives from the information gain based on entropy, or the reduction of variance, together with the reduced-error pruning method. In this study, REPTree was trained with the following hyper-parameters: a batch size of 100, 3 folds, and a minimum total weight of the instances in a leaf of 2.
2.2.4 Decision Stump (Dstump)
The Dstump algorithm is a machine learning algorithm. It is a type of DT model that consists of a single internal (root) node that is directly connected to the terminal nodes (leaves). It should be noted that DTs have an upward trend. It also uses three parameters as input variables. The first variable is the selection threshold “θ”, the second variable is a time period “tp”, and the third variable is the purity index. Thus, in a Dstump, statistics are collected over a limited period of time. Each attribute fi ∈ F is then evaluated based on σ(⋅). On the other hand, and are two indicators of the best ranking. Several variations are possible depending on the type of model input. For a nominal attribute, for example, the stump may contain one leaf for each attribute value, or two leaves, one corresponding to a selected category and the other to all remaining categories. For binary attributes these two schemes are identical. The split tests only one attribute based on its value. When a decision stump is built, a threshold value is selected based on the weighting of the training samples so as to minimize the weighted classification error. The optimal threshold is found by exhaustively examining the possible values in the training dataset; each candidate threshold is computed as the mean of two successive ordered unique values. In this study, the Dstump was trained with a batch size of 100.
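The threshold search described above (candidate thresholds taken as the means of successive ordered unique values, with exhaustive evaluation) can be sketched for a regression target such as Cc. This is an illustrative sketch on a single attribute with unweighted samples and squared error as the split criterion. It is not the implementation used in the study, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-split regression stump on a single numeric attribute."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)  # sorted unique attribute values
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    # Candidate thresholds: means of successive ordered unique values
    thresholds = (values[:-1] + values[1:]) / 2.0
    best = None
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        # Sum of squared errors when each leaf predicts its own mean
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return float(threshold), float(left_value), float(right_value)

def predict_stump(x, stump):
    """Predict the leaf mean on either side of the fitted threshold."""
    threshold, left_value, right_value = stump
    return np.where(np.asarray(x, dtype=float) <= threshold, left_value, right_value)

# Toy data with an obvious break between 3 and 10
x_toy = [1, 2, 3, 10, 11, 12]
y_toy = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5]
stump = fit_stump(x_toy, y_toy)
print(stump, predict_stump([2.5, 20.0], stump))
```

With several input attributes, the same search would be repeated per attribute and the attribute with the lowest error kept, since a stump tests only one attribute.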
2.2.5 Validation Indicators
In this work, the statistical measures “Root Mean Square Error (RMSE)”, “Mean Absolute Error (MAE)”, and “Correlation Coefficient (R)” were used to validate and compare the models. For the RMSE and MAE indices, the smaller the difference between the actual and simulated data, the more reliable the simulation results; the closer these two indices are to zero, the more accurate and efficient the algorithm. R is a statistical tool that determines the type and degree of the relationship between one quantitative variable and another. This coefficient lies between −1 and 1, and it is equal to zero if there is no relationship between the two variables. These indicators are calculated using the following formulas [61–64]:
$$ RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_i\right)^2} $$
$$ MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - X_i\right| $$
$$ R = \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 \sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2}} $$
where N is the total number of data, Xi is the ith simulated value, Yi is the ith observed value, and \bar{X} and \bar{Y} are the averages of the X and Y data, respectively.
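The three indices can be computed directly from paired observed and simulated values using their standard definitions. The sketch below is a minimal illustration. The function names and the four-point sample are hypothetical and are not data from the study.

```python
import numpy as np

def rmse(observed, simulated):
    """Root mean square error between observed (Y) and simulated (X) values."""
    y, x = np.asarray(observed, dtype=float), np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((y - x) ** 2)))

def mae(observed, simulated):
    """Mean absolute error between observed (Y) and simulated (X) values."""
    y, x = np.asarray(observed, dtype=float), np.asarray(simulated, dtype=float)
    return float(np.mean(np.abs(y - x)))

def correlation(observed, simulated):
    """Pearson correlation coefficient R between observed and simulated values."""
    y, x = np.asarray(observed, dtype=float), np.asarray(simulated, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical values, every prediction off by 0.05 in alternating directions
observed_demo = [0.20, 0.40, 0.60, 0.80]
simulated_demo = [0.25, 0.35, 0.65, 0.75]
print(rmse(observed_demo, simulated_demo),
      mae(observed_demo, simulated_demo),
      correlation(observed_demo, simulated_demo))
```

RMSE penalises large individual errors more heavily than MAE, which is why the two are usually reported together.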
3 Results and Discussion
3.1 Performance of ANFIS-DE
In this study, ANFIS-DE was trained with the following hyper-parameters: a population size of 50, a lower bound of the scaling factor of 0.2, an upper bound of the scaling factor of 0.8, and a crossover probability of 0.5. Fig. 5 shows the optimization process of ANFIS-DE over 1000 iterations. It can be observed that the R value of ANFIS-DE increased sharply within about 10 iterations and then stabilized at approximately 0.8, while the RMSE and MAE values decreased sharply within about 25 iterations and then stabilized at 0.09 (RMSE) and 0.065 (MAE). These results show that DE substantially improved the predictive capability of ANFIS and, thus, the performance of ANFIS-DE for the prediction of the Cc.
The correlation analysis of the actual and predicted values for the training and testing samples showed that the R values of the ANFIS-DE model for the testing and training samples were 0.825 and 0.813, respectively (Fig. 6), which indicates that the predictive capability of the novel ANFIS-DE model is good for the prediction of the Cc.
Fig. 7 presents the error analysis of the ANFIS-DE model using the training samples. It can be observed that the Cc values predicted by the novel model are close to the actual Cc values obtained from the experimental tests, which is also indicated by the low values of RMSE (0.096) and MAE (0.065). This indicates that the novel ANFIS-DE model has a good fit with the training data. Fig. 8 shows the error analysis of the ANFIS-DE hybrid algorithm using the testing samples. Here too, the predicted Cc values are close to the actual Cc values obtained from the experimental tests, as indicated by the low values of RMSE (0.094) and MAE (0.064). This shows that the predictive capability of the novel ANFIS-DE model is good for the prediction of the soil Cc.
3.2 Comparison of ANFIS-DE with Popular Single ML Models
Table 2 compares the single ML models (REPTree and DStump) and the hybrid algorithm (ANFIS-DE) using the R, RMSE, and MAE statistical indices for the training dataset. The R, RMSE, and MAE values of the ANFIS-DE algorithm are 0.813, 0.096, and 0.065, respectively. The corresponding values for the REPTree and DStump single algorithms are R = 0.8315 and 0.6699, RMSE = 0.0952 and 0.1272, and MAE = 0.0683 and 0.0949, respectively. Therefore, it can be concluded that all three ML models had a good fit with the training data.
On the other hand, the comparison of the hybrid and single ML algorithms using the testing dataset showed that the statistical indices R, RMSE, and MAE of the ANFIS-DE hybrid algorithm are 0.825, 0.094, and 0.064, respectively (Table 3), while those of the REPTree and DStump single algorithms are (0.7802, 0.0988, and 0.068) and (0.7325, 0.1036, and 0.0785), respectively. Therefore, it can be stated that the hybrid ANFIS-DE model is better than the other ML models (REPTree and Dstump) for the prediction of Cc. This result is consistent with previous studies in which hybrid models performed better than single models. For example, Bui et al. revealed that the hybrid PSO-MLP model has the highest accuracy in predicting Cc in comparison with single models such as SVM, Random Forest, and Gaussian process. In addition, other authors indicated that the hybrid LCA-ANFIS model outperformed the single ANFIS model.
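To put the testing-set comparison on a common scale, the error indices of the benchmark models can be expressed as relative reductions achieved by ANFIS-DE. The sketch below uses only the testing-set values reported for the three models (as stated in the abstract). The helper name `relative_reduction` and the dictionary layout are illustrative choices.

```python
# Testing-set indices as reported in the paper
test_scores = {
    "ANFIS-DE": {"R": 0.825, "RMSE": 0.094, "MAE": 0.064},
    "REPTree": {"R": 0.7802, "RMSE": 0.0988, "MAE": 0.068},
    "DStump": {"R": 0.7325, "RMSE": 0.1036, "MAE": 0.0785},
}

def relative_reduction(baseline, candidate):
    """Percentage by which the candidate error is lower than the baseline error."""
    return 100.0 * (baseline - candidate) / baseline

for name in ("REPTree", "DStump"):
    for metric in ("RMSE", "MAE"):
        gain = relative_reduction(test_scores[name][metric],
                                  test_scores["ANFIS-DE"][metric])
        print(f"{metric} reduction vs {name}: {gain:.1f}%")
```

On these figures, the RMSE of ANFIS-DE is roughly 5% lower than that of REPTree and roughly 9% lower than that of DStump. The margin over REPTree is therefore modest, although it is consistent across all three indices.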
In general, it can be stated that the novel hybrid ANFIS-DE model has the highest predictive capability compared with the other single ML models (REPTree and Dstump). This may be because the hybrid model uses ANFIS as the base predictor, which has a greater capability to model the data. Also, when the number of inputs in the ANFIS model exceeds four, different combinations of inputs are used in different ANFIS networks, whose outputs finally enter a final ANFIS that produces the output with very high accuracy. In addition, the ANFIS model comprises two models: a neural network and a fuzzy model. The fuzzy part establishes the relationship between input and output, and the parameters of the fuzzy membership functions are determined by the neural network. Therefore, ANFIS has the characteristics of both fuzzy and neural models, and using the capabilities of both can provide better results in a hybrid model. Since a hybrid algorithm has the features and characteristics of both constituent algorithms, and the two algorithms exchange information with each other, it generally has very high accuracy and efficiency. The main positive features of hybrid models are also high flexibility and reliability.
4 Conclusions
In the present study, the soil compression coefficient (Cc) was estimated using a novel hybrid model named ANFIS-DE, which is a combination of the ANFIS and DE optimization techniques. REPTree and Dstump were selected as benchmark ML models for comparison. The soil parameters for the modeling were obtained from the analysis of 817 soil samples collected from various civil engineering projects located in the Red River Delta area, Vietnam. Water Content, Void Ratio, Specific Gravity, Liquid Limit, Plastic Limit, Clay Content, and Depth of Soil Samples were used as input variables, and the Cc was used as the output variable. The statistical indicators R, RMSE, and MAE were used for the validation and comparison of the models.
The results show that the novel hybrid ANFIS-DE model has a good predictive capability for the soil Cc (R = 0.825, MAE = 0.064, and RMSE = 0.094), and its performance is better than that of the benchmark ML models REPTree (R = 0.7802, MAE = 0.068, and RMSE = 0.0988) and Dstump (R = 0.7325, MAE = 0.0785, and RMSE = 0.1036). Therefore, it can be concluded that the novel ANFIS-DE model is a promising tool for the quick and accurate prediction of the soil Cc, which can be used in the proper design and safe construction of civil engineering structures.
As the experimental results confirm, one important advantage of the hybrid ANFIS-DE model is that it incorporates two powerful single methods (ANFIS and DE) to obtain a good prediction performance for the Cc parameter. Thus, this hybrid model is able to provide a reliable prediction of this soil parameter to quickly support engineering decision-making. The proposed hybrid ANFIS-DE model is a new alternative that fills a gap in the literature on the application of hybrid models for estimating Cc and can assist geotechnical engineers in designing building foundation structures. Future studies could consider new robust optimization techniques to enhance the estimation performance for Cc. In addition, future work may apply this hybrid model of ANFIS and DE to the estimation of other parameters and to other civil engineering problems.
However, one limitation of this study is that the present hybrid model does not incorporate a feature selection method; adopting metaheuristic-based feature evaluation could therefore be a potential direction for future work. In addition, only ANFIS combined with DE was used here to develop the hybrid model, so it is desirable to explore combinations of other single models in order to compare and select the best hybrid model. Finally, although this study was conducted with a large number of samples and seven input parameters, more input parameters may be considered in future studies using this hybrid model to further refine its performance.
Funding Statement: This research is funded by the Ministry of Education and Training of Vietnam, Grant No. B2020-GHA-03, organized by the University of Transport and Communications, Hanoi, Vietnam.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.