Computers, Materials & Continua
Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation
1Department of Computer Science, College of Humanities and Science in Al Aflaj, Prince Sattam Bin Abdulaziz University, Al-Aflaj, Saudi Arabia
2Department of Mathematics, College of Humanities and Science in Al Aflaj, Prince Sattam Bin Abdulaziz University, Al-Aflaj, Saudi Arabia
3Laboratory of Electronics & Information Technologies, Sfax University, Sfax, Tunisia
4Department of Administration, Administrative Science College, Thamar University, Yemen
*Corresponding Author: Walid Aydi. Email: email@example.com
Received: 28 August 2021; Accepted: 25 October 2021
Abstract: The Weibull distribution (WD) is regarded as among the finest in the family of failure distributions. One of the most commonly used methods for estimating the parameters of the WD is the ordinary least squares (OLS) technique, which is useful in reliability and lifetime modeling. In this study, we propose an approach, called the OLSMLP, that combines the OLS with the multilayer perceptron (MLP) neural network and builds on the resilience of the OLS method. The MLP addresses the heteroscedasticity that distorts the estimation of the parameters of the WD in the presence of outliers, and eases the difficulty of determining the weights required by the weighted least squares (WLS) method. A second method is proposed that incorporates a weight into the general entropy (GE) loss function to obtain a modified loss function (WGE) for estimating the parameters of the WD. A Monte Carlo simulation is performed to examine the performance of the proposed OLSMLP method in comparison with approximate Bayesian estimation using the weighted GE loss function (BLWGE). The results of the simulation showed that the two proposed methods produced good estimates even for small sample sizes. In addition, in terms of the mean squared error and computation time, the proposed techniques are typically preferable to the other available methods.
Keywords: Weibull distribution; maximum likelihood; ordinary least squares; MLP neural network; weighted general entropy loss function
1 Introduction
The Weibull distribution is widely used in reliability studies and many engineering applications, such as the lifetime analysis of material strength, estimation of rainfall, hydrology, predictions of material and structural failure, renewable and alternative energies [5–8], power electronic systems, and many other fields [10–12].
The form of the probability density function (PDF) of two parameters of WD is given by:
The cumulative distribution function (CDF) and the survival function S of the WD can be expressed as
where the two parameters are the scale and the shape of the distribution.
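As a numerical companion to Eqs. (1) and (2), the following minimal sketch evaluates the PDF, CDF, and survival function in their standard two-parameter form. The function names and the argument names `scale` and `shape` are illustrative choices, not notation taken from this paper.

```python
import math


def weibull_pdf(t: float, scale: float, shape: float) -> float:
    """Standard two-parameter Weibull density; returns 0 for t <= 0 for simplicity."""
    if t <= 0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(t: float, scale: float, shape: float) -> float:
    """Probability that a lifetime does not exceed t."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Probability that a lifetime exceeds t (complement of the CDF)."""
    return 1.0 - weibull_cdf(t, scale, shape)
```

A useful check is that the CDF evaluated at `t = scale` equals 1 − e⁻¹ (about 0.632) for any shape value.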
Several approaches to estimating the parameters of the WD have been proposed. They can generally be classified as manual or computational.
Manual approaches include the ordinary least squares [15,16], good linear unbiased estimators, and weighted least squares. Computational methods include maximum likelihood estimation, the method of moments, the Bayesian approach, and least-squares estimation with particle swarm optimization.
In addition to computational methods, many studies have used neural networks (NNs) to predict the parameters of the WD in various areas. For example, the method developed by Jesus applies Weibull and artificial NN (ANN) analysis to predict the shelf life and acidity of vacuum-packed fresh cheese. In survival analysis, Achraf constructed a deep neural network model called DeepWeiSurv, which assumes that the distribution of survival times follows a finite mixture of two-parameter WDs. In another work in the field of electric power generation, an ANN and the q-Weibull distribution were applied to the survival function of brushes in hydroelectric generators.
Recently, a few studies have attempted to combine the robustness of the ANN with some of the above statistical methods. Maria modeled the distribution of tree diameters using the OLS and the ANN. In the same spirit, we propose combining the OLS with a neural network to predict the two parameters of the WD. The proposal rests, on the one hand, on the OLS in its simplest form, which assumes a linear relationship between the predictor and the unreliability function, and, on the other hand, on the robustness and speed of single-hidden-layer networks in handling linear functions compared with networks with multiple hidden layers.
In the proposed method, we solve the problem whereby the reliability of the OLS method is compromised by outliers through the introduction of a pre-trained neural network after the linearization of the CDF. The remaining sections of this paper are organized as follows: Section 2 provides a review of different numerical and graphical methods for estimating the parameters of the WD, such as the MLE, OLS, WLS, and BLGE. In Section 3 we present the proposed methods. To evaluate their appropriateness in comparison with competing methods, the relevant performance metrics are covered in Section 4. The results are discussed in Section 5. Finally, the conclusions of this study are provided in Section 6.
2 Review of Numerical and Graphical Methods for Estimating Parameters of WD
The most commonly used approaches to estimate the scale and shape parameters of the WD are described below.
2.1 Maximum Likelihood Estimator (MLE)
Consider a set of n random lifetimes drawn from the WD defined by Eq. (1). Then, the likelihood function and its corresponding logarithm for the given sample observations are shown in Eqs. (4) and (5), respectively:
The partial derivatives of the log-likelihood with respect to the two parameters are given by:
Setting these derivatives to zero, the MLE of the scale parameter is:
The shape parameter has no closed-form solution; it can be obtained by using any numerical method, such as the Newton–Raphson method.
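The procedure above can be sketched as follows, assuming the standard two-parameter form of the WD. Newton–Raphson is applied to the profile likelihood equation for the shape, and the scale then follows in closed form. The function name, the starting value (based on the variance of the log-lifetimes), and the tolerances are illustrative assumptions, not choices documented in this paper.

```python
import math
import random
import statistics


def weibull_mle(sample, tol=1e-10, max_iter=100):
    """Return (scale, shape) maximizing the Weibull likelihood of `sample`."""
    logs = [math.log(t) for t in sample]
    mean_log = statistics.fmean(logs)
    top = max(logs)  # subtracted inside exp() to avoid overflow for large shapes
    # Starting value from Var(ln T) = pi^2 / (6 * shape^2), an assumed heuristic.
    shape = math.pi / (math.sqrt(6.0) * statistics.stdev(logs))
    for _ in range(max_iter):
        w = [math.exp(shape * (lg - top)) for lg in logs]
        s0 = sum(w)
        s1 = sum(wi * lg for wi, lg in zip(w, logs))
        s2 = sum(wi * lg * lg for wi, lg in zip(w, logs))
        # Profile score for the shape and its derivative.
        g = s1 / s0 - 1.0 / shape - mean_log
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / shape ** 2
        step = g / dg
        new_shape = shape - step
        if new_shape <= 0.0:  # keep the iterate in the valid range
            new_shape = shape / 2.0
        shape = new_shape
        if abs(step) < tol:
            break
    # Closed-form scale given the shape: (mean of t^shape)^(1/shape).
    w = [math.exp(shape * (lg - top)) for lg in logs]
    scale = math.exp(top) * (sum(w) / len(w)) ** (1.0 / shape)
    return scale, shape
```

For example, `weibull_mle([random.weibullvariate(3.0, 2.0) for _ in range(5000)])` should return values close to a scale of 3 and a shape of 2; in Python's `random.weibullvariate`, the first argument is the scale and the second is the shape.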
2.2 Ordinary Least Squares Method (OLS)
The OLS method is extensively used in mathematics and engineering problems to estimate the parameters of the WD. We can obtain a linear relationship between the parameters by rearranging Eq. (2) and taking the logarithm twice, as follows:
Let . Then, Eq. (9) can be written as
Let the observations in a random sample of size n be arranged in ascending order (the order statistics). To estimate the values of the cumulative distribution function at these ordered observations, we use the mean rank method as follows:
The estimates of the regression parameters (intercept and slope) minimize the function
Therefore, the estimates of the regression parameters are given by
The corresponding estimates of the WD parameters are given by
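The OLS procedure can be sketched as follows, assuming the common Weibull-plot convention in which ln(−ln(1 − F)) is regressed on the log-lifetime, so that the slope is the shape and the scale is exp(−intercept/slope). The mean rank estimate is taken as i/(n + 1). The function name is illustrative, and the regression direction is an assumption because the displayed equations are not available here.

```python
import math


def weibull_ols(sample):
    """Return (scale, shape) from an ordinary least-squares line on the Weibull plot."""
    ordered = sorted(sample)
    n = len(ordered)
    xs = [math.log(t) for t in ordered]
    # Mean rank estimate of the CDF at the i-th order statistic: i / (n + 1).
    ys = [math.log(-math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    shape = slope
    scale = math.exp(-intercept / slope)
    return scale, shape
```

When the sample lies exactly on the theoretical line, the function recovers the generating parameters up to rounding, which is a convenient sanity check.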
2.3 Weighted Least Squares Method (WLS)
In the WLS method, the estimates of the parameters are the values that minimize the function:
The biggest challenge in the application of the WLS is in finding the weights in Eq. (15). We use the delta method to find them:
Hence, the weights can be written as follows:
Minimizing the weighted sum of squares, we obtain the WLS estimates of the two parameters as
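A minimal sketch of the WLS fit is given below. It assumes the weight form [(1 − F) ln(1 − F)]², which is commonly derived with the delta method for the Weibull plot; the exact weights of Eq. (15) are not reproduced here, so this form, like the function name, should be read as an assumption.

```python
import math


def weibull_wls(sample):
    """Return (scale, shape) from a weighted least-squares line on the Weibull plot."""
    ordered = sorted(sample)
    n = len(ordered)
    xs, ys, ws = [], [], []
    for i, t in enumerate(ordered, start=1):
        f = i / (n + 1)  # mean rank estimate of the CDF
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))
        # Assumed delta-method weight; it downweights both tails of the sample.
        ws.append(((1.0 - f) * math.log(1.0 - f)) ** 2)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(-intercept / slope), slope
```

Because the weights are smallest for the first and last order statistics, extreme observations influence the fitted line less than they do in the unweighted fit.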
2.4 Approximate Bayes Estimator
In this section, the approximate Bayesian estimator of the scale and shape parameters of the WD under a GE loss function is discussed. We assume a non-informative (vague) prior as
Both parameters are estimated using Lindley's approximation technique. The posterior expectation E is given by Eq. (22):
Moreover, it can be asymptotically estimated by:
where , represents the prior distribution of , is the likelihood function, and of the covariance matrix of the parameter estimators.
For the two-parameter case Eq. (22) reduces to:
The functions in Eq. (24) are evaluated at the MLEs of the two parameters.
To apply the Lindley model of Eq. (24) to estimate the parameters of the WD, the following are obtained from Eq. (23):
The elements of the covariance matrix are expressed by
2.4.1 Estimates Based on General Entropy Loss Function
The general entropy loss function L for , shown in Eq. (24), is expressed by the following form :
where is an estimate of . The Bayes estimator of , denoted by is the value that minimizes Eq. (26):
The BLGE of for from Eq. (24) is found by the following expressions:
In the same way, the BLGE of for is found by the following expressions:
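To illustrate what the GE-based estimator computes, the sketch below uses the standard result that the Bayes estimator under the GE loss is [E(θ^(−q))]^(−1/q). For illustration only, the posterior expectation is approximated by an average over posterior draws, which is a swapped-in technique and not the Lindley approximation used in this paper. The function name, the argument `posterior_draws`, and the symbol `q` for the loss parameter are assumptions.

```python
def ge_bayes_estimate(posterior_draws, q):
    """Bayes estimate under the general entropy loss, from draws of a positive parameter."""
    if q == 0:
        raise ValueError("q must be non-zero")
    # Monte Carlo approximation of the posterior expectation E[theta^(-q)].
    moment = sum(theta ** (-q) for theta in posterior_draws) / len(posterior_draws)
    return moment ** (-1.0 / q)
```

With q = −1 the estimator reduces to the posterior mean, and with q = 1 it becomes the harmonic mean of the draws, so the sign and size of q control how strongly overestimation or underestimation is penalized.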
3 Proposed Methods
In the following sections, we describe the proposed BLWGE and OLSMLP methods.
3.1 Weighted General Entropy Loss Function
The proposed WGE loss function is a weighted version of the GE loss function, defined as follows:
where represents the estimated parameters that minimize the expectation of the loss function (Eq. (27)), and represents the proposed weighted function as expressed by Eq. (28):
Based on the posterior distribution of the parameter , and by using the WGE function given in Eq. (28), we obtain the estimated BLWGE of the parameter as follows:
Thus, we can find that
Consequently, the BLWGE of parameter , obtained by using the WGE loss function, is as presented in Eq. (29):
provided that and exist and are finite, where represents the expected value.
We note that the GE is a special case of the WGE when in Eq. (29).
3.1.1 Estimates of Parameters of WD Based on Weighted General Entropy Loss Function
Based on the WGE and by using Eq. (29), the approximate Bayes estimator for is shown as:
Thus, the BLWGE for the shape parameter is
Similarly, the BLWGE for , is given by Eq. (34):
Thus, the weighted Bayes estimator for the shape parameter is
3.2 Ordinary Least Squares and the Multilayer Perceptron Neural Network (OLSMLP)
As previous studies have shown [14,33], manual calculations yield the smallest standard deviation (STD) in the parameter λ, and are consequently more accurate than computational methods. Moreover, manual estimation methods are more accurate for small sample sizes. However, these manual methods, especially the OLS, are sensitive to outliers and to specific residual behavior. To address these problems, many studies have proposed alternatives, such as the iterative weighting method based on the modified OLS, the WLS, and many other methods based on the WLS. A major challenge in these methods is determining the weights.
3.2.1 Proposed Method to Estimate Parameters of WD
We now describe the proposed method, which is divided into two main parts: the linearization of the CDF, and the application of a feedforward network with backpropagation to estimate the scale and shape parameters of the WD.
The OLS method takes the CDF defined in Eq. (2) and linearizes it as described in Eq. (10). It then determines the regression coefficients via linear regression and derives the parameter estimates from the slope and the intercept. The assumptions underlying this computation can be violated by even a few outliers.
Therefore, instead of using the slope and the intercept, we propose applying Algorithm 1 as described below.
• Application of Proposed Model to Estimate Parameters of WD
The steps used to evaluate the parameters of the WD from the input CSV file are described in Algorithm 1.
• Data Normalization
Normalization is an essential preprocessing step for a neural network [36,37]. Before training the neural network model, the input data are scaled in a preliminary phase using the RobustScaler, which rescales each feature using its median and interquartile range as described by Eq. (38). The RobustScaler is used to reduce the influence of outliers. Following this, the MinMaxScaler, defined by Eq. (39), is applied to the output of the RobustScaler. The MinMaxScaler scales all the data features to the range [0, 1]:
where is a feature vector, is an element of feature , is the rescaled element obtained by using MinMaxScaler, and is the rescaled element obtained by using RobustScaler.
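The two scaling steps can be sketched for a single feature as follows. The sketch assumes the usual definitions, namely centering on the median and dividing by the interquartile range, followed by min-max rescaling to [0, 1]. The function names are illustrative, and the quartile convention (linear interpolation) is an assumption.

```python
import statistics


def robust_scale(values):
    """Center a feature on its median and divide by its interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:  # constant middle half: leave the spread unchanged
        iqr = 1.0
    return [(v - median) / iqr for v in values]


def min_max_scale(values):
    """Map a feature linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = min_max_scale(robust_scale(feature))
```

In this small example, the median is 3 and the interquartile range is 2, so the robust step maps the outlier 100 to 48.5 before the min-max step maps it to 1.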
• Structure of the Proposed Neural Network
To estimate the parameters of the WD, we propose using a multilayer perceptron (MLP), which is a feedforward network trained with backpropagation. As shown in Fig. 1, the proposed network consists of an input layer (with n neurons), a hidden layer (with k neurons), and an output layer (with m neurons that yield the Weibull parameters as the output of the network).
Various criteria have been proposed in the literature to fix the number of hidden neurons. In our architecture, we use the rule whereby “the number of hidden neurons k should be 2/3 times the size of the input layer, plus the size of the output layer” [38–40].
The hyperbolic tangent activation function (tanh) is used in the hidden layer, and the sigmoid function in the output layer. Both are used frequently in feedforward nets, and are suitable for shallow networks as well as applications of prediction and mapping [38,41].
The objective of our neural network is a model that performs well on the data used in both the training and the test datasets. For this reason, we add a well-known regularization layer as described in the next section.
Regularization is a technique that can prevent overfitting [37,38]. A number of regularization techniques have been developed in the literature, such as L1 and L2 regularization, bagging, and dropout. In the proposed structure, we use dropout, a well-known technique that randomly “drops out” or omits hidden neurons of the neural network to make them unavailable during part of the training [38,42]. This reduces the co-adaptation between neurons, which results in less overfitting.
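The structure described above can be sketched as a single forward pass through one hidden layer with tanh activation, optional dropout, and a sigmoid output layer. This is an illustration under stated assumptions, not the trained model: the hidden-layer size uses the common rule of thumb of two-thirds of the input size plus the output size, the weights are random, dropout is implemented in its "inverted" form, and all function names are hypothetical.

```python
import math
import random


def hidden_size(n_inputs, n_outputs):
    """Rule-of-thumb hidden width: 2/3 of the input size plus the output size (assumed)."""
    return round(2 * n_inputs / 3) + n_outputs


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(inputs, hidden, output, drop_rate=0.0, rng=None):
    """tanh hidden layer, optional inverted dropout, sigmoid output layer."""
    h = [math.tanh(z) for z in dense(inputs, *hidden)]
    if drop_rate > 0.0 and rng is not None:
        keep = 1.0 - drop_rate
        # Drop each hidden neuron with probability drop_rate; rescale the survivors.
        h = [a / keep if rng.random() < keep else 0.0 for a in h]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(h, *output)]
```

Because the sigmoid output lies in (0, 1), the two outputs correspond to the Weibull parameters on the normalized scale, and would need to be mapped back to the original units after prediction. Dropout is active only when a `drop_rate` and a random generator are supplied, which mirrors its use during training but not during prediction.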
• Optimization Algorithm
The optimization of deep networks is an active area of research. The most popular gradient-based optimization algorithms are Adagrad, Momentum, RMSProp, Adam, AdaDelta, AdaMax, Nadam, and AMSGrad [38,43,44]. We chose Nadam due to its reported superiority in supervised machine learning over the other techniques, especially for deep networks. Moreover, it combines the strengths of the Nesterov accelerated gradient (NAG) and the adaptive moment estimation (Adam) algorithms, as follows:
: time step
: learning rate
the exponential average square of gradients
the weight that we want to update
gradient of L; the loss function to minimize.
: momentum decay and scaling decay, respectively
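For readers who want to see the update in executable form, the sketch below implements a commonly cited simplified form of the Nadam step: Adam's bias-corrected moment estimates combined with a Nesterov-style look-ahead on the first moment. It may differ in detail from the variant used by a particular deep learning library. The function name, the argument names `beta1` and `beta2` for the momentum and scaling decays, and all default values are assumptions.

```python
import math


def nadam_minimize(grad, theta, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using a simplified Nadam update."""
    theta = list(theta)
    m = [0.0] * len(theta)  # exponential average of gradients
    v = [0.0] * len(theta)  # exponential average of squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            # Nesterov look-ahead: blend the momentum with the current gradient.
            nesterov = beta1 * m_hat + (1.0 - beta1) * gi / (1.0 - beta1 ** t)
            theta[i] -= lr * nesterov / (math.sqrt(v_hat) + eps)
    return theta
```

As a usage example, calling the function with the gradient of (x − 3)² + (y + 1)², a starting point of [0, 0], a learning rate of 0.05, and 2000 steps should drive the iterate toward the minimizer (3, −1).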
4 Performance Metrics
To evaluate the proposed methods against the other methods, we used two statistical measures, the mean squared error (MSE) and the mean absolute percentage error (MAPE), in addition to the computation time.
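Both measures can be computed as in the following minimal sketch, which uses their usual definitions; the function names are illustrative, and the MAPE is expressed in percent, which is an assumption about the reporting convention.

```python
def mean_squared_error(true_values, estimates):
    """Average of the squared differences between estimates and true values."""
    return sum((e - t) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)


def mean_absolute_percentage_error(true_values, estimates):
    """Average absolute relative error, in percent; true values must be non-zero."""
    total = sum(abs((t - e) / t) for t, e in zip(true_values, estimates))
    return 100.0 * total / len(true_values)
```

Unlike the MSE, the MAPE is scale-free, which makes errors on the scale and shape parameters comparable even when their magnitudes differ widely.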
5 Results and Discussion
5.1 Dataset Description
We generated 250,000 random data points from the WD for different parameters and different values of ranging from 1 to 299, and those of ranging from 0.5 to 100. For each shape/scale pair, we generated 10,000 samples of different sizes
We used the same dataset for the neural network in the training phase, but applied one sample to each shape/scale pair. This was unlike the other methods (MLE, OLS, WLS, BLGE, and BLWGE), which used 10,000 samples to estimate the parameters of the WD. This dataset was divided into two subsets. The first subset, referred to as the training dataset, was used to fit the model; it was characterized by known inputs and outputs. The second subset, referred to as the test dataset, was used to evaluate the fitted machine learning model by making predictions on data for which the expected output was withheld. We chose the train–test procedure for our experiments because the available dataset was sufficiently large.
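The data preparation described above can be sketched as follows: one Weibull sample is drawn per shape/scale pair, with the pair itself kept as the target, and the resulting rows are split into training and test subsets. The function names, the sorting of each sample, the parameter grid, and the split fraction are hypothetical choices for illustration, not the settings used in the experiments.

```python
import random


def make_dataset(pairs, sample_size, rng):
    """Draw one sorted Weibull sample (features) per (scale, shape) pair (targets)."""
    rows = []
    for scale, shape in pairs:
        # random's weibullvariate takes the scale first and the shape second.
        sample = sorted(rng.weibullvariate(scale, shape) for _ in range(sample_size))
        rows.append((sample, (scale, shape)))
    return rows


def train_test_split(rows, test_fraction, rng):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(42)
pairs = [(float(s), 0.5 + 0.5 * s) for s in range(1, 11)]  # hypothetical grid
rows = make_dataset(pairs, 20, rng)
train, test = train_test_split(rows, 0.2, rng)
```

Using a seeded generator makes the draw and the split reproducible, which helps when comparing estimation methods on the same data.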
5.2 Experimental Setting
5.2.1 Parameter Selection for OLSMLP
In all experiments, we trained the model with Google Colaboratory (GPU) for 25 epochs. We used the Nadam optimizer with learning rate of ; terms representing the momentum decay, scaling decay, and smoothing were kept at their default values: , , and A dropout with a ratio of 0.6 was applied to the hidden layer. As described in Section 3, the hidden and output layers used the tanh and sigmoid activation functions, respectively. The loss function was the mean squared error.
5.2.2 Parameter Selection of BLGE and BLWGE
In all experiments, the parameters of the BLWGE and BLGE were empirically determined. The values of the weights q and z of the BLWGE were −3 and 6, respectively. For the BLGE, the parameter
5.3 Estimating Parameters of Weibull Distribution
5.3.1 Effect of Sample Size on Estimation of WD Parameters Using Prevalent Methods
Fig. 2 shows the evolution of the average MSE as a function of the sample size n. The MSE decreased quasi-linearly from to for all methods. Fig. 2 shows that the BLWGE, WLS, BLGE, and MLE had lower MSE values than the OLS for the different sample sizes. We can also deduce that the WLS, GE, and MLE gave similar results, with a slightly better start for the MLE at .
5.3.2 Effect of Sample Size on Estimation of WD Parameters Using Proposed Method
To illustrate how the sample size affects the calculation of the MSE, Fig. 3 shows the evolution of the latter as a function of the sample size n from 10 to 50.
From Fig. 3, we can deduce that as the sample size increased, the MSE of the proposed method decreased overall but fluctuated. This fluctuation was due to the random nature of the data used and the limited number of samples (one sample) for each shape/scale pair.
Tabs. 1 and 2 show the results of the simulation of the proposed method and the other methods considered above. The results show the following:
1. The MLE and WLS behaved similarly as shown in Tab. 1: Their MSE values decreased gradually when their shape values increased at a fixed scale. Conversely, when the scale value increased with a fixed shape, the MSE increased.
2. The behavior of the OLS and GE was the opposite of that of the MLE and WLS. As depicted in Tab. 1, the MSE increased when the shape increased (at a fixed scale), and decreased when the scale increased (with a fixed shape).
3. The BLWGE and the OLSMLP behaved similarly in terms of scale estimation, as shown in Tab. 1.
4. All methods had the same global variation function, as shown in Fig. 4 and Tab. 2.
5. The MLE was slightly superior globally in terms of scale estimation to the other methods, but had the worst estimation of shape, as shown in Tab. 2.
6. The proposed MLP neural network estimated the scale acceptably, and better than some of the other methods. Moreover, it outperformed all other methods in terms of shape estimation most of the time.
From Tab. 3, we see that both statistical indicators, MSE and MAPE, yielded different values. The global rank was calculated to evaluate the best method. The results in the table indicate that the proposed method offered the best compromise between shape and scale estimation, as indicated by the global rank. Moreover, it retained the speed of the OLS and enhanced the accuracy of estimation of the parameters of the WD compared with the MLE, BLGE, and BLWGE.
6 Conclusions
This study proposed a method to estimate the parameters of the WD. This method is based on the OLS graphical method and the MLP neural network. The MLP solves the problems caused by the presence of outliers and eases the difficulty of determining the weights in the WLS method. It yielded acceptable results in simulations, especially in terms of shape estimation. It is also faster than the MLE, BLGE, and BLWGE.
We also proposed a second method (BLWGE), in which we introduced weight to the GE loss function. The results of simulations showed that BLWGE yields good results, especially in terms of shape estimation, compared with the other methods.
Acknowledgement: This project was supported by the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University under Research Project No. 2020/01/16725.
Funding Statement: The authors are grateful to the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University Supporting Project Number (2020/01/16725), Prince Sattam bin Abdulaziz University, Saudi Arabia.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding this study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.