Computers, Materials & Continua
Swarm-LSTM: Condition Monitoring of Gearbox Fault Diagnosis Based on Hybrid LSTM Deep Neural Network Optimized by Swarm Intelligence Algorithms
1School of Electrical and Electronics, Sathyabama Institute of Science and Technology, Chennai, 600119, India
2Department of Computer Science and Engineering, G.B. Pant Government Engineering College, New Delhi, 110020, India
3Artificial Intelligence and Data Analytics (AIDA) Laboratory, CCIS, Prince Sultan University, Riyadh, 11586, Saudi Arabia
4Department of Computer Engineering, J.C. Bose University of Science and Technology, Faridabad, 121006, India
*Corresponding Author: Tanzila Saba. Email: email@example.com
Received: 27 July 2020; Accepted: 11 September 2020
Abstract: Renewable energy is emerging as a major source of energy, driven by its aggressive expansion and falling costs. Most renewable energy sources involve turbines, whose operation and maintenance are both vital and difficult. Condition monitoring and fault diagnosis have seen remarkable advances in approaches, practices, and technology during the last decade. Turbines mostly use rotating machinery, and analyzing its signals to localize a defect is challenging. This paper proposes a new hybrid model in which multiple swarm intelligence algorithms are evaluated for optimizing the conventional Long Short-Term Memory (LSTM) model to classify faults from vibration signals acquired from a gearbox. This helps to analyze the performance and behavioral patterns of the system more effectively and efficiently, which in turn helps to recommend replacement of the unit with higher precision. The results demonstrate that the proposed hybrid modeling approach is effective in classifying gearbox faults from time series data and achieves higher diagnostic accuracy than the conventional LSTM methods.
Keywords: Gearbox; long short term memory; fault classification; swarm intelligence; optimization; condition monitoring
1 Introduction
Fault diagnosis is significant for identifying degrading parts of rotating machinery and replacing them well before a total breakdown, which reduces downtime. In wind turbines especially, gears, shafts, blades, and rolling bearings, which are widely used in the transmission of power, play a vital role. Any failure within them introduces unexpected breakdown time, expensive maintenance, loss of production, and delayed distribution of power. Hence, it is necessary to identify and predict such faults at an early stage during Operation and Maintenance (O&M), so as to prevent power disruptions and catastrophic failures and to increase power production.
Condition monitoring collects health information about the machinery through methods such as vibration, acoustic, and thermal imaging analysis. Earlier methodologies processed the signals to derive deeper insights in different domains such as time and frequency. Artificial Intelligence (AI) has since been extensively investigated for rotating machinery with a variety of machine learning and deep neural network models. AI has made it possible to relate these spectral insights to defects, to categorize them, to forecast the Remaining Useful Life (RUL) of machinery components, and to replace the required ones.
A few research studies show that the traditional Convolutional Neural Network (CNN) model is effective for classifying faults based on vibration analysis, by learning features acquired from rolling bearings and from gearboxes [2,3] in both the time and frequency domains. The results were compared with a few peer algorithms such as Support Vector Machine and Random Forest.
New methods in condition monitoring continue to emerge to improve the reliability of gearboxes, beyond regular signal processing and the application of machine learning models to categorize faults. In recent years, deep neural networks such as CNN and LSTM have been widely used to classify faults and to predict the RUL of machinery. The models used for fault classification include Deep Belief Networks, Deep Boltzmann Machines, Restricted Boltzmann Machines and Auto-Encoders, Support Vector Regression, and Stacked Multilevel Denoising Auto-Encoders, validated on vibration signals; similarly, Deep Random Forest Fusion has been applied to vibration and acoustic signal characteristics simultaneously for gearbox fault diagnosis. Torsional vibration signals of gear teeth have been processed in the phase domain to localize the defect and derive the type of fault.
Wavelet analysis integrated with a CNN model was used to transform vibration signals into time-frequency spectral images and then to capture features from the images to classify the faults. Cepstrum analysis of vibration signals processed with the Hilbert and wavelet transforms helped to identify gear faults and cracks in bearings. The Discrete Wavelet Transform has been applied to time series vibration signals, and the Continuous Wavelet Transform to acoustic signals, to extract features.
This paper is organized as follows: Section 2 reviews recent related research on gearbox fault classification. Section 3 details the proposed hybrid LSTM gearbox fault diagnosis method along with the optimization techniques used for classifying faults. Section 4 tabulates the evaluated results of the custom hybrid model with metrics and then compares the proposed hybrid method with conventional LSTM methods. Finally, the conclusions are presented, followed by the references.
2 Literature Review
Some recent methodologies proposed for gearbox fault classification are reviewed in this section.
Chen et al. evaluated three deep neural network models, Deep Belief Networks, Deep Boltzmann Machines, and Stacked Auto-Encoders, to assess rolling bearing fault conditions. Multiple pre-processing schemes (time domain, frequency domain, and time-frequency domain) were applied for feature extraction. A single dataset of 7 fault patterns was used to test the efficiency of the deep learning models in deriving the health condition of rotating mechanical machinery. The evaluated results demonstrated a reliable model accuracy that was relevant for bearing diagnostics.
Merainani et al. performed an in-depth comparison of the Hilbert Empirical Wavelet Transform (HEWT) and the Hilbert Huang Transform (HHT) on gearbox vibration signals. HEWT, a self-adaptive time-frequency analysis, was applied to the vibration signals to obtain the instantaneous amplitude matrices. The fault feature vectors were obtained by decomposing the vibration signals and were later classified using a Self-Organizing Map model to derive the gear condition, such as healthy gear, tooth cracking, input shaft slant crack, and tooth surface pitting.
Liu et al. proposed the Stacked Auto-Encoders (SAE) model for gearbox fault diagnosis. This model was useful in extracting salient characteristics from frequency-domain signals. To reduce overfitting during SAE training and improve performance with a tiny dataset, the dropout technique and the Rectified Linear Unit (ReLU) activation function were introduced. The effectiveness of the proposed model was verified with two gearbox datasets, on which it performed better than the existing raw SAE.
Malik et al. described fault diagnosis as one of the widely used techniques for identifying gearbox faults, which helps to minimize operating cost and improve the reliability and feasibility of the wind turbine gearbox. The observed vibration signals of the gearbox were used for fault diagnosis by extracting features with the Empirical Mode Decomposition technique, after which an Artificial Neural Network model classified the faults.
Medina et al. addressed the application of symbolic dynamics algorithms to the analysis of features from gearbox vibration signals. The key features were extracted from the vibration signals using a peak symbolic dynamics algorithm that subdivides the phase space. Two experiments were evaluated with 10-fold cross-validation. In the first experiment, the dataset was partitioned randomly into 10 folds, of which 9 were used for training and 1 for validation, using a multi-class support vector machine model. In the second experiment, classification was performed on signals at different load conditions.
Johnson et al. analyzed the classification of faults using six variants of K-Nearest Neighbors (KNN): fine, weighted, medium, cosine, coarse, and cubic. The signal processing and analysis were performed on MATLAB-simulated high-voltage DC transmission line data.
Tang et al. introduced a custom approach integrating K-Means with bio-inspired optimization models such as Ant Colony, Firefly, Cuckoo, Wolf, and Bat. The results were compared by evaluating the approach on multiple datasets such as iris, wine, Libras, Haberman, synthetic, and mouse. This research showed how to overcome a key drawback of standard K-Means, its convergence to local optima, by integrating bio-inspired optimization models that search for the global optimum.
Liu et al. acquired vibration signals of multiple gears from an experimental test rig for gear diagnosis. The study highlights the use of the ensemble empirical mode decomposition method to decompose the signals of multiple gear teeth at different levels into intrinsic mode functions. The extracted feature vectors were classified by a multi-class support vector machine to determine the health status of the gears. The resulting outputs, with their respective inputs, were evaluated and compared.
The existing literature has used different approaches to gearbox fault diagnosis. Moreover, much of this research applies vanilla deep learning models, with only a few works customizing them in depth. Hence, this paper proposes a novel approach that fine-tunes the parameters and activation functions of the conventional LSTM recurrent neural network model.
This novel approach consists of a hybrid LSTM-based model optimized with swarm intelligence algorithms. Detailed evaluations and comparisons of the results are performed using a gearbox condition monitoring dataset to classify the faults. This article highlights the following:
• Proposal of a custom hybrid LSTM model with swarm intelligence for gearbox fault diagnosis.
• In-depth analysis of a restricted subset of gearbox data with the hybrid LSTM under 10 different load conditions.
• Evaluation of different LSTM activation functions, i.e., sigmoid, hyperbolic tangent (tanh), and Rectified Linear Unit (ReLU), each optimized with the Particle Swarm Optimization (PSO), Firefly Algorithm (FA), Cuckoo Search Optimization (CSO), and Ant Colony Optimization (ACO) algorithms.
• Observations for each combination of optimization algorithm and activation function on all 10 loads for the classification of faults.
• Tabulation of performance metrics such as accuracy, precision, recall, specificity, sensitivity, and F-Score, together with the weight and bias values, for the default sigmoid, hyperbolic tangent, and ReLU activation functions and their customized variants.
3 Proposed Methodology
The proposed hybrid fault diagnosis methodology combines swarm intelligence algorithms with the LSTM network model to classify gearbox faults. The proposed method, which applies the Hybrid-LSTM network model, comprises four steps and is illustrated in Fig. 1. Initially, the given dataset is converted into time series data and then given as input to the proposed Hybrid-LSTM network model.
3.1 Long Short-Term Memory Network
The Long Short-Term Memory (LSTM) network is a variant of the recurrent neural network. It has been widely applied to time series datasets recently and has proven to be a highly efficient learning model.
An LSTM network takes sequence data as input and makes predictions based on the individual time steps of the sequence. For conventional or default activation functions such as the sigmoid and tanh functions, the gradients decrease quickly as the training error is propagated back to the earlier layers. The sigmoid and hyperbolic tangent activation functions and their derivatives are formulated in Eqs. (1)–(4).
The ReLU activation function has gained tremendous recognition in the last few years, because its gradient does not decrease as the independent variable increases. Hence, a network with the ReLU activation function can overcome the vanishing gradient problem. The ReLU function and its derivative are formulated in Eqs. (5) and (6).
A customized five-layer LSTM neural network is implemented with one sequence input layer, one bi-directional LSTM layer other than the default LSTM layer for standardization, one fully connected layer, one softmax layer, and one output layer for classification. Each LSTM unit has an input gate, a forget gate, and an output gate, along with a memory unit that is read and updated periodically. The LSTM equations are listed as Eqs. (7)–(12).
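The saturation behavior described above can be checked numerically. The following is a minimal sketch using the standard textbook definitions of the sigmoid, tanh, and ReLU functions and their derivatives; the function names are illustrative and the notation may differ from the paper's Eqs. (1)–(6).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def d_sigmoid(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2; its maximum is 1 at x = 0."""
    return 1.0 - math.tanh(x) ** 2


def relu(x: float) -> float:
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def d_relu(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (value at 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    # The sigmoid and tanh gradients shrink towards zero as x grows,
    # while the ReLU gradient stays at 1 for every positive input.
    for x in (0.5, 2.0, 5.0, 10.0):
        print(f"x={x:5.1f}  d_sigmoid={d_sigmoid(x):.2e}  "
              f"d_tanh={d_tanh(x):.2e}  d_relu={d_relu(x):.0f}")
```

Because backpropagation multiplies such derivatives layer by layer and time step by time step, values well below 1 compound into a vanishing gradient, which is the motivation given here for considering ReLU.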
In these equations, the symbols denote the input of the memory cell; the outputs of the forget gate, input gate, and output gate; the state of the memory cell; the gate activation function; the output activation function; the corresponding weight matrices; and the bias vectors. Υ denotes the Hadamard product. This is illustrated in Fig. 2.
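For orientation, a widely used textbook formulation of a single LSTM step is given below. It is a sketch in assumed notation, not a reproduction of the paper's Eqs. (7)–(12): here $x_t$ is the input, $h_t$ the hidden output, $c_t$ the cell state, $f_t$, $i_t$, $o_t$ the forget, input, and output gates, $\sigma$ the gate activation, $\tanh$ the output activation, $\odot$ the Hadamard product, and $W$, $U$, $b$ the input weights, recurrent weights, and biases. The paper's own equations may group the weights and biases differently.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

In this form, the customized activation functions discussed in Section 3.2 would take the place of $\sigma$ and $\tanh$.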
3.2 Optimization Algorithms
In this research, a few swarm intelligence algorithms, namely Particle Swarm, Firefly, Cuckoo Search, and Ant Colony, are discussed and evaluated in combination with custom LSTM activation functions (in multiple combinations as well) that replace the regular activation functions, i.e., sigmoid, hyperbolic tangent, and ReLU. Using the parameters obtained from the respective Eqs. (13)–(16) for each customized activation function, the layered network is defined and then trained, validated, and tested for 5 epochs.
3.2.1 Particle Swarm Optimization with LSTM
This optimization method, commonly known as PSO, is an adaptive computational optimization technique inspired by the social behavior of birds flocking together. The objective function of PSO is formulated in Eq. (13).
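To make the flocking idea concrete, here is a minimal sketch of the canonical PSO velocity and position update on a toy objective. It does not reproduce the paper's Eq. (13): the `sphere` test function, the inertia weight `w`, and the acceleration coefficients `c1` and `c2` are illustrative assumptions, and in the paper's setting the objective would instead score the LSTM obtained with a candidate set of activation parameters.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move and clamp to the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

The global-best variant is used here only because it is the simplest to read; neighborhood-based variants follow the same update with a local best in place of `gbest`.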
3.2.2 Cuckoo Optimization with LSTM
This optimization technique is derived from the behavior of the cuckoo, a bird species that lays its eggs in the nests of other bird species. The objective function of cuckoo search is expressed in Eq. (14).
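The egg-laying strategy can be sketched as follows. This is a simplified cuckoo search, assuming Lévy-flight steps drawn with Mantegna's algorithm and a greedy replacement rule; it is not the paper's Eq. (14), and `sphere`, `pa`, and `step_scale` are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step length drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(objective, dim, bounds, n_nests=15, n_iters=100,
                  pa=0.25, step_scale=0.01, seed=0):
    """Simplified cuckoo search; returns (best_nest, best_value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(hi, max(lo, v))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    cost = [objective(n) for n in nests]
    history = [min(cost)]

    for _ in range(n_iters):
        best = nests[min(range(n_nests), key=lambda i: cost[i])][:]
        # 1) New eggs via Levy flights, scaled by the distance to the best nest.
        for i in range(n_nests):
            cand = [clip(x + step_scale * levy_step(rng) * (x - b))
                    for x, b in zip(nests[i], best)]
            val = objective(cand)
            if val < cost[i]:          # keep the egg only if it improves the nest
                nests[i], cost[i] = cand, val
        # 2) With probability pa a nest is "discovered" and a replacement is
        #    proposed from the difference of two randomly chosen nests.
        for i in range(n_nests):
            if rng.random() < pa:
                p, q = rng.sample(nests, 2)
                r = rng.random()
                cand = [clip(x + r * (a - c)) for x, a, c in zip(nests[i], p, q)]
                val = objective(cand)
                if val < cost[i]:
                    nests[i], cost[i] = cand, val
        history.append(min(cost))

    k = min(range(n_nests), key=lambda i: cost[i])
    return nests[k], cost[k], history


if __name__ == "__main__":
    nest, value, history = cuckoo_search(sphere, dim=2, bounds=(-5.0, 5.0))
    print(nest, value)
```

Because every replacement is accepted only when it lowers the cost, the best value in `history` never gets worse from one iteration to the next.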
3.2.3 Firefly Algorithm Optimization with LSTM
The Firefly Algorithm (FA) is based on the behavior of fireflies, which interact through flashing signals: a firefly moves towards a brighter partner. The attraction is proportional to the brightness, and both increase as the distance between fireflies decreases [24,25]. Following this principle, the objective function of FA is formulated in Eq. (15).
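The attraction rule can be sketched with the commonly used firefly update, in which attractiveness decays exponentially with squared distance. This is an illustration under assumed parameters (`beta0`, `gamma`, `alpha`, and the toy `sphere` objective), not the paper's Eq. (15).

```python
import math
import random


def sphere(x):
    """Toy objective (assumption): lower value means a brighter firefly."""
    return sum(v * v for v in x)


def firefly(objective, dim, bounds, n_fireflies=15, n_iters=100,
            beta0=1.0, gamma=0.1, alpha=0.2, seed=0):
    """Basic firefly algorithm; returns (best_position, best_value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    b = min(range(n_fireflies), key=lambda i: cost[i])
    best, best_val = pos[b][:], cost[b]      # best solution seen so far
    history = [best_val]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:        # firefly j is brighter than i
                    r2 = sum((a - c) ** 2 for a, c in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # attraction fades with distance
                    # move i towards j, plus a small random perturbation
                    pos[i] = [min(hi, max(lo, a + beta * (c - a)
                                          + alpha * (rng.random() - 0.5)))
                              for a, c in zip(pos[i], pos[j])]
                    cost[i] = objective(pos[i])
                    if cost[i] < best_val:
                        best, best_val = pos[i][:], cost[i]
        alpha *= 0.97                        # gradually reduce the random walk
        history.append(best_val)
    return best, best_val, history


if __name__ == "__main__":
    position, value, history = firefly(sphere, dim=2, bounds=(-5.0, 5.0))
    print(position, value)
```

Individual fireflies can move to worse positions, so the sketch tracks the best solution seen so far separately from the current population.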
3.2.4 Ant Colony Optimization with LSTM
Ant Colony Optimization (ACO) is based on the behavior of ants that move randomly, lay pheromone, and search for an optimal path. An advantage of ACO is that it has shown prior success in evolving general RNNs for time series prediction. The ACO equation used by the activation functions is formulated in Eq. (16).
4 Simulation Results and Discussions
In this section, two fault condition cases, healthy tooth and broken tooth, are considered for classification using the gearbox fault diagnosis dataset. The following subsections describe the dataset and the performance of the proposed hybrid method.
4.1 Dataset Description
This open-source gearbox fault diagnosis dataset comprises vibration data recorded using Spectra Quest’s Gearbox Fault Diagnostics Simulator. The data were recorded with 4 vibration sensors placed in four different directions. Additionally, the data were recorded under loads varying from 0 to 90 percent at a frequency of 30 Hz, in two scenarios: 1) healthy condition and 2) broken tooth condition.
4.2 Performance Metrics
Initially, the gearbox fault diagnosis dataset is converted to time series data to perform LSTM classification. The evaluation uses a restricted subset of the data available in the dataset. To classify the two health conditions of the gearbox data, 70% of the samples are used to train the proposed hybrid network and the rest are used for testing. That is, for this work, the training and testing sets formed from this split are used as input to the neural network. The learning rate is 1 and the number of epochs is 5.
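The data preparation described above can be sketched as follows. The window length, step size, shuffling, and the helper names `make_windows` and `split_train_test` are hypothetical; the source states only that the data are converted to time series and split 70% for training and 30% for testing.

```python
import random


def make_windows(rows, window, step):
    """Cut a sequence of sensor readings into fixed-length, possibly overlapping windows."""
    return [rows[s:s + window] for s in range(0, len(rows) - window + 1, step)]


def split_train_test(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle (sample, label) pairs and split them; shuffling is an assumption."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train = [(samples[i], labels[i]) for i in idx[:cut]]
    test = [(samples[i], labels[i]) for i in idx[cut:]]
    return train, test


if __name__ == "__main__":
    # Synthetic stand-in for one recording: 40 time steps x 4 sensor channels.
    recording = [[float(t + c) for c in range(4)] for t in range(40)]
    windows = make_windows(recording, window=8, step=4)
    # Hypothetical labels: 0 = healthy, 1 = broken tooth.
    labels = [i % 2 for i in range(len(windows))]
    train, test = split_train_test(windows, labels)
    print(len(windows), len(train), len(test))
```

Each window keeps all four sensor channels per time step, which matches the shape a sequence input layer typically expects.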
The proposed hybrid LSTM network has been trained with the conventional activation functions sigmoid, hyperbolic tangent (tanh), and ReLU. This training has also been extended to the proposed customized activation functions: sigmoid PSO, sigmoid Cuckoo, sigmoid FA, and sigmoid ACO; tanh PSO, tanh Cuckoo, tanh FA, and tanh ACO; ReLU PSO, ReLU Cuckoo, ReLU FA, and ReLU ACO. The results have been evaluated for load, weight, and bias values on gearbox vibration data acquired at loads of 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90. The accuracy of each combination was computed for every load, and the best results were achieved with loads 10 and 40. The results are tabulated in Tab. 1 and depicted in Fig. 3.
Further, the performance of the gearbox classification is examined using various performance parameters that compare the predicted and expected/actual classifications. Each sample is classified into one of two categories: healthy or broken tooth.
The performance parameters considered here include accuracy, true positive rate (sensitivity), true negative rate (specificity), precision, and F-Score. A high number of true positive detections is desirable for robust gearbox classification. Accuracy is the percentage of correctly classified samples out of the total, as in Eq. (17). TPR or sensitivity is the ratio of correctly predicted faults to the number of actual faults, as formulated in Eq. (18). Precision is the ratio of correctly predicted faults to the number of predicted faults, as formulated in Eq. (20). The classification performance is estimated using these formulae,
where True Positives (TP) is the number of faults classified as faults, True Negatives (TN) is the number of normal samples classified as normal, False Positives (FP) is the number of normal samples classified as faults, and False Negatives (FN) is the number of faults classified as normal. Tab. 2 lists the values of performance metrics such as sensitivity, specificity, precision, recall, and F-score, along with the execution time, for all customized activation functions.
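The experimental grid described above can be enumerated directly. This is a small bookkeeping sketch; the tuple layout and names are illustrative and do not come from the paper.

```python
from itertools import product

ACTIVATIONS = ("sigmoid", "tanh", "relu")
OPTIMIZERS = ("PSO", "Cuckoo", "FA", "ACO")
LOADS = tuple(range(0, 100, 10))          # loads 0, 10, ..., 90

# Conventional LSTM: one configuration per activation, no swarm optimizer.
conventional = [(activation, None) for activation in ACTIVATIONS]
# Hybrid LSTM: every activation paired with every swarm optimizer.
hybrid = list(product(ACTIVATIONS, OPTIMIZERS))

configs = conventional + hybrid
# One training/evaluation run per configuration and load condition.
runs = [(activation, optimizer, load)
        for (activation, optimizer), load in product(configs, LOADS)]

if __name__ == "__main__":
    print(len(hybrid), len(configs), len(runs))
```

Counting the combinations this way gives 12 customized activation functions, 15 configurations including the conventional ones, and 150 runs across the 10 loads.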
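The metrics defined above follow the standard confusion-matrix definitions and can be computed as in this minimal sketch. The function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of dividing by zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)     # true positive rate, also called recall
    specificity = ratio(tn, tn + fp)     # true negative rate
    precision = ratio(tp, tp + fp)
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical counts: 3 faults caught, 1 fault missed, 4 healthy samples correct.
    for name, value in binary_metrics(tp=3, tn=4, fp=0, fn=1).items():
        print(f"{name}: {value:.4f}")
```

Note that recall and sensitivity are the same quantity, and that specificity equals one minus the false positive rate.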
4.3 Comparison Results
The proposed hybrid LSTM network is compared with the conventional LSTM network on parameters such as bias, weight, execution time, accuracy, precision, and recall. Tab. 3 lists the values for the conventional LSTM network model with the sigmoid and tanh activation functions, without the ReLU activation function, including weight, bias, and accuracy.
The accuracy comparison between the proposed hybrid network and the conventional network is also illustrated in Fig. 4. The graph shows that the proposed hybrid LSTM network reaches its best accuracy of 87.5% with Sigmoid-PSO at load 10 and ReLU-Cuckoo at load 40. The second highest value of 75% is obtained with the customized activation functions tanh-Firefly, tanh-Cuckoo, tanh-PSO, tanh-ACO, Sigmoid-Cuckoo, Sigmoid-ACO, Sigmoid-Firefly, ReLU-Firefly, ReLU-Cuckoo, ReLU-ACO, and ReLU-PSO. Thus, the graph indicates that the proposed hybrid LSTM network model performs better than the conventional LSTM network.
5 Conclusion
In this study, the proposed hybrid LSTM network model has been evaluated with different swarm intelligence algorithms for the fault diagnosis of the gearbox. To address over-fitting and to enhance the performance of the conventional LSTM with a tiny training set, swarm intelligence optimization algorithms such as PSO, Cuckoo, Firefly, and ACO have been considered along with the ReLU activation function. In the evaluated results, the highest accuracy of 87.5% was achieved with both the Sigmoid-PSO and ReLU-Cuckoo customized activation functions. The results indicate that the proposed method can achieve higher accuracy in condition monitoring of gears for fault diagnosis. The comparative studies also indicate that the hybrid models optimized with swarm intelligence outperform the conventional LSTM model.
Acknowledgement: The authors would like to acknowledge the Artificial Intelligence and Data Analytics (AIDA) Laboratory, CCIS, Prince Sultan University, Riyadh, Saudi Arabia for the support.
Conflict of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
Funding Statement: The authors received no specific funding for this study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.