To address the difficulty of identifying apple diseases in the natural environment and the low deployment rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network (ResNet18), a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss, and the efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall and F1-score of LW-ResNet on the test set are 97.80%, 97.92% and 97.85%, respectively. The parameter memory is 2.32 MB, which is 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has obvious advantages in recognition performance, speed, parameter memory requirement and time complexity. The proposed model has low computational cost, low storage cost, strong real-time performance, high identification accuracy, and strong practicability, and it can meet the needs of real-time identification of apple leaf diseases on resource-constrained devices.
According to statistics, apple production in China exceeded 41 million tons in 2020, and the planting area ranked first in the country's fruit industry. However, diseases are still one of the important factors restricting the development of the apple industry in China. Apple diseases can be divided into branch and stem diseases, fruit diseases, leaf diseases, and root diseases, among others. Among them, leaf diseases are widespread in apple production areas and are seriously harmful [
Traditional crop disease identification mainly relies on manual observation and empirical judgment, and there are problems such as the inaccurate identification of disease types and low efficiency. To improve the accuracy and efficiency of disease recognition, researchers use image processing, machine learning and other methods to detect crop diseases. They use image processing technology to obtain some specific disease features, then use a support vector machine (SVM) [
The convolutional neural network (CNN) has become the mainstream algorithm to solve image classification problems because of its powerful feature extraction ability, and it has achieved many excellent results [
At present, smart mobile terminals and embedded devices are widely used in the field of agricultural disease recognition. It is the future development trend to deploy disease recognition models to these devices instead of cloud servers [
In summary, this paper proposes a lightweight model for apple leaf disease recognition. The main contributions are as follows:
A 7-category apple leaf dataset with a complex background is constructed to ensure the robustness of the proposed model in practical applications. Aiming at the characteristics of different sizes of apple disease images and complex backgrounds, an efficient lightweight model LW-ResNet is proposed, which greatly reduces the number of model parameters and computational complexity while improving the recognition performance of the model. The method in this paper can meet the needs of apple disease identification on resource-constrained embedded devices.
The rest of this paper is organized as follows:
The deep learning methods that have emerged in recent years have extremely strong data expression capabilities for images [
Tang et al. [
Zhong et al. [
In this section, ResNet18 is improved based on the characteristics of apple disease images, and an efficient lightweight ResNet (LW-ResNet) is proposed to accurately identify apple diseases in the actual production process.
He et al. [
Residual networks have achieved many results in the field of agricultural image recognition, with high recognition accuracy [
Starting from Stage2 of ResNet18, after each stage, the size of the feature map is reduced to half of the original size, and the number of channels is doubled. After 4 stages, the feature map with the shape of 7 × 7 × 512 is transferred to the final Avgpool and FC. The specific network architecture and internal parameters are shown in
| Number of layers | Architecture | Output size | Output channels |
|---|---|---|---|
| Conv1 | 7 × 7 Conv, stride 2 | 112 × 112 | 64 |
| Maxpool | 3 × 3 Maxpool, stride 2 | 56 × 56 | 64 |
| Stage1 | | 56 × 56 | 64 |
| Stage2 | | 28 × 28 | 128 |
| Stage3 | | 14 × 14 | 256 |
| Stage4 | | 7 × 7 | 512 |
| Avgpool, FC | 7 × 7 Avgpool, FC | 1 × 1 | 1000 |
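The stage-wise progression described above (each stage after Stage1 halves the spatial size and doubles the channel count) can be checked with a few lines, assuming the standard 224 × 224 input of ResNet18:

```python
# Feature-map size and channel count through the ResNet18 stages.
size, channels = 56, 64            # after Conv1 (stride 2) and Maxpool (stride 2)
for stage in ("Stage2", "Stage3", "Stage4"):
    size //= 2                     # spatial size is halved at each stage
    channels *= 2                  # channel count is doubled at each stage
print(size, channels)              # 7 512, the map fed to Avgpool and FC
```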
Due to its single-size receptive field and insufficient ability to shield interference information, ResNet18 has difficulty identifying apple diseases in a complex environment. The purpose of this paper is to improve the applicability of the model in the production environment. From the perspectives of portability, economy, and practicality, embedded devices or mobile devices are more suitable for agricultural disease identification than high-performance computers and servers [
The design addresses the characteristics of apple diseases in natural scenarios and the limited storage space and computing resources of mobile devices. Based on ResNet18, the LW-ResNet model is constructed, and its structure is shown in
As shown in
| Number of layers | Architecture | Output size | Output channels |
|---|---|---|---|
| Conv1 | 7 × 7 Conv, stride 2 | 112 × 112 | 64 |
| Maxpool | 3 × 3 Maxpool, stride 2 | 56 × 56 | 64 |
| Stage1 | | 56 × 56 | 64 |
| Stage2 | | 28 × 28 | 128 |
| Stage3 | | 14 × 14 | 256 |
| Stage4 | | 7 × 7 | 512 |
| Avgpool, FC | 7 × 7 Avgpool, FC | 1 × 1 | 7 |
Only one residual module is retained in each stage of the network, and multiple receptive field sizes are added to the residual module, which reduces the number of model parameters and calculations and obtains a variety of local features. In this paper, the pooling layer and the convolutional layer are connected in series for identity mapping, which reduces the loss of information and strengthens the expression of the details of the lesions. A lightweight and efficient attention module (ECANet) is introduced between the residual modules to suppress the propagation of the environmental noise generated by the complex background during the model learning process.
One of the difficulties in apple disease identification is that the sizes of the lesions are different, especially the small lesions. Because the richness of the detailed features of lesions is basically proportional to the number of pixels occupied, it is difficult to extract the features of small lesions, which leads to a decrease in the overall recognition accuracy of the model. For ResNet18, there are only convolutional layers with a size of 3 × 3 in the residual module. The receptive field of a single-scale convolution kernel is fixed, and the extracted features are limited, which cannot meet the feature information required to identify lesions of different sizes, especially small lesions, and it is necessary to combine features of multiple scales for judgment. Using traditional convolution kernels to obtain multi-scale features also leads to a significant increase in network parameter requirements, which increases the difficulty of model training and deployment. Therefore, this paper chooses the group convolution with lower parameters and complexity to construct a multi-scale feature extraction layer. The second layer of convolution in the residual module of ResNet18 is replaced with four different scale (1 × 1, 3 × 3, 5 × 5, 7 × 7) group convolutions to extract features of different scales.
The group convolution is shown in
It can be seen from
Taking the residual module in Stage1 of LW-ResNet as an example, the input channel of each convolution (1 × 1, 3 × 3, 5 × 5, 7 × 7) in the multi-scale feature extraction layer is 16, the output channel is 16, and the size of the input and output feature maps is 56 × 56.
The number of parameters of the multi-scale feature extraction layer constructed by traditional convolution is 1² × 16 × 16 + 3² × 16 × 16 + 5² × 16 × 16 + 7² × 16 × 16 = 21,504. The number of multi-scale feature extraction layer parameters constructed by group convolution with 16 groups (
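The two parameter counts above can be reproduced with the standard formula k² × (C_in / groups) × C_out for a bias-free convolution, using the Stage1 figures (16 input and 16 output channels per branch):

```python
# Parameter counts for the multi-scale feature extraction layer in Stage1.
def conv_params(k, c_in, c_out, groups=1):
    """Weight count of a 2-D convolution layer: k*k*(c_in/groups)*c_out, no bias."""
    return k * k * (c_in // groups) * c_out

kernels = [1, 3, 5, 7]
# Traditional convolution (groups = 1) for the four branches:
standard = sum(conv_params(k, 16, 16) for k in kernels)
# Group convolution with 16 groups for the same four branches:
grouped = sum(conv_params(k, 16, 16, groups=16) for k in kernels)
print(standard)  # 21504
print(grouped)   # 1344
```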
In the original residual network, the Stage2, Stage3 and Stage4 modules use a convolution layer with a stride of 2 and a kernel size of 1 × 1 to raise the number of channels and reduce the feature map size, so that the feature matrices can be summed and identity mapping is realized. However, because the stride is larger than the kernel size, useful features are skipped during the convolution calculation. Meanwhile, an average pooling layer emphasizes down-sampling of the overall information in the pooling area, ignoring the prominent color and texture features of the lesions and retaining more useless background features, which is not conducive to extracting the key features in the apple disease data. A max pooling layer outputs the maximum value in the pooling area, which better retains the local details of the lesions and improves the generalization ability of the model. According to the characteristics of the apple disease images, a max pooling layer with a stride of 2 and a size of 3 × 3 is selected to reduce the feature map size, followed by a convolutional layer with a stride of 1 and a kernel size of 1 × 1 to raise the number of channels, compensating for the information loss of the original structure while keeping the number of parameters unchanged.
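A minimal NumPy sketch of this improved identity mapping (3 × 3 max pooling with stride 2, followed by a 1 × 1 convolution whose random weights stand in for learned parameters), using the Stage1 → Stage2 shapes of 64 → 128 channels and 56 × 56 → 28 × 28 maps:

```python
import numpy as np

def maxpool2d(x, k=3, stride=2, pad=1):
    """Max pooling over k x k windows; x has shape (C, H, W)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    Ho = (H + 2 * pad - k) // stride + 1
    Wo = (W + 2 * pad - k) // stride + 1
    out = np.empty((C, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            out[:, i, j] = xp[:, i*stride:i*stride+k, j*stride:j*stride+k].max(axis=(1, 2))
    return out

def conv1x1(x, w):
    """A 1 x 1 convolution is a per-pixel channel mix; w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

x = np.random.rand(64, 56, 56)       # Stage1 output
w = np.random.rand(128, 64)          # stand-in for learned 1 x 1 weights
y = conv1x1(maxpool2d(x), w)         # shortcut branch: pool down, then raise channels
print(y.shape)                       # (128, 28, 28)
```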
Since the data used in this paper are data on apple leaf diseases in a field environment, the background is complex and there are many environmental interference factors. There will be considerable noise in the model recognition process, which will be transmitted in the model learning process. As the number of network layers increases, the weight of the noise information in the feature map will also increase, which will eventually have a certain negative impact on the model. The channel attention mechanism weakens the channels that represent background features and reduces their weight [
One-dimensional convolution is used by ECANet [
The structure of ECANet is shown in
ECANet first uses global average pooling to aggregate the spatial information of each channel of the input feature
Then, a one-dimensional convolution with a convolution kernel size of
Finally, to re-encode each channel feature of
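The channel re-weighting just described can be sketched in NumPy as follows. The 1-D convolution weights are learned in the real module; simple averaging weights stand in for them here, and the adaptive kernel-size rule with γ = 2, b = 1 follows the ECANet defaults:

```python
import numpy as np

def eca(x, gamma=2, b=1):
    """Minimal ECA sketch; x has shape (C, H, W)."""
    C = x.shape[0]
    t = int(abs((np.log2(C) + b) / gamma))   # adaptive kernel size from C
    k = t if t % 2 else t + 1                # kernel size must be odd
    y = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    w = np.full(k, 1.0 / k)                  # illustrative (not learned) 1-D conv weights
    y = np.convolve(np.pad(y, k // 2, mode='edge'), w, mode='valid')
    s = 1.0 / (1.0 + np.exp(-y))             # sigmoid -> per-channel weight in (0, 1)
    return x * s[:, None, None]              # re-encode each channel of x

x = np.random.rand(64, 8, 8)
out = eca(x)                                 # same shape, channels re-weighted
```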
In this section, the collected apple leaf images are used as research data, and ablation experiments and comparison experiments of different models are carried out under the same parameters to verify the effectiveness of the improved steps and the proposed model. Finally, the robustness of LW-ResNet is analyzed.
In this paper, images of 6 types of diseased leaves (alternaria leaf spot, powdery mildew, rust, scab, mosaic, anthracnose leaf blight) and healthy leaves of apples are taken as the research objects. The data are taken from the Kaggle competition dataset (
The data distribution is shown in
It can be seen from
As shown in
Class | Number of original samples | Number of enhanced samples | Label |
---|---|---|---|
Alternaria leaf spot | 204 | 1224 | 0 |
Powdery mildew | 216 | 1296 | 1 |
Rust | 262 | 1572 | 2 |
Scab | 226 | 1356 | 3 |
Mosaic | 50 | 1200 | 4 |
Anthracnose leaf blight | 42 | 1344 | 5 |
Healthy | 248 | 1488 | 6 |
Total | 1248 | 9480 | / |
The apple disease recognition model was constructed with the deep learning framework PyTorch. All experiments were carried out in the Windows 10 environment. The CPU is an Intel(R) Xeon(R) W-2245, and the GPU is an NVIDIA Quadro RTX graphics card with 16 GB of memory.
The adaptive moment estimation algorithm (Adam) [
Parameter | Parameter value |
---|---|
Optimizer | Adam |
Learning rate | 0.001 |
Batch-size | 32 |
Epoch | 50 |
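Under the settings in the table above, the corresponding PyTorch setup is roughly as follows (the model here is a trivial stand-in, not the actual LW-ResNet, and the loss function is an assumption):

```python
import torch
from torch import nn

# Trivial stand-in model; the actual LW-ResNet architecture is described in the text.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 7))

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, learning rate 0.001
criterion = nn.CrossEntropyLoss()                           # assumed classification loss
batch_size, epochs = 32, 50
```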
We use precision, recall, and F1-score to comprehensively evaluate the recognition performance of the model. Cost and efficiency are important criteria to measure the practicability of a model or algorithm and are considered by many researchers [
Precision (
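With a hypothetical two-class confusion matrix, the macro-averaged metrics used in this paper can be computed as follows:

```python
import numpy as np

def macro_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)    # per class: TP / (TP + FP)
    recall = tp / cm.sum(axis=1)       # per class: TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    # macro average: unweighted mean over all classes
    return precision.mean(), recall.mean(), f1.mean()

# Hypothetical 2-class confusion matrix for illustration
cm = np.array([[9, 1],
               [2, 8]])
p, r, f = macro_metrics(cm)
```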
The parameter memory requirement is determined by the number of parameters. Under the premise of meeting the task requirements, the smaller the parameter memory is, the lower the hardware requirements of the model and the higher the applicability.
FLOPs are used to measure the number of operations of the model, which refers to the number of floating-point operations performed by the model for complete forward propagation after a single sample is input, that is, the time complexity of the model. The lower the FLOPs, the less calculation required for the model and the shorter the network execution time.
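As a sketch, a common per-layer estimate (one multiply-accumulate per applied weight, biases ignored) can be applied to the Stage1 multi-scale layer described earlier (16 input and 16 output channels per branch, 56 × 56 maps, 16 groups):

```python
def conv_flops(k, c_in, c_out, h_out, w_out, groups=1):
    """Multiply-accumulate count of one conv layer, bias ignored."""
    return k * k * (c_in // groups) * c_out * h_out * w_out

# Four grouped branches (1, 3, 5, 7 kernels) on 56 x 56 feature maps
flops = sum(conv_flops(k, 16, 16, 56, 56, groups=16) for k in (1, 3, 5, 7))
print(flops)  # 4214784
```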
FPS is the number of images that can be processed per second and is used to measure the recognition speed of the model. The larger the FPS is, the faster the model recognition speed.
The running time is the time from the start of model training to its completion. The shorter the running time, the faster the model trains.
The design scheme of the ablation experiment is shown in
Model | Design of receptive field sizes | Design of identity mapping structure | Design of channel attention mechanism | ||||||
---|---|---|---|---|---|---|---|---|---|
Original | Multi-Conv | Multi-GConv | Original | Avgpool +Conv | Maxpool +Conv | Not applied | SENet | ECANet | |
R0 | ✓ | ✓ | ✓ | ||||||
R1 | ✓ | ✓ | ✓ | ||||||
R2 | ✓ | ✓ | ✓ | ||||||
R3 | ✓ | ✓ | ✓ | ||||||
R4 | ✓ | ✓ | ✓ | ||||||
R5 | ✓ | ✓ | ✓ | ||||||
R6 | ✓ | ✓ | ✓ |
The experimental results of each scheme on the test set are shown in
Model | Average precision/% | Average recall/% | Average F1-score/% | Parameter memory/MB | FLOPs | FPS | Running time/h |
---|---|---|---|---|---|---|---|
R0 | 95.76 | 95.65 | 95.66 | 42.65 | 1.82E+09 | 129.29 | 1.23 |
R1 | 96.69 | 96.65 | 96.65 | 9.21 | 4.83E+08 | 141.14 | 0.99 |
R2 | 96.53 | 96.39 | 96.51 | 2.32 | 2.21E+08 | 315.01 | 0.81 |
R3 | 96.51 | 96.51 | 96.51 | 2.32 | 2.21E+08 | 312.37 | 0.81 |
R4 | 96.82 | 96.93 | 96.92 | 2.32 | 2.21E+08 | 315.79 | 0.80 |
R5 | 97.20 | 97.03 | 97.07 | 3.28 | 2.25E+08 | 265.78 | 0.83 |
R6 | 97.81 | 97.86 | 97.83 | 2.32 | 2.21E+08 | 276.41 | 0.83 |
R5 and R6 use SENet and ECANet, respectively. From the experimental results in
The heatmap shows the contribution degree of different areas in the original image to the output category of the model.
The overall recognition performance of R1--R5 combined with the optimization step is improved compared to R0. However, it can be seen from
In general, the R6 model, which combines the multi-scale feature extraction layer, improves the identity mapping structure, introduces the ECANet module, and pays more attention to the lesion area than the R0--R5 models.
To evaluate the performance of the proposed recognition model, we compared the LW-ResNet model with some representative convolutional neural networks under the same parameter settings, including ResNet18 [
Model | Average precision/% | Average recall/% | Average F1-score/% | Parameter memory/MB | FLOPs | FPS | Running time/h |
---|---|---|---|---|---|---|---|
ResNet18 | 95.76 | 95.65 | 95.66 | 42.65 | 1.82E+09 | 129.29 | 1.23 |
DenseNet121 | 96.27 | 95.72 | 95.86 | 26.55 | 2.86E+09 | 34.23 | 3.66 |
VGG16 | 83.87 | 83.45 | 83.34 | 512.27 | 1.55E+10 | 40.46 | 4.80 |
SqueezeNet | 86.83 | 86.74 | 86.60 | 2.82 | 3.12E+08 | 234.88 | 1.16 |
MobileNetV2 | 94.61 | 94.68 | 94.63 | 8.52 | 7.37E+08 | 158.36 | 1.36 |
ShuffleNetV2 | 94.44 | 94.51 | 94.45 | 4.81 | 1.48E+08 | 161.08 | 0.83 |
GhostNet | 94.45 | 94.44 | 94.40 | 19.77 | 1.49E+08 | 94.69 | 1.30 |
LW-ResNet | 97.81 | 97.86 | 97.83 | 2.32 | 2.21E+08 | 276.41 | 0.83 |
As shown in
It can be seen from
The relationship between the verification accuracy and epoch for each model is shown in
As shown in
| Model | Category | Average precision/% | Average recall/% | Average F1-score/% |
|---|---|---|---|---|
| Coordination attention EfficientNet (CA-ENet) [ | 8 | 98.95 | 98.92 | 98.80 |
| Improved VGG16 [ | 4 | 99.35 | 99.34 | 99.34 |
| Improved ResNet50 [ | 6 | 94.75 | 94.23 | 94.49 |
| LW-ResNet (Our network) | 7 | 97.81 | 97.86 | 97.83 |
Coordination attention EfficientNet (CA-ENet) was proposed by Peng et al. [
VGG16 was improved by Qian et al. [
Part of the data used by Yan et al. [
As shown in
As shown in
In general, the average accuracy of LW-ResNet on the test set is 97.88%, and the precision, recall, and F1-score for each category of images all exceed 92.5%. These values show that the LW-ResNet model has high robustness and strong recognition performance.
In this work, we propose a lightweight and efficient model for apple leaf disease identification. We establish a dataset of apple leaf diseases with a complex background, and enhance the data to improve the generalization ability of the model [
From the ablation experimental results, the improvement steps designed in this paper can improve the recognition accuracy and speed of the model, and reduce the number of parameters and complexity of the model, but there are still problems. Using group convolution to build a multi-scale feature extraction layer can compress the model to a large extent and improve the recognition speed of the model. However, group convolution also hinders the flow of information between feature channels. This hindrance is consistent with the conclusions of studies [
Through the performance comparison of different models and the robustness analysis of the LW-ResNet model, it can be seen that the model proposed in this paper has a good balance of recognition accuracy, speed, model size and other indicators. The performance of the proposed model is better than that of the classic network model, and it has a better recognition effect for each type of diseased leaf sample, but the complexity of the model still has room to decrease. Through the confusion matrix of LW-ResNet, it can be found that the difference between the classes is small, and the difference within the class is still one of the main reasons for the images to be misclassified [
From the above discussion, the limitations of the LW-ResNet model can be summarized as follows: LW-ResNet has insufficient ability to distinguish high and low information features with different spatial locations and insufficient ability to extract fine-grained features. This paper aims to construct an efficient lightweight model suitable for real production links. It has become an important task to improve the recognition ability of the model under the background of high complexity and to construct a dataset of disease images with different stages of disease. In the future, the designed model can be optimized by advanced meta-heuristic algorithms [
To improve the applicability of the model in the actual environment, this paper proposes the lightweight LW-ResNet model. The proposed model remedies the shortcomings of ResNet18, namely its large number of parameters, high complexity, and poor generalization ability in the real environment, and realizes the rapid identification of 6 kinds of diseased apple leaves and healthy leaves under a complex background. The average precision, recall, and F1-score of the proposed method all exceed 97%, the parameter memory is only 6% of that of the original ResNet18, and the FLOPs are reduced by 86%. In addition, LW-ResNet has obvious advantages in recognition speed and training speed: it can process 276 images per second, and the training time is only 0.83 h. The results show that the model has strong robustness and is suitable for deployment on small embedded devices with limited resources. It can play a positive role in the prevention and control of apple leaf diseases in the agricultural production process.
However, there are still some problems with this method that urgently need to be solved. The future work is as follows:
From experiments, it is found that the ability of the model in this paper to suppress background noise still needs to be improved. Future research will focus on improving the sensitivity of the model to the difference between the background and the target in the spatial domain. Differences in different disease stages and similar disease characteristics often affect the recognition performance of the model. In subsequent research, we will collect apple leaves with different stages of disease by shooting, expanding and refining the dataset and combine the dataset design for the apple disease recognition model with a strong ability to extract fine-grained features.