Deep convolutional neural networks (DCNNs) require large amounts of training data, but there has always been a shortage of data in agriculture, and it is difficult to label all existing data accurately. Therefore, a lightweight tomato leaf disease identification network supported by a Variational Auto-Encoder (VAE) is proposed to improve the accuracy of crop leaf disease identification. In the lightweight network, multi-scale convolution expands the network width and enriches the extracted features, while depthwise separable convolution reduces the number of model parameters. The VAE makes full use of a large amount of unlabeled data for unsupervised learning, and labeled data are then used for supervised disease identification. In actual model deployment and production, the VAE requires no additional computation or storage, because it is not used in the application phase. Compared with a classification network that uses only labeled data, the proposed method improves both generalization and identification accuracy. In particular, with few labeled samples the identification accuracy increases from 56.13% to 78.03%, and with many labeled samples the identification accuracy also rises. The experiments confirm the effectiveness of the lightweight network and the VAE enhancement strategy: the correct detection rate of disease category is 94.17%, and only 0.42% of the diseased leaves are misidentified as healthy leaves; the correct detection rate of healthy leaves is 98.27%, and only 1.73% of healthy leaves are misidentified as diseased leaves.
According to Food and Agriculture Organization of the United Nations (FAO), pests and diseases can cause $70 to $90 billion annual losses worldwide. China is no exception. In 2018, crop diseases affected an area of about 100 million
Precision agriculture is an effective way to achieve sustainable development of agriculture with high quality, high yield, low consumption and environmental protection. Disease diagnosis is an important part of precision agriculture. In recent years, neural network technology has been widely used in classification and identification [
In view of the above problems, we propose a method to improve the accuracy of tomato leaf disease identification by applying a lightweight network and VAE to the detection network. Multi-scale convolution is used to expand the network width, which makes the extracted features more abundant; depthwise separable convolution is used to reduce model parameters to meet the needs of low-cost terminals. In the identification network, VAE makes full use of a large amount of unlabeled data for unsupervised learning, and labeled data are then used for supervised disease identification. In the detection network, the feature extraction parameters trained in the identification network are used as the initial parameters of the “backbone” network for detection and segmentation training. In the detection and identification of tomato leaf diseases, both labeled and unlabeled data are thus fully utilized to improve identification accuracy. In this paper, tomato leaf disease identification is used as an example, and we hope the technology can be used to identify similar crop leaf diseases.
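The parameter saving mentioned above can be illustrated with a small count of convolution weights. This is only a sketch: the formulas are the standard ones for group and depthwise separable convolution, while the 3*3 kernel, the 128 channels and the 4 groups are illustrative values chosen here, not the configuration of the proposed network.

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weights in a k*k convolution (bias ignored); groups=1 is a standard convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k*k convolution (one filter per channel) plus a 1*1 pointwise convolution."""
    depthwise = conv_params(k, c_in, c_in, groups=c_in)
    pointwise = conv_params(1, c_in, c_out)
    return depthwise + pointwise


# Illustrative layer: 3*3 kernel, 128 input and 128 output channels (assumed values).
standard = conv_params(3, 128, 128)
grouped = conv_params(3, 128, 128, groups=4)
separable = depthwise_separable_params(3, 128, 128)
print(standard, grouped, separable)  # 147456 36864 17536
```

For this example layer, the depthwise separable form needs roughly one eighth of the weights of the standard convolution.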
PlantVillage is an internet image library of plant leaf diseases initiated and established by epidemiologist David [
We collected images of real tomato leaves for training and testing to verify the effectiveness of the method. The images include 186 diseased leaves and 463 healthy leaves. There are 5 diseases in the leaf images. As shown in the first row of
To reduce computation, the images of the 10 tomato categories in the PlantVillage dataset are normalized to 128*128 pixels. Then 10% of the images are randomly selected as the validation set, and the remaining samples of each category are divided into a training set and a test set in five different proportions, to train the model in different situations and evaluate its performance. The training set to test set proportions are 10%–90%, 30%–70%, 50%–50%, 70%–30% and 90%–10%, which simulate training sets with very few, few, half, more and many samples and verify the effectiveness of the method under different conditions. The sample amount of each set is shown in
Serial No. | Validation set (10%), numbers | Training set ratio (of the other 90%, 16344 images) | Training set numbers | Test set ratio (of the other 90%) | Test set numbers
---|---|---|---|---|---
1 | 1816 | 10% | 1635 | 90% | 14709
2 | 1816 | 30% | 4904 | 70% | 11440
3 | 1816 | 50% | 8173 | 50% | 8171
4 | 1816 | 70% | 11441 | 30% | 4903
5 | 1816 | 90% | 14710 | 10% | 1634
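The per-category split described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `stratified_split`, the seed and the rounding rule are assumptions, so the resulting counts may differ slightly from those in the table.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.10, train_fraction=0.10, seed=0):
    """Return (validation, training, test) index lists, split category by category."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    val, train, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)   # rounding rule is an assumption
        rest = indices[n_val:]
        n_train = round(len(rest) * train_fraction)
        val += indices[:n_val]
        train += rest[:n_train]
        test += rest[n_train:]
    return val, train, test


# Toy example: 10 categories with 100 images each, 30%-70% training-test split.
toy_labels = [i % 10 for i in range(1000)]
val_idx, train_idx, test_idx = stratified_split(toy_labels, train_fraction=0.30)
print(len(val_idx), len(train_idx), len(test_idx))  # 100 270 630
```

Calling the function with `train_fraction` set to 0.10, 0.30, 0.50, 0.70 and 0.90 would give the five cases listed in the table.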
Convolutional neural network (CNN), a feedforward neural network with deep structure, has become the first solution of image classification. Common CNNs include AlexNet [
The lightweight neural network is mainly composed of 5 stages and 4 reduction layers, including Stage1-5, Reduction1-4, Max-pooling, FC, Dropout, FC-10 and Softmax. Stage1 consists of three 3*3 convolutions, and the stride of the first convolution is 2. Stage2 and Stage3 are each composed of two module1 blocks connected in series. Stage4 and Stage5 each include two module2 blocks connected in series. Reduction1-4 are reduction modules, which reduce the image size and expand the channels in place of common pooling operations. The reduction module uses group convolution and channel shuffle instead of standard convolution operations. Finally, through the fully connected layer (FC), the Dropout [
Group convolution is an effective sparse connection method, which can divide the input feature map into different groups along the channel dimension, and then perform convolution operations on different groups respectively. MobileNet [
As the number of network layers increases, the receptive field becomes larger, the features become abstract, the number of channels increases, and the number of convolution kernels increases. Therefore, the use of large convolution kernel will inevitably bring more parameters. So, in the deeper layer of the network, large convolution kernel is removed to reduce parameters. In addition, factorizing convolution [
Group convolution is troubled by “poor information flow”, and thus ShuffleNet [
Variational auto-Encoder (VAE) [
Compared with other common classification tasks, crop disease identification is more specialized and requires more expert experience. In actual research, some disease images are accurately labeled whereas most are not. For this reason, VAE is used to improve the classification accuracy of the tomato leaf disease identification model based on a deep neural network. The method includes two steps. The first step is to train the VAE Network to obtain the Encoder Network parameters, and the second step is to train the Classification Network to realize the classification function. The network structure is shown in
Lightweight tomato leaf disease identification network shown in
Layer | Stage1 | Max-Pooling | Reduction1 | Stage2 | Reduction2 | Stage3
---|---|---|---|---|---|---
Input size | 256*256*3 | 128*128*64 | 64*64*64 | 32*32*128 | 32*32*128 | 16*16*256
Output size | 128*128*64 | 64*64*64 | 32*32*128 | 32*32*128 | 16*16*256 | 16*16*256

Layer | Reduction3 | Stage4 | Reduction4 | Stage5 | |
---|---|---|---|---|---|---
Input size | 16*16*256 | 8*8*512 | 8*8*512 | 4*4*1024 | 4*4*1024 | 4*4*1024
Output size | 8*8*512 | 8*8*512 | 4*4*1024 | 4*4*1024 | 256 | 256
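The last two entries of the table map the 4*4*1024 feature map to two vectors of length 256. Assuming these are the mean and log-variance outputs of a standard VAE encoder (an assumption, since the table does not name them), the latent vector z can be sampled with the reparameterization trick and regularized with the usual KL term. The sketch below uses NumPy; the function names are illustrative.

```python
import numpy as np


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


rng = np.random.default_rng(0)
mu = np.zeros((4, 256))        # assumed encoder output: mean, latent length 256
log_var = np.zeros((4, 256))   # assumed encoder output: log-variance
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)  # the KL term is zero when the posterior equals N(0, I)
```

In a full VAE objective this KL term would be added to an image reconstruction loss; the exact loss used in the paper is not stated in this section.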
In Decoder Network, FC-4096 changes the length of the latent vector from 256 to 4096 through a fully connected layer, and the result is then reshaped to 2*2*1024. UpsampleX uses a 3*3 convolution kernel to perform the expansion convolution, which enlarges the input size and transforms the number of channels. ScaleX is a “building block” of the ResNet structure consisting of two 3*3 convolutions and a shortcut, with the same input and output dimensions. Conv3-3 uses a 3*3 convolution kernel to extract the features, reducing the number of channels from 32 to 3. Overall, Decoder Network includes the FC-4096, UpsampleX (X=1,2,…,6), ScaleX (X=1,2,…,6) and Conv3-3 structures. The input is a latent vector of length 256, and the output is the reconstructed image with a size of 128*128*3, as shown in
Layer | FC-4096 | Upsample1 | Scale1 | Upsample2 | Scale2 | Upsample3
---|---|---|---|---|---|---
Input size | 256 | 2*2*1024 | 4*4*1024 | 4*4*1024 | 8*8*512 | 8*8*512
Output size | 2*2*1024 | 4*4*1024 | 4*4*1024 | 8*8*512 | 8*8*512 | 16*16*256

Layer | Scale4 | Upsample5 | Scale5 | Upsample6 | Scale6 | Conv3-3
---|---|---|---|---|---|---
Input size | 32*32*128 | 32*32*128 | 64*64*64 | 64*64*64 | 128*128*32 | 128*128*32
Output size | 32*32*128 | 64*64*64 | 64*64*64 | 128*128*32 | 128*128*32 | 128*128*3
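The decoder sizes listed above can be checked with a short shape trace. This is only a bookkeeping sketch under the description in the text (each UpsampleX doubles height and width, each ScaleX keeps the shape); the function name is illustrative, and the 128 output channels of Upsample4 are inferred from the 32*32*128 input that follows it in the table.

```python
def decoder_shapes(latent_dim=256):
    """Trace tensor shapes through the decoder described in the text."""
    shapes = [("input", (latent_dim,)), ("FC-4096", (4096,))]
    h, w, c = 2, 2, 1024                       # 4096 values reshaped to 2*2*1024
    shapes.append(("reshape", (h, w, c)))
    out_channels = [1024, 512, 256, 128, 64, 32]
    for i, c_out in enumerate(out_channels, start=1):
        h, w, c = h * 2, w * 2, c_out          # UpsampleX doubles height and width
        shapes.append((f"Upsample{i}", (h, w, c)))
        shapes.append((f"Scale{i}", (h, w, c)))  # residual block keeps the shape
    shapes.append(("Conv3-3", (h, w, 3)))      # channels reduced from 32 to 3
    return shapes


for name, shape in decoder_shapes():
    print(name, shape)
```

The trace ends at 128*128*3, which matches the size of the reconstructed image given in the text.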
Classification Network is divided into Encoder Network and Class Network. Encoder Network is part of the VAE network. Class Network maps the features learned by Encoder Network to class labels and is composed of Dropout, FC-10 and Softmax. During training, Dropout randomly deactivates some neurons with a certain probability, so that the corresponding parameters are not updated during back propagation. The FC-10 layer changes the size of the output feature vector to 10 through a fully connected layer, which corresponds to the 10 categories of the tomato leaf disease identification task. The Softmax layer maps the outputs of multiple neurons to the (0,1) interval, which can be interpreted as class probabilities for multi-class classification.
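The Class Network described above (Dropout, FC-10, Softmax) can be sketched in NumPy as follows. The drop rate of 0.5, the inverted-dropout scaling and the random weights are assumptions for illustration; the paper's actual hyperparameters are not given in this section.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def class_head(features, weights, bias, drop_rate=0.5, training=False, rng=None):
    """Dropout -> FC-10 -> Softmax applied to a batch of latent feature vectors."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(features.shape) >= drop_rate
        features = features * keep / (1.0 - drop_rate)  # inverted dropout (assumed)
    logits = features @ weights + bias
    return softmax(logits)


rng = np.random.default_rng(0)
features = rng.standard_normal((2, 256))        # two latent vectors of length 256
weights = rng.standard_normal((256, 10)) * 0.01  # FC-10 weights (random placeholder)
bias = np.zeros(10)
probs = class_head(features, weights, bias)      # inference: dropout is disabled
print(probs.shape, probs.sum(axis=-1))
```

Each output row contains 10 values in (0,1) that sum to 1, one per tomato leaf category.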
In this paper, we propose four methods for comparison, namely “Class”, “AE-Class”, “Class-z” and “VAE-Class-z”. “Class” represents “Classification Network” in red box shown in
There are two common solutions for target detection tasks. One is two-stage target detection, and the other is one-stage target detection. In two-stage target detection, the target is recognized through a neural network before classification, whereas in one-stage target detection, the network is used directly to detect the target. Two-stage target detection is easy to implement, but the downstream classification depends on the performance of the upstream identification and positioning. However, although one-stage target detection does not need to identify the target first, it makes end-to-end target detection more difficult to achieve. In summary, the two-stage method has higher accuracy but lower speed compared to the one-stage method. When detecting tomato diseases, the speed of the two-stage method can meet the requirements of higher precision, such as Faster R-CNN [
The identification accuracy of the detection model is improved on the basis of the deep neural network, in two steps. First, the lightweight tomato leaf disease identification network based on VAE (VAE-Class-z) is trained and the parameters of Encoder Network are obtained. Second, the trained Encoder Network is used as the “backbone” network of Mask R-CNN, and the model is trained with the segmentation data of diseased leaves. In actual use, only the Mask R-CNN network is involved in the test phase. Therefore, no additional computation or storage consumption is introduced in actual model deployment and production.
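The second step, reusing the trained encoder as the initial state of the detection backbone, can be sketched as a name-and-shape matched copy of parameters. This is a framework-independent illustration with NumPy arrays in plain dictionaries; the parameter names (`stage1/conv1`, `rpn/conv`, `fc_latent`) and the helper function are hypothetical and do not come from the paper or from any Mask R-CNN implementation.

```python
import numpy as np


def init_backbone_from_encoder(backbone_params, encoder_params):
    """Copy encoder weights into backbone entries that share a name and a shape."""
    initialized = dict(backbone_params)
    copied = []
    for name, value in encoder_params.items():
        if name in initialized and initialized[name].shape == value.shape:
            initialized[name] = value.copy()
            copied.append(name)
    return initialized, copied


# Hypothetical parameter sets: the encoder shares "stage1/conv1" with the backbone.
encoder = {
    "stage1/conv1": np.ones((3, 3, 3, 64)),
    "fc_latent": np.ones((16, 256)),         # encoder-only layer, not in the backbone
}
backbone = {
    "stage1/conv1": np.zeros((3, 3, 3, 64)),
    "rpn/conv": np.zeros((3, 3, 64, 64)),    # detection-specific layer, left untouched
}
backbone_init, copied_names = init_backbone_from_encoder(backbone, encoder)
print(copied_names)  # ['stage1/conv1']
```

Layers that exist only in the detection network keep their own initialization and are trained together with the copied layers on the segmentation data.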
The experimental environment of this paper is as follows: Ubuntu 16.04 LTS 64-bit system, Intel Core i5-8400 processor (2.80 GHz), 8 GB memory, GeForce GTX 1060 (6 GB) graphics card, the TensorFlow-GPU 1.4 deep learning framework, and the Python programming language. The same training parameters are used in the experiments. For example, the size of the generated latent vector is 256, the number of epochs is 20, and the Adam optimizer is used to minimize the loss.
Models | Accuracy (%) | FPS (images/sec) | Model loading time (s)
---|---|---|---
VGG16 | 94.65 | 73 | 1.81
VGG19 | 95.19 | 64 | 2.15
ResNet-34 | 97.43 | 233 | 0.48
ResNet-50 | 96.95 | 119 | 0.85
Inception-ResNet V2 | 98.24 | 115 | 1.64
MobileNet-V1 | 96.52 | 291 | 0.59
MobileNet-V2 | 95.14 | 229 | 0.74
Proposed | 98.42 | 278 | 1.05
The improved convolutional neural network is compared with several advanced convolutional neural networks, including VGG16/19, ResNet-34/50, Inception-ResNet-V2, MobileNet-V1/V2 in diagnosing and identifying tomato diseased leaves.
It can be seen from
In order to verify the usability of the identification model in different scenarios, the dataset is divided into 5 groups according to the proportion of training sets and test sets. As shown in
In
As shown in
After expansion, 558 diseased images and 463 healthy images are obtained and divided into a training set and a test set in a 7:3 ratio. 167 diseased leaf images and 139 healthy leaf images are detected using the Mask R-CNN framework. A total of 818 tomato leaves are detected in these 306 images, including 240 diseased leaves and 578 healthy leaves. Of the 240 diseased leaves, 14 are identified incorrectly, an error rate of 5.83%: one diseased leaf is identified as a healthy leaf (0.42%), and the other 13 are identified as other diseases (5.42%). Of the 578 healthy leaves, 10 are identified as diseased leaves, an error rate of 1.73%. In conclusion, the correct identification rate of diseased leaves is 94.17%, and only 0.42% of diseased leaves are incorrectly identified as healthy leaves. The correct identification rate of healthy leaves is 98.27%, and only 1.73% of healthy leaves are misidentified as diseased leaves.
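The reported rates follow directly from the leaf counts given above. The short check below recomputes them; the counts are taken from the text, and only the variable and function names are introduced here.

```python
def rate(count, total):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100.0 * count / total, 2)


diseased_total, diseased_as_healthy, diseased_as_other = 240, 1, 13
healthy_total, healthy_as_diseased = 578, 10

diseased_errors = diseased_as_healthy + diseased_as_other
print(rate(diseased_total - diseased_errors, diseased_total))    # 94.17
print(rate(diseased_as_healthy, diseased_total))                 # 0.42
print(rate(diseased_as_other, diseased_total))                   # 5.42
print(rate(healthy_total - healthy_as_diseased, healthy_total))  # 98.27
print(rate(healthy_as_diseased, healthy_total))                  # 1.73
```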
Although the dataset is large, the amount of annotation is relatively small, so how to use the unlabeled disease data is a question worth researching. To this end, we propose a lightweight tomato leaf disease identification network supported by a VAE enhancement method to improve the identification and detection accuracy of crop leaf diseases. Multi-scale convolution is used to expand the width of the network and make the extracted features more abundant, depthwise separable convolution is used to reduce the model parameters, and the lightweight model is applied to both the identification network and the detection network. We hope our study can be extended to similar application scenarios for crop disease identification.
With fewer labeled samples, the identification accuracy is improved from 56.13% to 78.03%; with more labeled samples, the identification accuracy is also improved. The detection results show that the correct identification rate of the disease species is 94.17%, and only 0.42% of the diseased leaves are misidentified as healthy leaves. The correct identification rate of healthy leaves is 98.27%, and only 1.73% of healthy leaves are misidentified as diseased leaves. According to the analysis, the remaining detection errors can be screened through the confidence threshold and the proportion of erroneous leaves, thereby further increasing the accuracy of disease identification. The results also show that VAE can enhance the identification and detection of tomato leaf diseases based on the proposed lightweight network by making full use of the unlabeled data to overcome the difficulty of labeling. In the future, we will continue to collect more sample images of crop diseases and use deep convolutional neural networks to develop a complete crop disease identification system for agriculture.