|Computer Modeling in Engineering & Sciences|
COVID-19 Detection via a 6-Layer Deep Convolutional Neural Network
School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
*Corresponding Author: Ji Han. Email: HanJi@home.hpu.edu.cn
Received: 12 March 2021; Accepted: 05 August 2021
Abstract: Many people around the world have lost their lives to COVID-19. Most COVID-19 patients present with fever, tiredness and dry cough, and the disease spreads easily to those around them. Detecting infected people early helps local authorities slow the spread of the virus and allows the infected to be treated in time. We propose a six-layer convolutional neural network combined with max pooling, batch normalization and the Adam algorithm to improve the detection of COVID-19 patients. Under 10-fold cross-validation, our method is superior to several state-of-the-art methods. In addition, we use Grad-CAM to produce heat maps that visualize which image regions the model relies on during training and detection.
Keywords: COVID-19; deep learning; convolutional neural network; max pooling; batch normalization; Adam; Grad-CAM
COVID-19 is a disease that spreads easily among people. It is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease spreads through human-to-human contact, contaminated air, respiratory droplets and feces. Authorities have therefore adopted a series of measures, including requiring masks in public places, quarantining people entering the country from abroad, and advising residents not to travel to high-risk areas.
Early detection of COVID-19 helps the relevant departments take preventive and control measures in advance to protect the safety of local residents. The most commonly used testing methods are real-time reverse transcription polymerase chain reaction (RT-PCR) and point-of-care (POC) methods. RT-PCR can be used to detect genes encoding viral proteins in respiratory samples, and a test takes less than an hour to return a result. POC can also be used to detect genes encoding viral proteins in respiratory samples, and it takes only a few minutes to return a result. However, the sensitivity of these two methods may not be sufficient to detect early infections with low virus concentrations.
Traditional artificial intelligence (AI) methods may not work well on complicated image processing tasks. Many researchers now use deep learning (DL) methods to improve the detection of diseases from medical images. For example, Guo et al. employed ResNet-18 to detect thyroid ultrasound standard plane images. Wu combined wavelet Renyi entropy with a proposed three-segment biogeography-based optimization. Ni et al. proposed a deep learning approach (DPA) for COVID-19 detection. Wang combined a graph convolutional network (GCN) with a convolutional neural network (CNN) using a deep feature fusion method. Wang proposed a novel CCSH network to detect COVID-19. There are many other successful applications of deep learning [12–14], all of which demonstrate the power of DL.
In addition, a growing number of researchers use transfer learning. Transfer learning suits situations where the features of a large source dataset used for training are similar to those of a small target dataset used for detection, so it is not suitable for our experiments. In this paper, we collected CT images of COVID-19 and propose a 6-layer convolutional neural network to detect COVID-19. In our experiments, max pooling performed better than the other traditional pooling methods. Batch normalization effectively improves the training speed of convolutional neural networks. The Adam algorithm trained the model more effectively than the other optimization algorithms we compared.
The rest of this paper is organized as follows. Section 2 describes the collection and characteristics of the dataset. Section 3 describes the modules of the convolutional neural network. Section 4 presents the model we built and analyzes the experimental results. The last section summarizes our experiments and results.
The image dataset was from . In the experiment, Philips Ingenuity 64-line spiral CT machines were used to collect lung images. During the CT scan, the patient lay supine and held their breath after a deep inhalation, which allowed scanning from the lung apex to the costophrenic angle.
The image slices came from 142 COVID-19 patients and 142 healthy people. From the CT images of each subject, 1–4 slices were selected as experimental data, and the resolution of all images was 1,024 × 1,024. Table 1 shows the characteristics of the subjects. As Fig. 1 shows, the lung CT images of COVID-19 patients contain obvious white lesions.
A convolutional neural network is a classifier that can distinguish normal images from abnormal images in medical imaging. A classic CNN consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer [17,18]. The convolutional layer and the pooling layer are used to extract image features, and the fully connected layer is used for image classification. Fig. 2 shows the flowchart of a CNN.
3.1 Convolution Layer
In the convolutional layer, the input image data and the kernel are convolved to output the feature map. The operation of the convolutional layer involves three hyperparameters: kernel size, filter depth, and stride. The kernel size is the size in pixels of the convolution filter. The filter depth is the number of filters in the convolutional layer and therefore controls the number of output feature maps. The stride determines how many pixels the filter moves between successive convolutions. An example of the convolution operation is shown in Fig. 3. The convolution operation is defined as follows:
$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b\right)$$
where $x_j^l$ represents the output of the $j$th feature map of the $l$th convolutional layer, $M_j$ is the input feature subset, $k_{ij}^l$ is the convolution kernel matrix, $b$ refers to the offset value (bias) of the feature map, and $f$ is the activation function.
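As an illustration of how kernel size, stride, bias and activation interact, the following is a minimal single-channel sketch in plain Python. The function name `conv2d`, the toy image and the kernel values are illustrative assumptions, not the implementation used in this paper; like most CNN libraries, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, bias=0.0, stride=1, activation=lambda v: max(0.0, v)):
    """Valid (no padding) 2D convolution of one channel, followed by an activation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum over the kh x kw window, plus the bias b.
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(activation(s))  # activation function f (ReLU by default)
        out.append(row)
    return out


# Hypothetical 4 x 4 input and 2 x 2 kernel, chosen only for illustration.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d(image, kernel, stride=1)     # 3 x 3 output
feature_map_s2 = conv2d(image, kernel, stride=2)  # 2 x 2 output
print(feature_map)
print(feature_map_s2)
```

A stride of 2 halves the number of window positions along each axis, which is why the second output is 2 × 2 rather than 3 × 3.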
3.2 ReLU Function
The convolutional layer is followed by an activation function, such as Sigmoid or ReLU; their activation curves are shown in Fig. 4. Sigmoid is a traditional non-linear activation function whose output is bounded between 0 and 1. The activation formula is as follows:
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
When the input to the Sigmoid function is very large or very small, the slope of its curve tends to zero, which is likely to cause the vanishing gradient problem in the neural network. The ReLU function, by contrast, has a constant slope of 1 for positive inputs, which avoids this saturation, and it outputs 0 for negative inputs, which makes the activations sparse and can reduce overfitting. Therefore, compared with Sigmoid, the ReLU function can speed up model training. The calculation formula of ReLU is as follows:
$$\mathrm{ReLU}(x) = \max(0, x)$$
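The difference in gradient behaviour between the two activation functions can be checked numerically. The sketch below is only an illustration with hypothetical helper names; it evaluates both functions and their derivatives at a few inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)); it peaks at 0.25 for x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Slope is 1 for positive inputs and 0 otherwise.
    return 1.0 if x > 0 else 0.0


for x in (-10.0, 0.0, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

For inputs of large magnitude the sigmoid derivative is close to zero, while the ReLU derivative stays at 1 for any positive input.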
3.3 Batch Normalization
In the experiment, we used batch normalization to address the problem of internal covariate shift. This technique keeps the distribution of the data after convolution more uniform, which allows a larger learning rate and speeds up the training process. The calculation steps for batch normalization are as follows. First, calculate the mean of the minibatch from the input values $x_i$, $i = 1, \dots, m$:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$
Second, calculate the variance of the minibatch:
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$$
Third, normalize each input, adding a small constant $\epsilon$ to the denominator to prevent division by zero:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
Finally, multiply by the scale $\gamma$ and add the shift $\beta$:
$$y_i = \gamma \hat{x}_i + \beta$$
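The four steps above can be sketched for a one-dimensional minibatch as follows. The function name `batch_norm` and the default values of the scale, shift and small constant are assumptions for illustration; in a real network the scale and shift are learned, and running statistics are used at inference time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a minibatch of scalars, then apply scale (gamma) and shift (beta)."""
    m = len(batch)
    mean = sum(batch) / m                                        # step 1: minibatch mean
    var = sum((x - mean) ** 2 for x in batch) / m                # step 2: minibatch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]   # step 3: normalize
    return [gamma * xh + beta for xh in x_hat]                   # step 4: scale and shift


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
shifted = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0)
print(normalized)
print(shifted)
```

With the default scale and shift, the output has approximately zero mean and unit variance; with `gamma=2.0` and `beta=3.0`, the mean moves to 3 and the standard deviation to about 2.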
3.4 Pooling Layer
The pooling layer is used to reduce the dimensionality of the feature vector output by the convolution operation, which helps prevent overfitting. The most common pooling methods are max pooling (Mp), average pooling (Ap) and $\ell_2$-norm pooling (L2P). Mp outputs the maximum value of the pooling area, and Ap outputs the average value of the pooling area. The value obtained by L2P is the square root of the sum of the squares of the elements in the pooling area. The three pooling operations are shown in Fig. 5.
Suppose the pooling area is $G$, and the set of activations in $G$ is $D$. The definition of $D$ is as follows:
$$D = \{d_i \mid i \in G\}$$
Mp is defined as
$$\mathrm{Mp}(D) = \max_{i \in G} d_i$$
Ap is defined as
$$\mathrm{Ap}(D) = \frac{1}{|D|}\sum_{i \in G} d_i$$
L2P is defined as
$$\mathrm{L2P}(D) = \sqrt{\sum_{i \in G} d_i^2}$$
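The three pooling rules can be compared on a small example. The sketch below assumes non-overlapping square pooling regions (stride equal to the region size); the function name `pool2d`, the mode labels and the toy feature map are illustrative choices rather than part of the original method.

```python
import math


def pool2d(fmap, size, mode="max"):
    """Pool non-overlapping size x size regions of a 2D feature map."""
    pooled = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            # D: the activations inside the pooling region G.
            region = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(region))
            elif mode == "avg":
                row.append(sum(region) / len(region))
            elif mode == "l2":
                row.append(math.sqrt(sum(d * d for d in region)))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map, pooled with 2 x 2 regions.
fmap = [[1, 2, 0, 4],
        [3, 4, 1, 1],
        [0, 0, 2, 2],
        [3, 4, 2, 2]]

max_pooled = pool2d(fmap, 2, "max")
avg_pooled = pool2d(fmap, 2, "avg")
l2_pooled = pool2d(fmap, 2, "l2")
print(max_pooled, avg_pooled, l2_pooled)
```

Each 2 × 2 region collapses to a single value, so the 4 × 4 map becomes 2 × 2 under all three rules; only the summary statistic differs.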
3.5 Fully Connected Layer and Softmax
A fully connected layer (FCL) is used to classify the feature maps after pooling. Each neuron in the fully connected layer is connected to every neuron in the adjacent layer. The flowchart of the FCL is shown in Fig. 6. The calculation formula of the FCL is as follows:
$$y = Wx + b$$
where $x$ represents the value input to the FCL, and $W$ and $b$ are the weight matrix and bias, respectively. $y$ is the output of the FCL.
Because the FCL performs a linear transformation, an activation function follows it. The most commonly used is the softmax activation function. Its calculation formula is as follows:
$$P(c_i \mid x) = \frac{P(x \mid c_i)\,P(c_i)}{\sum_{j} P(x \mid c_j)\,P(c_j)}$$
where $x$ is the input value and $c_i$ represents the $i$th class. $P(x \mid c_i)$ is the conditional probability of $x$ given the $i$th class, and $P(c_i)$ is the prior probability of the class. The result of $P(c_i \mid x)$ is between 0 and 1. Writing $a_i = \ln\left(P(x \mid c_i)\,P(c_i)\right)$ gives the familiar form $\exp(a_i) / \sum_j \exp(a_j)$.
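A fully connected layer followed by softmax can be sketched in a few lines. The weights, bias and input below are made-up values for a two-class output, and the function names are hypothetical; the softmax subtracts the largest score before exponentiating, a common trick for numerical stability that does not change the result.

```python
import math


def fully_connected(x, W, b):
    """Compute W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def softmax(z):
    """Map a list of scores to probabilities that sum to 1."""
    z_max = max(z)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-feature input and 2-class layer.
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.2]]
b = [0.5, -0.5]

logits = fully_connected(x, W, b)
probs = softmax(logits)
print(logits, probs)
```

The class with the larger score receives the larger probability, and the probabilities always sum to 1.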
3.6 Training Algorithms
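Given the complexity of training a deep learning model, we chose a suitable optimization algorithm to optimize the model. Adam (adaptive moment estimation) [29,30] is a gradient descent optimization technique that adapts the step size of each parameter using estimates of the first and second moments of the gradient. It also corrects the bias of these estimates, which keeps the parameter updates stable. The formula for Adam is as follows:
$$g_t = \nabla_\theta J(\theta_{t-1})$$
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
$$\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $g_t$ represents the calculated gradient, $m_t$ is the first moment of the gradient $g_t$, and $v_t$ represents the second moment of the gradient $g_t$. $\beta_1$ represents the first moment attenuation coefficient, and $\beta_2$ represents the second moment attenuation coefficient. $\theta$ represents the parameter to be updated, and $\hat{m}_t$ and $\hat{v}_t$ are the bias corrections of $m_t$ and $v_t$, respectively. $J$ is the loss function, $\alpha$ is the learning rate, and $\epsilon$ is a small constant that prevents division by zero.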
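To make the update rule concrete, here is a minimal sketch of one Adam step for a single scalar parameter, applied to a toy quadratic loss. The function name `adam_step`, the loss, and the hyperparameter defaults (0.9, 0.999 and 1e-8 are commonly used values) are assumptions for illustration, not the settings used in our experiments.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(theta)
```

Because of the bias correction, the very first update has a magnitude of almost exactly the learning rate regardless of the gradient scale, which is one reason Adam is less sensitive to gradient magnitude than plain gradient descent.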
3.7 Cross Validation
In the experiment, we need to train and test on the dataset to verify the detection performance of the model. To analyze the performance of the constructed model, we adopted cross-validation, a widely used method for tuning and evaluating model performance [32,33].
We chose K-fold cross-validation, which divides the collected dataset into K equal subsets. In each iteration, K−1 subsets are used for training, and the remaining subset is used for testing. This process is repeated K times so that each subset is used for testing exactly once. In this paper, we used 10-fold cross-validation, which gives a low-error estimate of model performance. The operation of 10-fold cross-validation is shown in Fig. 7.
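The splitting procedure can be sketched as a small generator of index sets. The function name `k_fold_indices`, the sample count of 100 and the random seed are hypothetical; the sketch shuffles the samples once and does not stratify by class, which a real experiment on a balanced two-class dataset might prefer to do.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=100, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Every sample appears in exactly one test fold, so averaging the K test scores uses each sample for evaluation once.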
3.8 Measures and Heatmap
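To evaluate the performance of the CNN model in training and testing on the dataset, we selected several indicators: Sensitivity ($E_1$), Specificity ($E_2$), Precision ($E_3$), Accuracy ($E_4$), F1-Score ($E_5$), Matthews correlation coefficient ($E_6$), and Fowlkes Mallows index ($E_7$). The calculation formulas for these indicators are as follows:
$$E_1 = \frac{TP}{TP + FN}$$
$$E_2 = \frac{TN}{TN + FP}$$
$$E_3 = \frac{TP}{TP + FP}$$
$$E_4 = \frac{TP + TN}{TP + TN + FP + FN}$$
$$E_5 = \frac{2\,TP}{2\,TP + FP + FN}$$
$$E_6 = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
$$E_7 = \sqrt{\frac{TP}{TP + FP} \times \frac{TP}{TP + FN}}$$
where $TP$, $TN$, $FP$, and $FN$ represent true positive (TP), true negative (TN), false positive (FP), and false negative (FN), respectively. $TP$ and $TN$ indicate the correct classification of COVID-19 patients and healthy people, respectively. $FN$ and $FP$ indicate the misclassification of COVID-19 patients and healthy people, respectively.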
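All seven indicators follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (they are not results from this paper) and assumes every denominator is non-zero; the function name `evaluate` is illustrative.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Compute the seven indicators from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(precision * sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }


# Hypothetical counts: 100 patients and 100 healthy subjects.
scores = evaluate(tp=90, tn=85, fp=15, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that accuracy is a weighted average of sensitivity and specificity, so on a balanced dataset it always lies between the two.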
In a deep learning model, the training process cannot be visualized intuitively, so radiologists may doubt whether the model accurately detects abnormal areas in the CT image. We applied Grad-CAM to our model so that image features are color-coded, which makes it easy to distinguish normal and abnormal regions in CT images. Grad-CAM shows whether the model focuses on the key areas.
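The core Grad-CAM computation is short once the feature maps of a convolutional layer and the gradients of the class score with respect to those maps are available. The sketch below takes both as nested lists of hypothetical values; in practice they would be extracted from the trained network by a deep learning framework, and the heatmap would be upsampled to the image size. The function name `grad_cam` is an illustrative choice.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and class-score gradients.

    Both arguments are lists of K maps, each an H x W nested list.
    """
    n_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over each map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
            for j in range(w)] for i in range(h)]
    # Normalize to [0, 1] so the map can be rendered as colors.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Two hypothetical 2 x 2 feature maps and their gradients.
activations = [[[1, 0], [0, 2]],
               [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]],
             [[-1, -1], [-1, -1]]]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Locations where positively weighted feature maps are active receive high values, while the ReLU removes locations that only provide evidence against the class.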
4 Experiment Results and Discussions
4.1 Structure of Proposed CNN
In this paper, we built a six-layer CNN. The architecture of the CNN is shown in Fig. 8. This CNN includes three convolutional layers, three max pooling layers and three FCLs. The parameters in the activation map are marked on each layer.
4.2 Statistical Results
Table 2 shows the results of the 6-layer CNN on the 7 evaluation indicators under 10-fold cross-validation. The Sensitivity, Specificity, Precision, Accuracy, F1-Score, Matthews correlation coefficient, and Fowlkes Mallows index are 90.97%, 89.58%, 89.51%, 89.52%, 89.58%, 79.07%, and 89.59%, respectively.
4.3 Pooling Comparison
In the experiment, we applied the Mp layer in the six-layer CNN model and compared it with the $\ell_2$-norm pooling (L2P) layer and the Ap layer. Table 3 compares the results of the 3 pooling methods on the 7 indicators under 10-fold cross-validation. Fig. 9 shows the performance comparison of the pooling methods more intuitively. The results show that Mp obtains better results than the other two pooling methods.
4.4 Training Algorithm Comparison
In the experiment, we used the Adam algorithm to optimize the 6-layer convolutional neural network and compared it with the SGDM and RMSProp optimization algorithms. SGDM uses first-order momentum to reduce oscillation along the path of steepest descent during gradient descent. RMSProp is an adaptive learning rate method that normalizes the gradient of each parameter by the root of an exponential moving average of its squared gradient. The experimental results are shown in Table 4 and Fig. 10. In terms of sensitivity (E1), SGDM and RMSProp reach 83.99% and 88.29%, respectively, whereas Adam reaches 90.97%. Likewise, the Matthews correlation coefficient (E6) of Adam is 79.17%, which is clearly better than the 65.98% and 72.92% obtained by SGDM and RMSProp. Therefore, Adam performs significantly better than the other two algorithms in the model we built.
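For reference, the two baseline update rules can be sketched for a single scalar parameter as follows. The function names, the toy quadratic loss and the hyperparameter values are illustrative assumptions, not the settings used in the comparison above.

```python
import math


def sgdm_step(theta, grad, velocity, lr=0.01, momentum=0.9):
    """SGD with momentum: accumulate a velocity, then move along it."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity


def rmsprop_step(theta, grad, sq_avg, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: divide the gradient by the root of a moving average of its square."""
    sq_avg = decay * sq_avg + (1 - decay) * grad * grad
    return theta - lr * grad / (math.sqrt(sq_avg) + eps), sq_avg


# Toy problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_sgdm, velocity = 5.0, 0.0
theta_rms, sq_avg = 5.0, 0.0
for _ in range(500):
    theta_sgdm, velocity = sgdm_step(theta_sgdm, 2.0 * (theta_sgdm - 3.0), velocity)
    theta_rms, sq_avg = rmsprop_step(theta_rms, 2.0 * (theta_rms - 3.0), sq_avg)
print(theta_sgdm, theta_rms)
```

Adam can be read as a combination of the two ideas: it keeps a momentum-like average of the gradient and an RMSProp-like average of the squared gradient, and adds bias correction to both.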
4.5 Comparison of Different Number of Conv Layers
When using a CNN to detect CT images, increasing the number of convolutional layers can improve detection. However, more convolutional layers do not always give a better result. To select an appropriate number of convolutional layers, we compared the performance of convolutional neural networks with different numbers of convolutional layers. The experimental results are shown in Table 5, and Fig. 11 gives a clearer comparison. When the number of convolutional layers increased from 1 to 3, the performance improved steadily, but when the number of layers increased further, the performance began to decline. Therefore, our model works best with 3 convolutional layers.
4.6 Comparison of Different Number of FCLs
Most convolutional neural network models contain 2 fully connected layers, which can already achieve good results. In our experiment, we compared the performance of models containing different numbers of FCLs. The experimental results are shown in Table 6, and Fig. 12 shows their performance differences under the various indicators. We found that the performance is best when the model contains 3 FCLs.
4.7 Comparison to State-of-the-Art Approaches
In the experiment, we compared our proposed method with several advanced methods, including ResNet-18, WRE+3SBBO, and DPA. Guo et al. used an 18-layer CNN model, ResNet-18, for image classification. Wu proposed a method based on a feedforward neural network that combines wavelet Renyi entropy with a proposed three-segment biogeography-based optimization (3SBBO) algorithm to detect COVID-19. Ni et al. used a deep learning method to identify and quantitatively evaluate chest CT image features of patients with COVID-19. The comparison results on the 7 indicators are shown in Table 7, and the differences are shown in Fig. 13. Although ResNet-18 performs slightly better than our method on two indicators, our proposed method is better on the other five. DPA is superior to our proposed method on one indicator but worse on the other six. Therefore, our method performs best overall across the 7 indicators.
Fig. 14 shows the heatmaps produced by applying Grad-CAM to the images. Images b and d are heatmaps of the lung CT images of a COVID-19 patient and a healthy person, respectively. In Fig. 14b, the lung lesion area with COVID-19 is marked in red, while the healthy lung in Fig. 14d is not marked. The heatmap clearly and accurately visualizes the area the model uses for detection, which helps confirm that the trained model attends to the lesions.
In this paper, we proposed a 6-layer CNN for the detection of COVID-19 that combines Mp, batch normalization and the Adam optimization algorithm. Our proposed method performs better than other state-of-the-art methods. The accuracy (E4) of our method reached 89.52%. Grad-CAM makes the behavior of our model easier to interpret visually.
However, our research has a limitation: the dataset is not very large, which may limit the effect of model training. In future research, we will therefore collect more data so that our proposed method can be trained adequately. At the same time, we will build a better model based on DL methods to improve the results of COVID-19 detection. We will also share our methods so that other researchers can build on our work and accelerate research on COVID-19 detection.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|