With the daily increase in suspected COVID-19 cases, the likelihood of virus mutation also rises, causing the appearance of virulent variants with a high level of replication. Automatic diagnosis methods for COVID-19 are very important to the medical community. An automatic diagnosis can be performed using machine and deep learning techniques to analyze and classify lung X-ray images. Many research studies have proposed automatic methods for detecting and predicting COVID-19 in patients based on their clinical data. Given the lack of valid X-ray images of patients with COVID-19, several researchers proposed augmentation techniques to bypass this limitation. However, the results obtained with augmentation techniques do not transfer reliably to the real world. In this paper, we propose a convolutional neural network (CNN)-based method to analyze and distinguish COVID-19 cases from other pneumonia and normal cases using the transfer learning technique. To help doctors easily interpret the results, a recent visual explanation method called Gradient-weighted Class Activation Mapping (Grad-CAM) is applied for each class. This technique highlights the regions of interest on the X-ray image so that the model's prediction can be easily interpreted by doctors, allowing them to focus only on the important parts of the image and to evaluate the efficiency of the model. Three selected deep learning models, namely VGG16, VGG19, and MobileNet, were used in the experiments with the transfer learning technique. To overcome the scarcity of lung X-ray images of patients with COVID-19, we combined several different datasets to assemble a new dataset with sufficient real data to accomplish the training step accurately. The best results were obtained with the tuned VGG19 model: 96.97% accuracy, 100% precision, 100% F1-score, and 99% recall.
In December 2019, a virulent virus appeared in the city of Wuhan, China, and quickly spread around the world. The disease it causes was named COVID-19, and the virus itself is known to medical scientists as SARS-CoV-2. Symptoms in infected people are very similar to those of other types of pneumonia; the difference is that the virus spreads very rapidly and kills people with weak immune systems within a short time of infection. According to the World Health Organization (WHO), by the end of 2020, 85 million people had been infected and almost two million had died. On average, the mortality rate has been estimated at 3%.
Pneumonia is caused by many germs, including bacteria and viruses like the novel coronavirus (COVID-19). Certain symptoms are specific to COVID-19, such as chest pain, breathing difficulties, coughing with mucus, high fever, intense diarrhea, abdominal pain, and extreme fatigue. Once these symptoms are detected by doctors, the patient is suspected of having COVID-19, but the reverse transcription-polymerase chain reaction (RT-PCR) test or radiology imaging is required to confirm the diagnosis before adopting appropriate treatment. X-ray imaging is the most popular and available radiography tool in hospitals and medical clinics around the world. Due to its low cost, it is the most used technique by doctors for diagnosing cases of pneumonia. Types of pneumonia are defined by their cause (bacterial or viral) when analyzing lung X-ray images, but distinguishing between different types is difficult because some types look very similar. Therefore, alternative diagnosis methods are needed to assist traditional manual methods of detecting and differentiating between COVID-19 and other types of pneumonia.
Early detection of COVID-19-infected patients is crucial in saving human lives [
Many medical methods of COVID-19 diagnosis are used in clinical routine. The RT-PCR method, which searches for the COVID-19 virus in a person's nose or throat, is the most widely used, but due to its limited accuracy and its turnaround time, it cannot cope with the growing number of infected people around the world, especially with the appearance of multiple COVID-19 variants that spread more rapidly.
Another method, often used for previously infected patients, is the serology test, which looks for antibodies in the blood to determine whether a patient has already been infected with the COVID-19 virus. A further early detection method consists of analyzing chest X-ray images of suspected patients to assess the extent of the virus's effect on the patient's lungs. All these methods are manual and time-consuming, so the scientific community still needs alternative methods that can diagnose the COVID-19 virus rapidly and accurately.
Recently, artificial intelligence (AI) tools, such as machine learning (ML) and deep learning (DL), along with the development of other techniques, such as the Internet of Things (IoT), have attracted several researchers because of their efficiency in various fields like rumor detection in social media [
Medical diagnosis support systems (MDSS) have also gained particular interest in recent years. These smart tools constitute an important aid for medical professionals to gain time, effort, and accuracy [
Copious amounts of medical data require high-performance computing (HPC) techniques, such as parallel computing. To overcome the constraint of execution time, graphical processing units (GPU) become crucial devices in medical data processing applications [
Doctors often diagnose pneumonia by clinical examination and analysis of the patient's symptoms, but to improve diagnostic accuracy, they often order a chest X-ray to confirm its cause, whether viral or bacterial. The main treatment is antibiotics and pain-relief medication, drinking more water, and resting. For other types of pneumonia, treatment can result in severe complications, as happens with COVID-19 cases. Recent research has associated a very high risk of complications and an increased viral load with COVID-19 patients taking non-steroidal anti-inflammatory drugs (NSAIDs) as treatment [
Currently, doctors refer to a laboratory test called real-time RT-PCR as a formal and official method to confirm suspected COVID-19 in patients presenting clear symptoms. This method takes 24 hours to provide results, but it is not completely accurate as it presents a high false-negative rate [
X-ray radiation-based imaging is used in many technology industries, including X-ray radiography and computed tomography (CT) imaging. X-ray radiography images body organs by exploiting absorbed waves to produce a 2-D grayscale image, whereas CT scans use a computer to combine multiple 2-D grayscale images into a 3-D image, similar to MRI technology, which is very expensive. X-ray radiography is available in almost all medical facilities, but CT equipment rarely exists in hospitals and requires experts to operate it.
Statistical studies show that CT or X-ray imaging helps doctors diagnose COVID-19 in 89.9% of cases. This is an important factor that motivates us to use X-ray technology to auto-diagnose COVID-19 [
In this work, we focus on automating the detection and classification of COVID-19, other types of pneumonia, and normal cases by analyzing chest X-ray images. The proposed methodology consists of using the power of CNN and Grad-CAM [
The main contributions of this paper are:
– A representative dataset from many sources has been collected to generate a valid and balanced X-ray-based dataset of three classes: normal, pneumonia, and COVID-19.
– Pre-trained CNNs (VGG16, VGG19, and MobileNet) are selected based on their performance evaluation in the related literature.
– A CNN-based zero-shot transfer learning technique is applied in the training process to exploit the knowledge of the pre-trained models.
– CNN models are adapted by adding additional layers before the output layer to suit our specific prediction task.
– A Grad-CAM visual explanation method is applied to debug the prediction process for each model and to highlight the interesting regions in the X-ray image responsible for the final decision.
Experimental results demonstrate the higher performance of the modified VGG19 model in comparison with VGG16 and MobileNetV2.
The remainder of this paper is structured as follows. In Section 2, we present a background on (a) previous similar works on predicting COVID-19; (b) a general overview on the common methodology to establish a CNN model; (c) a specified focus on the pre-trained CNN models that constitute the basis for our proposed transfer learning-based models. Section 3 describes the methodology adopted in this work and the different image resources used to feed the study. In Section 4, we present our findings and results and provide a brief technical comparison of different proposed models with previous ones. Additionally, we will present some Grad-CAM illustrations to explain the image regions of interest that are responsible for the final prediction decisions; we will also discuss these results. Section 5 provides a conclusion by highlighting the limitations of our methodology and describing improvements and perspectives for future work.
Recently, researchers have started investigating the COVID-19 pandemic from different perspectives. Some of their works concern exploiting AI to help diagnose COVID-19 based on different laboratory resources. In this section, we describe some interesting studies related to ours concerning the use of ML techniques to detect COVID-19 using clinical data.
Rahimzadeh et al. [
A new automatic system for diagnosing COVID-19 based on ML is presented in Islam et al. [
Saha et al. [
Wang et al. [
Song et al. [
Ni et al. [
El Gannour et al. [
In our recent work [
A brief overview of some of these related works is shown in
| Author | Method | Image type | COVID-19 images | Accuracy |
|---|---|---|---|---|
| Rahimzadeh et al. [ | Xception | X-ray | 180 | 99.51% |
| Islam et al. [ | LSTM | X-ray | 141 | 97% |
| Gonesh et al. [ | CNN model | X-ray | 94 | 87.4% |
| Wang et al. [ | DenseNet121-FPN | X-ray | 924 | 87% |
| Ying et al. [ | DRE-Net CNN model | X-ray | 88 | 92% |
| Ni et al. [ | MVPNET | CT | 3,854 | 94% |
| El Gannour et al. [ | Xception | X-ray | 219 | 98% |
The present paper is an extended version of our work presented in Moujahid et al. [
A CNN is a type of deep neural network specialized in computer vision and image processing. Its name comes from the well-known mathematical operation called convolution, applied to matrices. For two discrete functions f and g, the discrete convolution is given by the following formula:

(f ∗ g)[n] = Σ_{m=−∞}^{+∞} f[m] · g[n − m]
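For illustration, a minimal pure-Python sketch of this discrete convolution for finite sequences (function and variable names are ours, not from the paper):

```python
def conv1d(f, g):
    """Discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m],
    restricted to the finite supports of f and g."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for m in range(len(f)):
            k = n - m
            if 0 <= k < len(g):
                out[n] += f[m] * g[k]
    return out

# Convolving with [0, 1] shifts the signal by one position.
print(conv1d([1, 2, 3], [0, 1]))  # → [0.0, 1.0, 2.0, 3.0]
```

In a CNN the same principle is applied in two dimensions, with g being a small learnable kernel slid over the image.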
CNN architecture consists of linking different types of layers: the input layer followed by convolutional layers related to multiple hidden layers, and an output layer called a classifier. The hidden layers are a combination of convolutional layers, pooling layers, and fully connected layers, which are also called dense layers. Each layer is composed of multiple neurons interconnected with other neurons in the previous and next layers [
Each layer is defined by different parameters that specify input dimensions, stride, and padding during the processing step. The input layer handles a limited number of tensors in parallel according to the given configuration, based on the available computing and memory capacity [
The convolutional layer plays a crucial role in how CNNs operate. The layer's power comes from its learnable kernels. When data arrives at the input of a convolutional layer, the layer convolves the inputs with the configured filters across the spatial dimensions. This operation produces an activation map that can be visualized [
The goal of the pooling layer is to gradually decrease the dimensionality of the inputs, and, thus, optimize and reduce the number of hyper-parameters and the complexity of the model. The main problem with this type of layer is the destruction of some information that could be important to the input features. There are many types of pooling operations, such as average, max, and general, but max-pooling is the main type used for image processing [
The output y of a max-pooling layer is defined by

y_{i,j} = max_{(m,n) ∈ R_{i,j}} x_{m,n}

where R_{i,j} is the F × F pooling window positioned at output location (i, j) with stride S over the input x.
The output dimensionality of a pooling layer depends on the stride and filter parameters defined during configuration:

O = (N − F) / S + 1

where N is the input size, F is the filter size, and S is the stride size.
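A small pure-Python sketch of max pooling and the output-size relation described above (names are ours):

```python
def pool_output_size(n, f, s):
    """O = (N - F) / S + 1 for an N x N input, F x F window, stride S."""
    return (n - f) // s + 1

def max_pool2d(x, f=2, s=2):
    """Naive 2-D max pooling over a nested-list 'image': each output cell
    is the maximum of one F x F window of the input."""
    o = pool_output_size(len(x), f, s)
    return [[max(x[i * s + a][j * s + b] for a in range(f) for b in range(f))
             for j in range(o)] for i in range(o)]

img = [[1, 3, 2, 4],
       [5, 6, 1, 2],
       [7, 2, 9, 1],
       [3, 4, 5, 8]]
print(pool_output_size(224, 2, 2))  # → 112 (a 224x224 map halves to 112x112)
print(max_pool2d(img))              # → [[6, 4], [7, 9]]
```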
Fully connected layers, also called dense layers, connect every neuron to each neuron in the previous and next layers. In a CNN, the first fully connected layer flattens the inputs into one vector, and the subsequent layers apply the learned weights to predict the correct label of each feature according to the calculated probabilities.
The dropout layer is a technique used to prevent a CNN model from overfitting. It works by randomly setting the outgoing edges of hidden-layer neurons to 0 at each epoch of the training phase.
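A minimal numpy sketch of this masking (the inverted-dropout rescaling of the surviving activations is standard practice, not something described in the text):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: zero each activation with probability `rate` during
    training and rescale the survivors so the expected activation sum is
    unchanged; at inference time the input passes through untouched."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```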
This architecture is based on using small convolutional filters (3 × 3) and having 16 weight layers, as illustrated in
This architecture is similar to VGG16 architecture in that it also uses small convolutional filters (3 × 3), but it has 19 weight layers, as illustrated in
The MobileNetV2 model was originally designed to improve the performance of mobile networks performing multiple tasks, as illustrated in
It is based on an inverted residual structure where bottleneck layers oppose traditional residual models [
When a CNN-based algorithm is trained on a specific dataset for classification purposes, the process generates a model with trained weights ready for classifying any feature similar to the original dataset. This knowledge can be exploited for other classification purposes [
There are many variants of transfer learning, depending on the nature of the task. In our case study, we applied this technique to models originally trained on the ImageNet dataset, a task different from our classification task.
AI techniques constitute the global methodology adopted for this study, in particular an ML approach based on CNNs and transfer learning. The proposed CNN models involved removing the layers at the head of the original model and flattening the output of the previous layer. We then added a 512-unit dense layer followed by a dropout layer and a 256-unit dense layer. Finally, we added an output layer adapted for three classes. This methodology was applied to all three models: VGG16, VGG19, and MobileNetV2. In parallel, we assembled a valid, sufficient, and useful dataset to train the proposed CNN models.
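The head replacement described above can be sketched in Keras as follows. The dropout rate, optimizer, and 224 × 224 input size are our assumptions, not values reported here; `weights="imagenet"` would load the pre-trained weights used for transfer learning, while `weights=None` keeps this sketch download-free.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolutional base; use weights="imagenet" in the actual transfer
# learning setup (None avoids the weight download in this sketch).
base = tf.keras.applications.VGG19(weights=None, include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional base

# New head: flatten -> Dense(512) -> Dropout -> Dense(256) -> 3-class output
x = layers.Flatten()(base.output)
x = layers.Dense(512, activation="relu")(x)
x = layers.Dropout(0.5)(x)  # dropout rate is an assumption
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(3, activation="softmax")(x)  # normal / pneumonia / COVID-19

model = models.Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The same head can be attached to VGG16 or MobileNetV2 by swapping the base model.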
The dataset was assembled and pre-processed before being partitioned into three subsets to feed the CNN-based models (VGG16, VGG19, MobileNetV2): 70% was reserved for the training step, 15% for validation, and 15% for the testing step. After completing all the steps, we used several evaluation techniques and Grad-CAM visualization to determine the best-adapted and best-performing model. A global overview of the architecture is illustrated in
Even though publicly available datasets are permanently updated with new images, the amount of data is still insufficient to achieve a good training process. Different sources of COVID-19 datasets were chosen to collect X-ray images, and only the valid ones were selected to build the COVID-19 X-ray dataset.
The selected images were then resized to adapt to the input shape of the model. The collected dataset consisted of three different classes: confirmed COVID-19 cases, other pneumonia cases, and normal cases.
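The resizing step can be sketched as follows (the function name and the 224 × 224 target size are our assumptions, matching the VGG-family input shape):

```python
import numpy as np
from PIL import Image

def load_xray(path, size=(224, 224)):
    """Load a chest X-ray, resize it to the model input shape, and scale
    pixels to [0, 1]; grayscale images are replicated to 3 RGB channels
    to match the pre-trained models' expected input."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0
```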
The publicly available sources are described in
| Dataset source | COVID-19 images | Valid images |
|---|---|---|
| https://github.com/ieee8023/covid-chestxray-dataset | 661 | 455 |
| https://github.com/agchung/Figure1-COVID-chestxray-dataset | 56 | 35 |
| https://github.com/agchung/Actualmed-COVID-chestxray-dataset | 239 | 58 |
| https://kaggle.com/tawsifurrahman/covid19-radiography-database | 224 | 224 |
| https://github.com/zeeshannisar/COVID-19 | 76 | 68 |
In this experiment, we collected 840 valid thoracic X-ray images from several sources. The final dataset was partitioned into three parts: 70% for the training step, 15% for validation, and 15% for the testing step.
| SUBSETS | NORMAL | PNEUMONIA | COVID-19 |
|---|---|---|---|
| Training set | 939 | 941 | 588 |
| Validation set | 201 | 202 | 126 |
| Testing set | 201 | 202 | 126 |
| Total | 1,341 | 1,345 | 840 |
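A sketch of this 70/15/15 partitioning (the function name and seed are ours); applied to the 840 COVID-19 images, it reproduces the 588/126/126 split shown above:

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle a list of samples and partition it 70/15/15 into
    training, validation, and testing subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(840))
print(len(train_set), len(val_set), len(test_set))  # → 588 126 126
```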
The gradient-weighted class activation mapping (Grad-CAM) technique can be used to debug almost any CNN model and to verify and validate its performance.
Based on the feature maps of the last convolutional layer, the Grad-CAM method computes the neuron importance weights

α_k^c = (1/Z) Σ_i Σ_j ∂y^c / ∂A_{ij}^k

and produces the class-discriminative localization map

L^c_{Grad-CAM} = ReLU(Σ_k α_k^c A^k)

where y^c is the score for class c, A^k is the k-th feature map of the last convolutional layer, and Z is the number of spatial locations in each feature map.
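A minimal TensorFlow sketch of this computation (function and argument names are ours; production implementations typically also resize the heatmap to the original image size):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index):
    """Weight the last conv feature maps by the spatially averaged
    gradients of the class score, sum them, and keep only the positive
    contributions (ReLU), following the Grad-CAM definition."""
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, class_index]               # y^c
    grads = tape.gradient(score, conv_out)          # dy^c / dA^k
    weights = tf.reduce_mean(grads, axis=(1, 2))    # alpha_k^c
    cam = tf.nn.relu(tf.einsum("bhwk,bk->bhw", conv_out, weights))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalize to [0, 1]
```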
In ML, metrics are essential evaluation tools for describing a model's performance, and the choice of metrics depends on the type of model. In our case, we adopted the metrics usually used to evaluate CNN models, which are based on four essential counts. True positives (TP) are the predicted positive cases that match the ground truth; false negatives (FN) are the predicted negative cases that are positive in the ground truth; true negatives (TN) are the predicted negative cases that match the ground truth; false positives (FP) are the predicted positive cases that are negative in the ground truth. The following equations present the different metrics used:
• Recall (true positive rate): the number of elements accurately predicted as positive out of all true positive cases: Recall = TP / (TP + FN).
• Precision: the number of elements accurately predicted as positive out of all elements identified as positive: Precision = TP / (TP + FP).
• F1-score: the harmonic mean of recall and precision: F1 = 2 × (Precision × Recall) / (Precision + Recall).
• Accuracy: the number of elements accurately predicted over all predicted elements: Accuracy = (TP + TN) / (TP + TN + FP + FN).
• Loss: the estimated error in the model's predictions.
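The four confusion counts fully determine these metrics; a small sketch with hypothetical counts (the numbers are illustrative, not our results):

```python
def metrics(tp, fp, fn, tn):
    """Compute recall, precision, F1-score, and accuracy from the four
    confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, f1, accuracy

# Hypothetical counts: 90 TP, 0 FP, 10 FN, 100 TN.
r, p, f1, acc = metrics(90, 0, 10, 100)
print(r, p, acc)  # → 0.9 1.0 0.95
```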
CNN model performance depends on several criteria, such as the architecture, the number of trainable parameters, and the configuration and types of layers. In our work, we chose to implement models well established in the literature for their performance in image-processing tasks (VGG16, VGG19, and MobileNetV2). This section presents our findings and the results of our work.
Across the epochs, the training accuracy of the VGG16 model increases step by step, and the validation accuracy increases until the 10th epoch, when it becomes constant and stable for the remainder of the training (the architecture is shown in
The loss function was used to optimize the model during training and validation. It shows the model behavior across epochs in terms of updating weights for better predictions. In general, if the loss is low, the accuracy is higher.
The VGG19 model (architecture in
Training accuracy continued to improve across all the epochs, but the validation accuracy improved faster in the first epochs and then started to decrease. This model needed training over 18 epochs before stopping; more details are shown in
In this work, we did not use augmentation techniques to increase the dataset size. All of the X-ray images used were extracted from different public sources then filtered and validated before pre-processing. Other methods cited in the related literature had very high accuracy results, but the datasets did not represent real case classification tasks. For example, in
Training a CNN model generates a prediction model with correct weights that corresponds to the dedicated task. In our case, the obtained weights were tested and evaluated using a completely independent sub-dataset already prepared as shown in
| Metric | VGG16 | VGG19 | MobileNetV2 |
|---|---|---|---|
| Precision | 1.00 | 1.00 | 0.95 |
| Recall | 0.97 | 0.99 | — |
| F1-score | 0.98 | 1.00 | 0.98 |
| Accuracy | 96.22% | 96.97% | 95.84% |
| Loss | 17.33% | — | 17.40% |
The results of the trained models show that VGG16 and VGG19 obtained 100% precision against 95% for MobileNetV2. VGG19 was the most accurate at 96.97% and obtained 100% for the F1-score parameter but the worst loss value. According to these results, the VGG19 model was judged the best model for predicting COVID-19 cases in terms of accuracy and efficiency. Another interesting tool to obtain a clear view of the models’ performances during the test step was the confusion matrix, where all the predicted labels of X-ray features were compared to the ground truth labels.
Evaluation of a binary classification model can be performed using receiver operating characteristic (ROC) curve methodology. This tool is similar to precision/recall metrics, with a visual presentation based on a curve that plots the true positive rate against the false positive rate
Another interesting tool for model performance evaluation is the precision/recall ROC curve, which is similar to the true positive rate-based ROC curve. The difference is that it shows the variations of precision and recall across testing results for binary classification tasks.
We adapted our task to a binary classification setting to use this methodology as an extra evaluation method. We created virtual results considering COVID-19 as the positive class and the other two classes (normal and pneumonia) as the non-COVID class; the results are shown in
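This binary collapse can be sketched with scikit-learn; the softmax outputs below are illustrative, not our test results:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Columns: (normal, pneumonia, COVID-19); COVID-19 (index 2) is the
# positive class, the other two are merged into "non-COVID".
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.2, 0.2, 0.6],
                  [0.6, 0.3, 0.1],
                  [0.3, 0.2, 0.5]])
y_true = np.array([0, 0, 1, 1, 0, 1])  # 1 = COVID-19, 0 = non-COVID
fpr, tpr, _ = roc_curve(y_true, probs[:, 2])
area = auc(fpr, tpr)
print(area)  # → 1.0 (here the COVID-19 scores separate the groups perfectly)
```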
CNN models are generally a black box tool in terms of image processing across layers. Selvaraju et al. [
The results of the prediction experiment show that the VGG19 and VGG16 models correctly predicted the case, and the Grad-CAM visualization confirmed that the prediction was based on the correct regions in the lungs. The MobileNetV2 model predicted the case as normal, and the Grad-CAM visualization of this model showed that the prediction was based on a small region of the X-ray image, which explains the result.
In
Comparing our results to existing methods cited in the related works section shows that the VGG19 model had a 99% recall, 100% F1-score, and 96.97% accuracy. Additionally, our proposed model demonstrated that the percentage of correctly predicted COVID-19 cases that matched the ground truth in the test sub-dataset reached maximum precision (100%). In general, our method obtained good accuracy results compared to several cited methods; more details are shown in
| Method | Recall | Precision | F1-score | Accuracy |
|---|---|---|---|---|
| Xception & ResNet50V2 [ | 80.53% | 35.27% | NA | 99.51% |
| CNN & LSTM [ | 99.3% | 99.2% | 98.9% | 99.4% |
| New CNN model [ | NA | NA | NA | 87.4% |
| DenseNet121-FPN & COVID-19Net [ | 98.66% | NA | NA | 87% |
| DRE-Net CNN model [ | NA | NA | NA | 92% |
| MVPNET & 3D U-Net [ | NA | NA | NA | 94% |
In particular, Rahimzadeh et al. [
In this paper, we focused on training three CNN-based models using a zero-shot transfer learning technique to test the models’ ability to diagnose COVID-19 by analyzing a patient’s chest X-ray images.
The results obtained are very interesting in terms of precision and accuracy for all models and especially for the VGG19 model that obtained very satisfying results. Before conducting this experiment, we collected several positive COVID-19 X-ray images that were publicly available and filtered them to use only the valid elements.
The final version of our dataset contained 840 X-ray images of positive COVID-19 cases; however, this is still insufficient and needs to be increased. We trained the three models (VGG19, VGG16, and MobileNetV2) using the assembled dataset then tested the trained models on the test set. The results were used to evaluate the models’ decisions. To debug the prediction process, we implemented the Grad-CAM technique to visualize the region of interest in each X-ray image in each model; in particular, which pixels contributed the most to the model’s prediction decision.
Since the motivation to continue work on this topic and improve the performance of COVID-19 diagnosis remains, many improvements could be applied to this work in the future, such as: 1) increasing the dataset size of positive COVID-19 chest X-ray images, which would make the dataset more balanced and improve prediction accuracy; 2) adopting an optimization method for the CNN hyperparameters to increase accuracy. The main aim is to combine the best parts of high-accuracy COVID-19 classification models into one model to achieve better performance.
We are thankful to the "Centre National pour la Recherche Scientifique et Technique" (CNRST) and the Hassan II University of Casablanca, Morocco, for their support of this work as part of a project entitled "Scientific and Technological Research Support Program in Link with COVID-19" launched in April 2020 (Reference: Letter to the Director of "Ecole Normale Supérieure de l'Enseignement Technique de Mohammedia" dated June 10th, 2020).