COVID-19 Diagnosis Using Transfer-Learning Techniques

COVID-19 was first discovered in Wuhan, China, in December 2019 and has since spread worldwide. An automated and fast diagnosis system needs to be developed for early and effective COVID-19 diagnosis. Hence, we propose two- and three-classifier diagnosis systems for classifying COVID-19 cases using transfer-learning techniques. The two-classifier system classifies X-ray images into healthy and COVID-19 cases, and the three-classifier system classifies them into healthy, COVID-19, and pneumonia cases. We used two X-ray image datasets (DATASET-1 and DATASET-2) collected from state-of-the-art studies and trained the systems on these datasets using deep learning architectures, namely VGG-19, NASNet, and MobileNetV2. According to the validation and testing results, our proposed diagnosis systems achieved excellent results with the VGG-19 architecture. The two-classifier diagnosis system achieved high sensitivity for COVID-19, with 99.5% and 100% on DATASET-1 and DATASET-2, respectively. The three-classifier diagnosis system achieved high sensitivity for COVID-19, with 98.4% and 100% on DATASET-1 and DATASET-2, respectively. The high sensitivity of these diagnostic systems for COVID-19 can significantly improve the speed and precision of COVID-19 diagnosis.


Introduction
Since preventive or experimental vaccine therapy for the disease caused by severe acute respiratory syndrome coronavirus 2 (COVID-19) is not available, its early detection is of paramount importance in enabling infected persons to gain rapid immunity and in reducing the risk of infection for the healthy population. The key diagnosis methods for COVID-19 are reverse transcription-polymerase chain reaction (RT-PCR) and gene sequencing of respiratory or blood samples [1]. However, the overall positive rate of RT-PCR on throat swab samples is only 30%-60%, which leaves some patients with COVID-19 undiagnosed, and these patients may infect a large, healthy population [2]. Given the high prevalence of COVID-19 and the shortage of qualified radiologists, automatic methods for detecting COVID-19 can assist the diagnostic process and improve diagnostic precision at an early stage. Artificial intelligence (AI) and machine learning (ML) techniques are effective tools that can be used to develop methods for the early diagnosis of COVID-19. In this regard, we use an end-to-end deep learning (DL) framework to classify COVID-19 from chest X-ray images. Unlike traditional AI/ML techniques that classify medical images in two steps (manual feature extraction followed by image recognition), our DL-based systems predict COVID-19 directly from raw images without requiring manual feature extraction. Recently, in most machine vision and medical image processing tasks, DL models, specifically convolutional neural networks (CNNs), have outperformed conventional AI models and have been used for several tasks, including image classification, image segmentation, facial recognition, super-resolution, and image enhancement [3][4][5].
In this study, we train three CNNs, namely VGG-19, MobileNetV2, and NASNet, which have achieved promising results in several tasks, and we evaluate their performance on X-ray datasets for the detection of COVID-19.
The application of sophisticated AI techniques combined with radiological imaging can help to reliably diagnose COVID-19 and compensate for the shortage of trained physicians in remote areas. DL models have been widely used for many tasks, including classification. When patient findings are detected through image and lesion segmentation, the time between the onset of initial symptoms and the imaging test may be an important factor in the reliability of X-ray results. Although X-rays showed no signs of the disease within the first three days after the onset of coughing and fever, these signs were more evident in the 10 to 12 days that followed [6]. Consensus guidelines for imaging children with COVID-19 [7] are now in place. According to these guidelines, if a child is suspected of having COVID-19 and shows mild to serious signs of an acute respiratory disorder, X-ray tests should be conducted. Repeated X-ray tests may be necessary to track the progression of the disease if the initial chest X-ray images show clear signs of COVID-19, and they are also warranted if a patient's state of health deteriorates. With the global prevalence of the COVID-19 pandemic, chest X-ray radiographs should be considered a valuable method for identifying COVID-19 in healthcare services [8]. The above findings indicate that DL with X-ray imaging can provide significant COVID-19 diagnosis results.
The rapid identification of confirmed COVID-19 cases at an early stage is necessary for quarantine and medical care, as well as the initial diagnosis, management, and public health safety of patients. However, under current conditions, medical staff, especially radiologists, are under tremendous pressure due to the large number of suspected patients who must undergo CT scanning; in addition, the shortage of scanning devices and the visual fatigue of radiologists increase the chance of missing minor lesions. As the primary emerging AI technology used for medical imaging, ML has been effective in the automated detection of lung diseases [9][10][11]. An image classification project achieved human-level performance with more than one million training images in 2015 [12], and ML achieved promising results in lung cancer screening in 2019 [13]. Most ML techniques for diagnosing diseases enable lesions to be observed, particularly for identifying CT-related diseases.
The motivations of this research are (1) to contribute to overcoming the COVID-19 pandemic; (2) to introduce an automated and fast COVID-19 diagnosis system as a simple alternative diagnosis technique to mitigate the spread of COVID-19; and (3) to enhance the accuracy of existing diagnosis systems (see Section 7, Comparison and Discussions) by introducing an accurate COVID-19 diagnosis system. Therefore, in this study, we aim to introduce an accurate COVID-19 diagnosis system using state-of-the-art CNN architectures and transfer-learning (TL) techniques. We investigated two TL scenarios. In the first scenario, we froze the pretrained layers in a VGG-19 model, except for the last four layers; in the second scenario, we fine-tuned all layers of the MobileNetV2 and NASNet models.
The rest of this paper is structured as follows. Section 2 presents the literature review. Section 3 presents the proposed methodology, with a description of the selected CNN architecture. Section 4 describes the two datasets used in this study, and Section 5 proposes two COVID-19 diagnosis systems; Section 6 describes the results of the two proposed diagnosis systems in different metrics; Section 7 presents a comparison and discussion of the proposed diagnosis systems and reference studies, and Section 8 provides the conclusion.

Literature Review
Currently, there is an extreme shortage of radiologists, and radiologists are exhausted by the large number of COVID-19 cases and deaths caused by the rapid spread of the disease. In this situation, a poorly controlled diagnosis of COVID-19 is possible and can be harmful. A patient's temperature is one of the easiest detection markers of the disease. COVID-19 belongs to the 2B beta-CoV category and has at least 70% resemblance to the genetic sequence of SARS-CoV [13], and it is the seventh member of the RNA coronavirus family to have infected humans [14]. COVID-19 infection symptoms include respiratory signs, fever, cough, and pneumonia [15]. Its diagnosis has become a major concern in hospitals due to the lack of nucleic acid detection kits. CT and radiology have proven effective for the early detection and diagnosis of COVID-19 [16][17][18], but due to the insufficient number of radiologists [18], specialized computer-aided lung CT diagnosis systems are required to reliably validate suspected COVID-19 cases, scan patients, and conduct virus supervision.
DL is a common AI research field that enables the development of end-to-end models that achieve promising results from input data without the need for manual feature extraction [19]. Several recent studies have employed various DL models with X-ray images for COVID-19 detection [20]. One study used a DL-based model for multiclass (three-class) classification with a total of 16,756 X-ray images and proposed a dedicated dataset of COVID-19 X-ray images named COVIDx [21]. Narin et al. [22] proposed a support vector machine (SVM) model that classified features obtained from various CNN models using X-ray images (25 COVID-19 positive and 25 healthy patients). The study claims that ResNet50 with the SVM classifier produces the best results. Apostolopoulos et al. [23] used three different CNN models (ResNet50, InceptionV3, and InceptionResNetV2) with 50 open-access COVID-19 X-ray images from Joseph Cohen and 50 typical images from a Kaggle repository. El-Din Hemdan et al. [24] deployed DL models to diagnose patients with COVID-19 using chest X-rays. They proposed a COVIDx-Net model that includes seven CNN models and used 50 chest X-ray images (25 COVID-19 positive, 25 healthy). Ozturk et al. [25] proposed a fully automated model based on the DarkNet method with an end-to-end structure that does not need manual feature extraction. The authors used 1,125 images (125 COVID-19 positive, 500 pneumonia, and 500 no-findings images) to evaluate their model.
In 2020, Al-Waisy et al. [26] proposed a hybrid COVID-19 detection system called COVIDCheXNet. They used contrast-limited adaptive histogram equalization to enhance the data and fused the results of two pretrained DL models (ResNet34 and a high-resolution network model). The COVIDCheXNet system achieved an accuracy of 99.99% and a sensitivity of 99.98%. In 2021, Al-Waisy et al. [27] proposed a hybrid ML-based COVID-19 detection system called COVID-DeepNet, which was used to diagnose patients into two classes (COVID-19 and healthy). In 2021, Abdulkareem et al. [28] proposed a model based on ML techniques (Naive Bayes, random forest, and SVM) and the Internet of Things (IoT) to diagnose patients with COVID-19 in smart hospitals. The proposed system achieved an accuracy of 95% with the SVM model. Ismael et al. [29] used several DL approaches to classify COVID-19 and healthy chest X-ray images. For feature extraction, they used an SVM classifier with linear, quadratic, cubic, and Gaussian kernel functions, and for fine-tuning, they used ResNet18, ResNet50, ResNet101, VGG-16, and VGG-19. They used a dataset containing 180 COVID-19 and 200 healthy chest X-ray images. The proposed system achieved a maximum accuracy of 94.7% with the ResNet50 model and the SVM classifier with the linear kernel function. Jain et al. [30] used DL-based CNN models (Inception V3, Xception, and ResNeXt) to detect COVID-19 on chest X-ray images. They used a dataset of 6,432 chest X-ray image samples, of which 5,467 were used for training and 965 for validation. The Xception model achieved the highest accuracy of 97.97%. Tab. 1 lists several COVID-19 classification methods, the accuracy achieved, and the size of the dataset used for their training and validation.
As can be seen in Tab. 1, the existing COVID-19 diagnosis systems have several disadvantages, such as using only two classes for classification, having limited positive cases, or achieving low accuracy and sensitivity.

Methodology
Usually, deep neural networks with a larger dataset perform better than those with smaller datasets. For applications where the dataset is not large, TL can be effective. The TL concept uses a well-trained model from large datasets, such as ImageNet, and applies it with comparatively small datasets. This eliminates the need for large datasets, thus decreasing the amount of training that the DL algorithm needs when built from scratch.
In this study, our system is trained, evaluated, and tested with three well-known CNNs: VGG-19 [32], NASNet [33], and MobileNetV2 [34]. VGG was developed to identify visual patterns directly from pixel images with minimal preprocessing, and it was designed for visual object recognition in the ImageNet project. A VGG network is characterized by its simplicity and is built using only 3 × 3 convolutional layers stacked on top of each other in increasing depth. Volume size is reduced by max-pooling. Two fully connected layers, each with 4,096 nodes, are then followed by a softmax classifier. NASNet is a Google ML model that came out of the company's AutoML project, in which small neural networks are generated automatically. Google first revealed in May 2017 that its AutoML project represents a significant step forward in the use of state-of-the-art ML models to identify images. The reported validation accuracy of NASNet is 82.7%, which is an improvement on all previous models created by the team.
MobileNetV2 is a 53-layer deep CNN for which a pretrained version, trained on more than one million images from the ImageNet dataset, can be loaded. Consequently, the network has learned feature representations for a wide variety of images. The image input size of the network is 224 × 224.
In this study, we investigated two scenarios of TL. In the first scenario, the pretrained layers in the VGG-19 model, except for the last 4 layers, are frozen; in the second scenario, all layers of the MobileNetV2 and NASNet models are fine-tuned. In VGG-19, after freezing these layers, we added the following layers:
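Setting aside the added layers, the freezing logic of the two scenarios can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Layer` class is a hypothetical stand-in so the example runs without a DL framework, and the layer names are placeholders rather than the real VGG-19 layer list. In Keras, the same loops would iterate over `model.layers` and set each layer's `trainable` attribute.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a framework layer with a `trainable` flag."""
    name: str
    trainable: bool = True


def freeze_all_but_last(layers: List[Layer], n_trainable: int) -> None:
    """Scenario 1: freeze every layer except the last `n_trainable` ones."""
    cutoff = max(len(layers) - n_trainable, 0)
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff


def unfreeze_all(layers: List[Layer]) -> None:
    """Scenario 2: leave every layer trainable (full fine-tuning)."""
    for layer in layers:
        layer.trainable = True


# Placeholder layer names for illustration only.
backbone = [Layer(f"layer_{i}") for i in range(1, 11)]

freeze_all_but_last(backbone, n_trainable=4)
trainable_names = [layer.name for layer in backbone if layer.trainable]
print(trainable_names)
```

Freezing keeps the ImageNet features of the early layers intact and updates only the last few layers, which is usually cheaper than full fine-tuning on a small dataset.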

Dataset
We used two datasets, which comprise collections of healthy, pneumonia, and COVID-19 infection X-ray images from state-of-the-art studies [23,35]. These two datasets were used to train our system with the pretrained CNN models: VGG-19, NASNet, and MobileNetV2. We compared the performance of our system with those of the studies from which these datasets were obtained.
DATASET-1 [35] comprises posterior-to-anterior (PA) chest X-ray (CXR) images, as this view is commonly used by radiologists in pneumonia diagnosis. DATASET-1 comprises four subdatasets; two subdatasets were constructed by the authors, and the other two were collected from two publicly accessible repositories, Kaggle and GitHub. The first subdataset is collected from the "Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 DATABASE" and comprises 330 positive COVID-19 radiography CT and CXR images with different resolutions, where 70 images are CXR and 250 images are CT images of the lung. The second subdataset is collected from the "Novel Corona Virus 2019 Dataset" and comprises 179 radiography images of COVID-19, Middle East respiratory syndrome, severe acute respiratory syndrome, and acute respiratory distress syndrome (ARDS) from published papers and web resources, created by Lan Dao, Joseph Paul Cohen, and Paul Morrison on GitHub. The third subdataset is collected from "COVID-19 positive CXR images from different articles." The datasets collected from GitHub motivated researchers to study the literature, in which more than 1,200 articles were published in less than two months. The fourth subdataset is a Kaggle CXR dataset, comprising 5,247 CXR images with resolutions ranging from 400 to 2,000 pixels for normal, viral pneumonia, and bacterial pneumonia cases. Of these, 3,906 are pneumonia images from multiple subjects (2,561 bacterial pneumonia images and 1,345 viral pneumonia images) and 1,341 are from healthy subjects. Fig. 1 illustrates some samples of DATASET-1.
Figure 1: Sample X-ray images from DATASET-1 showing COVID-19, viral pneumonia, and healthy cases
DATASET-2 [23] comprises three subdatasets. For the first subdataset, GitHub was explored for similar datasets. The authors took a set of X-ray images from Cohen [36]. For the second subdataset, the Radiological Society of North America, Radiopaedia, and SIRM were closely examined. These sets can be found online [37]. The third subdataset was complemented with a series of common X-ray bacterial-pneumonia scans to enable the CNNs to differentiate COVID-19 from common pneumonia. Kermany et al. [38] made this collection available on the Internet. A total of 700 identified severe pneumonia images, 224 identified COVID-19 images, and 504 healthy condition images were included in DATASET-2 [23]. Fig. 2 illustrates some samples of DATASET-2.

Proposed Diagnosis System
This paper proposes diagnosis systems that will classify COVID-19 infection into two and three classes using DL techniques. The two-classifier system can classify X-ray images into two classes, i.e., healthy and COVID-19 positive cases. The three-classifier system can classify X-ray images into three classes, i.e., healthy, pneumonia, and COVID-19 positive cases.
The two-classifier diagnosis system uses an end-to-end DL framework to classify COVID-19 from the CXR images. Compared to traditional techniques for classifying medical images, which use a two-step process (manual extraction of features and image recognition), our two-classifier DL system predicts COVID-19 directly from raw images, without requiring manual feature extraction, and classifies X-ray images into two categories: COVID-19 and healthy cases. As illustrated in Fig. 3, we started by collecting the image data (thousands of X-ray images) of patients having COVID-19 and healthy persons. Then, we labeled the images in the dataset as one of two types (COVID-19 and healthy cases) and preprocessed them by resizing them to the standard input size of their respective CNN models. After that, we divided the dataset into training and test sets and applied the pretrained CNN models (VGG-19, NASNet, and MobileNetV2) to generate the diagnosis, i.e., to detect whether a patient is a COVID-19 patient or a healthy person.
Figure 2: Sample X-ray images from DATASET-2 showing COVID-19, viral pneumonia, and healthy cases

Results
In this study, three well-known pretrained DL CNNs (VGG-19, NASNet, and MobileNetV2) were retrained, evaluated, and tested using the Keras framework to classify the X-ray images. The models were trained on a machine equipped with an Intel Core i9-9880H processor at 2.3 GHz, 16 GB RAM, a 2 GB graphics card, and 64-bit Windows 10 as the operating system. We used the ImageDataGenerator with the following parameters for augmentation: rotation range = 40, width shift range = 0.2, height shift range = 0.2, shear range = 0.2, and zoom range = 0.2. In addition, we resized all images to 224 × 224 pixels to meet the input requirement of the pretrained models. Furthermore, we used five-fold cross-validation for training and testing with the following training parameters: batch size = 16, number of epochs = 30, and the Adam optimizer with a learning rate of 0.0001.
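The settings above and the five-fold split can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the dictionary keys mirror the Keras `ImageDataGenerator` argument names, while `k_fold_indices`, the shuffling seed, and the sample count of 100 are hypothetical choices that the text does not specify (for example, it does not say whether the folds were stratified by class).

```python
import random
from typing import Iterator, List, Tuple

# Values quoted in the text; keys mirror Keras ImageDataGenerator arguments.
AUGMENTATION = {
    "rotation_range": 40,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "shear_range": 0.2,
    "zoom_range": 0.2,
}

# Training settings quoted in the text (the key names are our own).
TRAINING = {
    "image_size": (224, 224),
    "batch_size": 16,
    "epochs": 30,
    "learning_rate": 1e-4,
    "n_folds": 5,
}


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffled, not stratified
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(n_samples=100, k=TRAINING["n_folds"]))
for fold_no, (train, val) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} train, {len(val)} validation")
```

Each sample appears in exactly one validation fold, so averaging the per-fold metrics uses every image once for validation.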
For each dataset, we tested the three CNN models for the two- and three-classifier scenarios. Different well-known metrics, such as the accuracy, F1 score, precision, recall, and confusion matrix, were used to evaluate the models and compare them with other obtained results. The following specific counts were documented in the classification work of the CNNs: (a) correctly categorized disease cases (true positive, TP), (b) falsely categorized disease cases (false positive, FP), (c) correctly identified healthy cases (true negative, TN), and (d) disease cases incorrectly classified as healthy (false negative, FN). Specifically, TP = the correctly diagnosed COVID-19 cases, FP = pneumonia or healthy cases that were diagnosed as COVID-19, TN = pneumonia or healthy cases that were diagnosed as non-COVID-19 cases, and FN = the COVID-19 cases that were diagnosed as non-COVID-19. We calculated the performance metrics using the following equations:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall (sensitivity) = TP / (TP + FN)
F1 score = 2 × Precision × Recall / (Precision + Recall)
Furthermore, we used five-fold cross-validation for all experiments and took the overall average of all results. To test the models (VGG-19, NASNet, and MobileNetV2), we evaluated the trained models on a new test dataset (20% of the total dataset) and generated results using the same configuration and parameters that were used in training and validating the models.
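The TP, FP, TN, and FN counts defined above can be turned into the reported metrics with the standard definitions, as in the sketch below. The function names and the confusion matrix are illustrative assumptions: the matrix is made up for the example and is not a result from this study. Rows are taken to be true classes and columns predicted classes, with COVID-19 treated as the positive class in a one-vs-rest fashion.

```python
from typing import Dict, List


def one_vs_rest_counts(confusion: List[List[int]], positive: int) -> Dict[str, int]:
    """Collapse a multiclass confusion matrix (rows = true, cols = predicted)
    into TP/FP/TN/FN counts for one positive class."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[positive][positive]
    fn = sum(confusion[positive]) - tp
    fp = sum(row[positive] for row in confusion) - tp
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard accuracy, precision, recall (sensitivity), and F1 score."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up three-class matrix; class order: COVID-19, healthy, pneumonia.
example = [
    [48, 1, 1],
    [0, 50, 0],
    [2, 3, 45],
]
counts = one_vs_rest_counts(example, positive=0)
scores = metrics(counts["TP"], counts["FP"], counts["TN"], counts["FN"])
print(counts)
print(scores)
```

Note that the accuracy computed this way is the one-vs-rest accuracy for the COVID-19 class; the overall multiclass accuracy is the sum of the diagonal divided by the total.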

Two-Classifier System
Tab. 2 summarizes the performance of the CNN models tested on the two different datasets. For DATASET-1, VGG-19 outperformed the other models with the following performance metrics: 99.5% accuracy, 99.2% F1 score, 97.6% sensitivity for COVID-19 cases, and 100% sensitivity for healthy cases.
In DATASET-2, we achieved a similar result, i.e., VGG-19 outperformed the other models with 100% accuracy, 100% F1 score, 100% sensitivity for COVID-19 cases, and 100% sensitivity for healthy cases, as shown in Tab. 3, which also shows the confusion matrix for MobileNetV2, NASNet, and VGG-19 for the best fold for the two-classifier system on both datasets.
Tab. 5 shows the confusion matrix for MobileNetV2, NASNet, and VGG-19 for the best fold for the three-classifier system on the two datasets.
We performed five-fold cross-validation with 30 epochs for both classifier systems for the VGG-19, MobileNetV2, and NASNet models and took the overall average of the results. Figs. 5-8 illustrate the learning performance accuracy of VGG-19 in single-fold cross-validation, with 30 epochs of the two-classifier and three-classifier systems for DATASET-1 [35] and DATASET-2 [23].
In Figs. 5-8, training and validation accuracies show that there is no overfitting; for example, VGG-19 has an excellent fit and stable performance. The training and validation loss decreased to the point of stability with a minimal gap between the two final loss values in all folds.

Comparison and Discussions
In this section, we compare the performance metrics of the models in our study with those of the reference studies that used the same datasets (DATASET-1 and DATASET-2).

Two-Classifier System (DATASET-1)
Tab. 6 compares the best results of the models in this study with those of the reference study [35] for the two-classifier system. In this study, VGG-19 outperforms the other models on DATASET-1, with 99.5% accuracy, 99.2% F1 score, 100% sensitivity for COVID cases, and 99.6% sensitivity for healthy cases. In contrast, the reference study [35] trained four CNN models, of which SqueezeNet shows the best results on DATASET-1: 98.3% accuracy, 98.3% F1 score, 96.7% sensitivity for COVID cases, and 96.7% sensitivity for healthy cases.

Three-Classifier System (DATASET-1)
Tab. 7 compares the performance evaluation metrics of the best models in this study and the reference study [35] for the three-classifier system. In this study, VGG-19 outperformed the other models on DATASET-1, with 98.6% accuracy, 98.6% F1 score, 98.9% sensitivity for COVID cases, 98.4% sensitivity for healthy cases, and 99.4% sensitivity for pneumonia cases. In contrast, the reference study [35] trained four CNN models, of which SqueezeNet showed the best results on DATASET-1: 98.3% accuracy, 98.3% F1 score, 96.7% sensitivity for COVID cases, 96.7% sensitivity for healthy cases, and 96.7% sensitivity for pneumonia cases.

Two- and Three-Classifier Systems (DATASET-2)
Tab. 8 summarizes the performances of the best models using the two- and three-classifier systems of this study and a state-of-the-art reference study [23] on the same dataset, DATASET-2. For the two-classifier system in our study, the VGG-19 model outperformed the other CNN models with 100% accuracy and 100% sensitivity, while the reference study [23] exhibited 98.75% accuracy and 92.85% sensitivity. Furthermore, for the three-classifier system, the VGG-19 model in our study exhibited 96.2% accuracy and 96.7% sensitivity, while the reference study [23] exhibited 93.48% accuracy and 92.85% sensitivity.
Tab. 8 shows that in the proposed DL-based two- and three-classifier systems, the VGG-19 model achieves accurate results and can be useful for diagnosing COVID-19 cases.

Conclusion
In this paper, we proposed DL-based systems for the automatic identification of COVID-19 in CXR images by retraining three pretrained CNN models (VGG-19, MobileNetV2, and NASNet) on two datasets (DATASET-1 and DATASET-2). CXR images were classified into two and three classes (healthy, COVID-19, and pneumonia cases) using the proposed two- and three-classifier systems. In particular, the two-classifier system classified CXR images into COVID-19 or healthy cases, and the three-classifier system classified CXR images into COVID-19, healthy, or pneumonia cases. We conducted a detailed experimental study to determine the efficiency of each of the three CNN models and found that the VGG-19 model outperforms the other CNN models for both classification systems. In addition, we compared the results of this study with those of the reference studies and found that the proposed classifier systems outperform the related existing systems of the reference studies. In the two-classifier system, on DATASET-1, the VGG-19 model achieved 99.5% accuracy, 99.2% F1 score, 97.6% sensitivity for COVID-19 cases, and 100% sensitivity for healthy cases; on DATASET-2, it achieved 100% accuracy, 100% F1 score, 100% sensitivity for COVID-19 cases, and 100% sensitivity for healthy cases. In the three-classifier system, on DATASET-1, the model achieved 98.6% accuracy, 98.6% F1 score, 98.4% sensitivity for COVID-19 cases, 99.4% sensitivity for healthy cases, and 97.8% sensitivity for pneumonia cases; on DATASET-2, the model achieved 96.2% accuracy, 96.7% F1 score, 100% sensitivity for COVID-19 cases, 95.2% sensitivity for healthy cases, and 95.5% sensitivity for pneumonia cases.
In future work, we plan to extend the dataset by adding more COVID-19 cases and including different types of images, such as CT images. In addition, we plan to train for more epochs.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.