Computer Modeling in Engineering & Sciences
A Survey on Machine Learning in COVID-19 Diagnosis
1School of Educational Science, Nanjing Normal University, Nanjing, 210023, China
2School of Informatics, University of Leicester, Leicester, LE17RH, UK
*Corresponding Author: Zhihai Lu. Email: firstname.lastname@example.org
#These authors contributed equally
Received: 30 May 2021; Accepted: 09 August 2021
Abstract: Since the outbreak of Corona Virus Disease 2019 (COVID-19), many expert groups worldwide have studied the problem and proposed numerous diagnostic methods. This paper surveys research on COVID-19 diagnosis. First, the procedure of machine learning-based diagnosis is introduced in detail, including medical data collection, image preprocessing, feature extraction, and image classification. Then, we review seven methods in detail: transfer learning, ensemble learning, unsupervised learning, semi-supervised learning, convolutional neural networks, graph neural networks, and explainable deep neural networks. The advantages and limitations of the different diagnosis methods are also compared. Despite great achievements in medical image classification in recent years, machine learning-based classification of COVID-19 images still faces many problems, such as highly unbalanced datasets, the difficulty of collecting labeled data, and poor data quality. To address these problems, we propose some solutions and provide a comprehensive outlook for future research.
Keywords: COVID-19 diagnosis; machine learning; deep learning; deep neural network
The novel coronavirus pneumonia broke out in 2019. The pathogen was identified as a new enveloped ribonucleic acid (RNA) β-coronavirus that is similar to Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and is now named SARS-CoV-2. The disease is transmitted from person to person, and the recent emergence of large numbers of infected people without initial symptoms has accelerated its spread. The surge in patients has put great pressure on medical institutions. On March 11, 2020, the World Health Organization (WHO) declared Corona Virus Disease 2019 (COVID-19), an acute respiratory syndrome, a pandemic. WHO recommended that people avoid close contact with infected people and wash their hands frequently, especially after direct contact with patients. At the same time, different countries imposed border restrictions, flight restrictions, and social distancing, and raised awareness of hygiene.
COVID-19 is an acute resolved disease with a case fatality rate of 2%. Its clinical symptoms mainly include fever, cough, headache, and breathing difficulty. Studies have shown that patients with high blood pressure are at the greatest risk of death, followed by those with diabetes or heart disease. Children are usually less symptomatic than adults, but young children and infants are vulnerable. Currently, there is no specific drug. Reverse-transcription polymerase chain reaction (RT-PCR) is the current standard test for COVID-19 diagnosis, with samples collected by nasopharyngeal or laryngopharyngeal swab. RT-PCR is a genetic test in which RNA is reverse transcribed into complementary deoxyribonucleic acid. However, this method has some limitations, especially in middle- and low-income countries, including long turnaround time, high cost, and a shortage of test kits. Since the viral load of SARS-CoV-2 in respiratory samples decreases as the disease course lengthens, RT-PCR may produce false negative results. Some research revealed that the RT-PCR positive rate for pharyngeal swabs was about 30%–60% at initial detection, whereas chest CT images have a sensitivity of 97% and an accuracy of 68% in the diagnosis of COVID-19. In general, doctors gain significant information and make a diagnosis with CT scan images or X-ray images, and the process is faster, cheaper, and more readily available than RT-PCR. However, a large number of medical images must be evaluated by doctors in a short period of time, which may increase the probability of misclassification. Therefore, artificial intelligence is increasingly used as a reference in diagnosis. Deep learning, which encompasses many training methods and models, plays an important role in this diagnosis. The literature shows that the learning methods are varied, including traditional methods, transfer learning, multi-task learning, and end-to-end deep learning.
In addition, convolutional neural networks (CNNs), graph neural networks (GNNs), and explainable deep neural networks (xDNNs) have been widely used in research. However, with so many methods available, the classification accuracy differs from one method to another. A review of the literature on COVID-19 diagnosis shows that the following seven learning methods are the most commonly used: transfer learning, ensemble learning, unsupervised learning, semi-supervised learning, convolutional neural networks, graph neural networks, and explainable deep neural networks. Moreover, most experiments encountered problems such as highly unbalanced datasets, the difficulty of collecting labeled data, and poor data quality. Therefore, we summarize these seven learning methods in this paper and propose solutions to the problems encountered.
In this paper, we introduce the procedure for diagnosing COVID-19 based on machine learning in Section 2. We then introduce the seven methods and the corresponding literature in detail in Section 3. Finally, the future research directions of COVID-19 diagnosis methods are discussed in Section 4. For clarity, all abbreviations and their full names are listed in Table 1.
2 Computer-Aided Diagnosis (CAD) in COVID-19
CAD is becoming more and more popular in the field of medical diagnosis. It is a method that uses machine learning to analyze image or non-image datasets in order to diagnose the patient's condition. The method can aid the decision-making process of clinicians and reduce doctors' stress in disease diagnosis, particularly when COVID-19 spreads around the world in a short time. Machine learning algorithms detect underlying patterns in training datasets and then make predictions with the best-fit parameters. The procedure of COVID-19 classification based on machine learning is shown in Fig. 1.
Although CAD has made great contributions to clinical practice, there are still many problems in COVID-19 diagnosis, which lead to poor generalization of models and diagnostic accuracy that fails to meet the requirements of clinical application. For example, during the data collection stage, labeled data are difficult to collect, and the dataset is highly unbalanced. During the data preprocessing stage, the sizes and protocols of images are inconsistent. In the feature extraction stage, COVID-19 images have features similar to those of viral pneumonia images, making it difficult to extract feature information accurately. In the classification stage, there are many types of classifiers, such as support vector machine (SVM), k-nearest neighbor (K-NN), random forest, Bayesian network, and Gaussian network, so the selection and optimization of classifiers is a challenge.
2.1 Medical Data
In machine learning-based COVID-19 diagnostic research, X-ray images, CT images, and lung ultrasonography (LUS) images are the main medical images. Chest CT imaging is a widely available, time-saving, and noninvasive method for the detection of COVID-19. COVID-19 CT images are characterized by ground-glass opacities at the early stage, air space consolidation during the peak stage, bronchovascular thickening in the lesion stage, and traction bronchiectasis during the absorption stage. Medical practitioners can therefore distinguish different features from the early to the late stages and identify asymptomatic patients via chest CT images. Despite these advantages, CT scan machines are difficult to clean. Compared with CT images, the X-ray images of COVID-19 patients did not show obvious signs in the early stage; hence, early chest X-ray images of COVID-19 patients could easily be misdiagnosed. However, as the disease progresses, COVID-19 gradually manifests as a typical unilateral patchy infiltration involving the middle, upper, or lower zone of the lungs, occasionally with evidence of consolidation. An open image library was set up on GitHub by Mohammadi et al., consisting of a large number of COVID-19 chest X-ray images, with new images added regularly. In recent years, in addition to X-ray and CT images, LUS images, which are cheap and safe to acquire, have also been used for medical diagnosis. LUS also carries minimal risk of spreading infection, since it can be performed at the patient's bedside without the need to go to a public examination room. Carrer et al. conducted experiments using LUS data from the Italian COVID-19 LUS Database. They proposed an automatic pleural line localization method for LUS data based on the hidden Markov model and the Viterbi algorithm. The findings suggest specific LUS characteristics and imaging biomarkers for COVID-19 patients, so LUS can be used for COVID-19 diagnosis.
2.2 Image Preprocessing
Due to the short duration of the COVID-19 outbreak and the difficulty of obtaining medical images, many datasets are highly unbalanced. Researchers have therefore proposed many data augmentation methods, such as traditional augmentation by rotation and zoom, the conditional generative adversarial nets method, and a two-stage data augmentation technique. Conditional generative adversarial nets are composed of a generator network and a discriminator network, and the method could enlarge the dataset tenfold. In the two-stage data augmentation technique, a shallow image enhancement method was used in the first stage, and a synthetic minority over-sampling method was used in the second stage. In addition, a federated learning platform and a dual-sampling algorithm were proposed by Wang et al. and Ouyang et al., respectively, to solve the problem of unbalanced datasets.
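To make the idea of balancing a dataset by traditional augmentation concrete, the following is a minimal sketch in pure Python. It is not the procedure of any cited study: the helper names (`rotate90`, `flip_horizontal`, `balance_by_augmentation`), the choice of a 90-degree rotation and a horizontal flip, and the toy 2×2 "images" are all assumptions made for illustration.

```python
import random
from collections import Counter


def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def balance_by_augmentation(images, labels, seed=0):
    """Add augmented copies of minority-class images until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_images, out_labels = list(images), list(labels)
    for cls, n in counts.items():
        pool = [img for img, lab in zip(images, labels) if lab == cls]
        for _ in range(target - n):
            base = rng.choice(pool)                      # pick an original image
            transform = rng.choice([rotate90, flip_horizontal])
            out_images.append(transform(base))           # add a transformed copy
            out_labels.append(cls)
    return out_images, out_labels


# Toy example: three "normal" images and one "covid" image.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 1], [1, 0]], [[9, 9], [0, 0]]]
labels = ["normal", "normal", "normal", "covid"]
bal_images, bal_labels = balance_by_augmentation(images, labels)
```

After the call, both classes contain three samples. In practice, augmentation would be applied to the training split only, so that transformed copies of a test image never leak into training.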
In addition, X-ray and CT images are acquired from different hospitals and agencies, so they differ in size and standards. Heidari et al. generated a pseudo-color image to improve the classification accuracy via two image preprocessing steps. They collected 8,474 COVID-19 chest X-rays from several publicly available image databases. They removed the aperture area of the image and converted the original image into a binary image. A morphological filter was then utilized to smooth the boundary. Finally, the processed binary image was mapped back to the original image. Heidarian et al. used the U-Net network to remove noise and artifacts from CT images. In their experiments, a bilateral low-pass filter was adopted to remove noise, and a Gaussian low-pass filter was used when calculating the weights. They then used the histogram equalization method to normalize the image. Their experimental results showed that the two preprocessing steps improved the classification performance of the model, which reached a classification accuracy of 98.40% compared with only 88.00% without the two preprocessing steps. Similarly, Togacar et al. proposed new preprocessing techniques, namely a fuzzy technique and a stacking technique. The fuzzy technique played an important role in the image analysis step; its results depended on the similarity or difference functions used for color separation, and its inputs were the three RGB variables. The stacking technique was adopted to improve the quality of the images, including the original images and the fuzzy color images. Zhou et al. proposed a rapid, accurate, and machine-agnostic method to quantify CT images from different sources. The method consisted of two steps, spatial normalization and signal normalization, which together resolved the heterogeneity problem of CT scan images.
In addition, contrast limited adaptive histogram equalization and normalization are frequently used as important image preprocessing techniques. In one approach, the image data were preprocessed and local features were extracted by exploiting the frequency and texture regions to generate a feature pool. The literature on image preprocessing is summarized in Table 2.
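As an illustration of the histogram equalization step mentioned above, the following sketch implements plain global histogram equalization on a small grayscale image stored as nested lists. It is a simplified stand-in: the contrast limited adaptive variant additionally works on local tiles and clips the histogram, which is not shown here. The function name `equalize_histogram` and the toy image are assumptions for illustration.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a 2D grayscale image.

    `image` is a list of rows of integer gray levels in [0, levels - 1].
    Each gray level v is mapped to
        round((cdf(v) - cdf_min) / (n_pixels - cdf_min) * (levels - 1)).
    """
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = min(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [list(row) for row in image]

    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[p] for p in row] for row in image]


# A low-contrast 2x2 patch is stretched over the full gray-level range.
low_contrast = [[52, 55], [61, 70]]
equalized = equalize_histogram(low_contrast)
```

For this patch, the four distinct gray levels are spread evenly between 0 and 255, which is the contrast-stretching effect that makes faint structures easier to separate in later stages.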
2.3 Feature Extraction
Feature extraction is a very important part of medical diagnosis. Ozturk et al. used the gray level co-occurrence matrix, the local binary gray level co-occurrence matrix, the gray level run length matrix, segmentation-based fractal texture analysis, and synthetic minority over-sampling techniques. They proposed a framework that combined the feature vectors produced by the four feature extraction methods and then expanded them with over-sampling and augmentation methods.
Since many studies have found that viral pneumonia and COVID-19 are similar in terms of pulmonary infection area and infection characteristics, the Haralick feature was adopted to extract texture features from chest X-ray images. The Haralick feature derives texture measures from the co-occurrence matrix and provides information about how the intensity of a particular pixel is related to the intensity of adjacent pixels. Compared with existing methods, the Haralick feature method makes feature extraction relatively easy. Zargari et al. used texture, gray level co-occurrence matrix, gray level difference method, fast Fourier transform, and wavelet to calculate image features. A total of 252 chest X-ray image features were extracted by their feature extraction scheme, including 14 features from texture, 14 features from fast Fourier transform, 56 features from gray level co-occurrence matrix, 56 features from gray level difference method, and 112 features from wavelet. The matrix of feature combinations was then obtained using the Pearson correlation coefficient. Finally, the best feature vector was obtained by principal component analysis. Ismael et al. used the wavelet transform, shearlet transform, and contourlet transform to decompose chest radiograph images and then extracted normalized features from the decomposed chest X-ray images.
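The relation between the co-occurrence matrix and Haralick-style texture measures can be illustrated with a short sketch. It builds a normalized, symmetric gray level co-occurrence matrix for one pixel offset and computes two common measures, contrast and energy. The function names, the single horizontal offset, and the four-level toy image are assumptions for illustration; the cited studies may use other offsets, quantization levels, and feature sets.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray level co-occurrence matrix for the offset (dx, dy).

    `image` is a list of equal-length rows of integers in [0, levels - 1].
    Entry [i][j] is the relative frequency with which gray level i occurs
    next to gray level j at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset does not fit inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum of (i - j)^2 * p[i][j]."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Angular second moment: sum of p[i][j]^2."""
    return sum(v * v for row in p for v in row)


# Four-level toy image with horizontal neighbor pairs.
toy = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
p = glcm(toy, levels=4)
toy_contrast = contrast(p)
toy_energy = energy(p)
```

A smooth region concentrates mass on the diagonal of the matrix and therefore yields low contrast and high energy, whereas a textured region spreads mass away from the diagonal.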
Hussain et al. applied five classifiers (XGB-L, XGB-Tree, classification and regression tree, K-NN, and Naive Bayes) to classify chest X-ray images of COVID-19, bacterial pneumonia, non-COVID-19 viral pneumonia, and healthy cases. In addition, the gray level co-occurrence matrix technique was adopted to extract texture features from the images, and a morphological feature extraction method was utilized to obtain morphological features. The results showed that combining texture features and morphological features gave a higher classification accuracy than using a single type of feature, and that, in binary classification, the accuracy of the K-NN classifier was lower than that of the other classifiers. The flow of data and analysis is shown in Fig. 2.
The literature on feature extraction is summarized in Table 3.
There are many types of traditional classification methods, such as SVM, K-NN, decision tree, random forest, AdaBoost, XGBoost, and Bagging. Tabrizchi et al. utilized ensemble learning (EL) to improve the classification accuracy of COVID-19. They used SVM, artificial neural network (ANN), Naive Bayes, and CNN for classification. Among them, the SVM is good at non-linear classification problems; ANNs can analyze the features of samples and make predictions, and the multilayer perceptron forms the connected layers of the ANN. Naive Bayes is a well-known probabilistic classifier, which estimates the posterior probability from the prior probability and the class-conditional probability density function, and then makes the final prediction by choosing the class with the maximum posterior probability. The results showed that SVM outperformed the other classifiers with an accuracy of 99.00%. Likewise, Turkoglu used AlexNet as a pre-trained model for transfer learning and SVM as the classifier. The model was trained and tested on a COVID-19 X-ray dataset and achieved a classification accuracy of 99.18%. In addition, K-NN is commonly used in image classification. Arslan used K-NN for COVID-19 diagnosis with genomic sequences as the dataset. The accuracy of K-NN mainly depends on the distance measure. Five groups of distance measures were used in the experiments, and the results showed that the best group achieved a classification accuracy of 98.40%. Yasar et al. also utilized SVM and K-NN as classifiers. Their experimental results showed that SVM was superior to K-NN.
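Because the accuracy of K-NN depends on the distance measure, it can help to see how the measure enters the classifier. The sketch below is a minimal K-NN in pure Python with a pluggable distance function. The three distances shown (Euclidean, Manhattan, Chebyshev) and the two-dimensional toy features are assumptions for illustration and are not necessarily the measures or data used in the studies cited above.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors: two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "covid", "covid", "covid"]

predictions = {
    name: knn_predict(train_x, train_y, (5.5, 5.5), k=3, distance=fn)
    for name, fn in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("chebyshev", chebyshev)]
}
```

On well-separated data all three measures agree; differences between measures appear when classes overlap or when features have very different scales, which is why feature normalization is usually applied before K-NN.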
Comparing a single classifier with multi-mode classification shows that multi-mode classification performs better. Xu et al. used 43 combination features to distinguish four categories, namely non-severe, severe, healthy, and viral pneumonia, and completed the diagnosis with three classification methods: k-nearest neighbor, random forest, and support vector machine. Mohammed et al. also compared five classification methods, including SVM with linear and radial basis function kernels, K-NN, decision tree, and CNNs. The results showed that the CNNs performed well, among which ResNet50 achieved the best classification accuracy of 98.80%. Among the traditional classification methods, the SVM achieved a classification accuracy of 95.00%. Abraham et al. used a Bayesian classifier to classify COVID-19. In their experiments, several pre-trained networks were applied to extract features from chest X-ray images, and the images were then classified by combining correlation-based feature selection with a Bayesian classifier.
Besides these traditional classifiers, many novel classification methods have been proposed by researchers. Pokkuluri et al. proposed a hybrid nonlinear cellular automata classifier and compared it with traditional methods, such as long short-term memory (LSTM), AdaBoost, SVM, and regression. The hybrid nonlinear cellular automata classifier reported an accuracy of 78.80%. The proposed classifier can also predict the rate at which the virus spreads and its transmission within a given boundary. An extreme learning machine (ELM) has a strong ability to resist overfitting and can be used as a kernel-based support vector machine with the structure of a neural network. Albadr et al. used an Optimized Genetic Algorithm-Extreme Learning Machine for diagnosis. Their experimental results showed that it achieved 100.00% accuracy with fast computation time. Moreover, El-Kenawy et al. proposed a voting classifier based on the particle swarm optimization algorithm, which aggregated the prediction results of different classifiers and selected the category with the most votes.
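The aggregation step of a voting classifier can be sketched in a few lines. The example below implements plain (unweighted) majority voting over the labels predicted by several classifiers. It deliberately omits any optimization of the ensemble, such as the particle swarm optimization used in the work cited above; the function name `majority_vote` and the toy predictions are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions by unweighted majority voting.

    `predictions` is a list with one entry per classifier; each entry is a
    list of predicted labels, one per sample. Ties are resolved in favor of
    the label that appears first among the classifiers.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all classifiers must predict the same number of samples")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Three hypothetical classifiers, three samples.
svm_pred = ["covid", "normal", "covid"]
knn_pred = ["covid", "covid", "normal"]
nb_pred = ["normal", "normal", "covid"]
ensemble_pred = majority_vote([svm_pred, knn_pred, nb_pred])
```

Each sample receives the label chosen by at least two of the three classifiers, so an error made by a single classifier is outvoted as long as the others are correct.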
Medical image segmentation plays a critical role in the training of the models. Extra areas are removed from the lung image so that only the infected parts of the lung are analyzed. In image segmentation, the selection of the best threshold value is very important to the filtering process. However, the threshold value varies from image to image. Therefore, Shankar et al. applied Gaussian filtering to preprocess medical images. Gaussian filtering is a common method for improving image quality by smoothing images in medical image classification. Elaziz et al. proposed a hybrid approach that combines the features of the marine predator algorithm and moth-flame optimization. The hybrid approach performed better in image segmentation. Tiwari et al. proposed an image segmentation approach based on the marine predator algorithm. They conducted X-ray segmentation experiments with this approach and found that it achieved good performance. El-bana et al. used multi-modal learning to fine-tune the InceptionV3 architecture and proposed pulmonary nodule detection to improve the segmentation accuracy of lung infections in CT scans. Furthermore, the contrast limited adaptive histogram equalization method was used to enhance small details, textures, and the local contrast of the images. Their experimental results demonstrated an increase of approximately 2.5% and 4.5% in the Dice coefficient and mean intersection-over-union, respectively, while achieving a 60% reduction in computational time compared with the recent literature. Wang et al. proposed a new noise-robust loss function that generalizes the Dice loss and the mean absolute error loss. In addition, they built a novel COVID-19 Pneumonia Lesion segmentation network to deal with lesions of various scales and appearances.
Their experiments indicated that the proposed network achieved higher performance than state-of-the-art image segmentation networks. The literature on classification is summarized in Table 4.
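The Dice coefficient and intersection-over-union used to evaluate segmentation quality can be computed directly from a predicted mask and a reference mask. The sketch below handles the binary case (lesion versus background); the mean intersection-over-union reported in the literature averages this quantity over classes. The function name `dice_and_iou` and the toy masks are assumptions for illustration.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and intersection-over-union of two binary masks.

    Masks are lists of rows containing 0 (background) or 1 (lesion).
    Dice = 2|A ∩ B| / (|A| + |B|),  IoU = |A ∩ B| / |A ∪ B|.
    Two empty masks are treated as a perfect match.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same shape")

    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    size_pred, size_target = sum(flat_pred), sum(flat_target)
    union = size_pred + size_target - intersection
    if union == 0:
        return 1.0, 1.0
    dice = 2 * intersection / (size_pred + size_target)
    iou = intersection / union
    return dice, iou


predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]
dice, iou = dice_and_iou(predicted_mask, reference_mask)
```

Here two of the three predicted lesion pixels overlap the reference, giving a Dice coefficient of 2/3 and an intersection-over-union of 1/2. The Dice coefficient is never smaller than the intersection-over-union for the same pair of masks.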
3 Diagnosis Techniques Based on Deep Neural Networks in COVID-19
3.1 COVID-19 Diagnosis Based on Transfer Learning
In many studies, deep learning models have shown superior performance to classical machine learning models. However, deep learning models have some drawbacks. For example, they need a large number of labeled images for training, which is costly and time-consuming. Transfer learning compensates for this shortcoming: it is a technique that reuses pre-trained models for a new, related problem with less data and lower complexity. Mohammadi et al. utilized four pre-trained networks to construct models, namely VGG16, VGG19, MobileNet, and InceptionResNetV2, and compared their classification performance. These models were trained and tested on an X-ray dataset, and the results showed that MobileNet provided the highest classification accuracy.
Ardakani et al. compared ten well-known convolutional neural networks: AlexNet, VGG16, VGG19, SqueezeNet, GoogleNet, MobileNetV2, ResNet18, ResNet50, ResNet101, and Xception. In their experiments, ResNet101 and Xception performed best. Both had the same area under the curve (AUC) of 99.40%; Xception achieved a higher specificity, whereas ResNet101 achieved a higher sensitivity. Chowdhury et al. also trained networks using transfer learning (TL) and compared their performance with and without data augmentation. The deep learning networks were MobileNetV2, SqueezeNet, ResNet18, InceptionV3, ResNet101, CheXNet, VGG19, and DenseNet201. The data augmentation methods were rotation and translation. The dataset consisted of 423 COVID-19 chest X-ray images, 1485 pneumonia chest X-ray images, and 1579 healthy chest X-ray images. The results showed that DenseNet201 performed better than the other networks with data augmentation, and CheXNet achieved the highest AUC and sensitivity without data augmentation. Narin et al. proposed an end-to-end structure that did not require manual feature extraction, selection, and classification. Five pre-trained networks were trained, namely ResNet50, ResNet101, ResNet152, InceptionV3, and InceptionResNetV2. Their experiments showed that ResNet50 performed better than the other networks. Ismael et al. collected 180 COVID-19 chest X-ray images and 200 healthy chest X-ray images to train a CNN. The training process consisted of three parts: deep feature extraction, fine-tuning of pre-trained convolutional neural networks, and end-to-end training of the CNN, as shown in Fig. 3.
Five pre-trained deep networks were used for feature extraction, namely ResNet18, ResNet50, ResNet101, VGG16, and VGG19. SVMs with quadratic, linear, cubic, and Gaussian kernel functions were used for classification. The results showed that the SVM with the cubic kernel function performed better than the SVMs with the other kernel functions.
Murugan et al. proposed a new framework based on an ELM classifier. In their experiments, ResNet50 was used as an image preprocessing network, which would ignore some details when processing small images, thus reducing the diagnostic accuracy. Marques et al. proposed a CNN based on EfficientNet. EfficientNet can scale the baseline ConvNet to any target resource constraint while maintaining model efficiency on transfer learning datasets. In general, the EfficientNet model offers more accuracy and efficiency than existing CNN models such as AlexNet, GoogleNet, and MobileNetV2. EfficientNet includes models ranging from B0 to B7, each with a different number of parameters. The authors used EfficientNetB4 with 19 M parameters. The proposed CNN achieved 99.62% accuracy in binary classification. Tammina built a novel network called CovidSORT. This network was developed by applying TL to six pre-trained networks (InceptionV3, VGG16, VGG19, ResNet50, DenseNet121, and MobileNetV2). The novel network achieved an accuracy of 96.83% based on majority voting among these models. Sun et al. proposed a model based on adaptive feature selection guided deep forest, a method for obtaining a high-level representation of specific positional features from CT images. Each layer of the deep forest was composed of N independent random forests and a feature selection unit, and each random forest produced a probability distribution vector over COVID-19 and community acquired pneumonia (CAP). They then concatenated the N probability distribution vectors with the input feature vector and calculated the feature importance of each feature within the input feature vector to reduce the redundancy of the features. The feature importance of the x-th feature is as follows:
$$I_x = \frac{1}{N} \sum_{n=1}^{N} I_x^{n}$$
where $I_x^{n}$ is the feature importance of the x-th feature in the n-th random forest, so that $I_x$ is the average importance of the x-th feature over all N random forests. They discarded the features with low feature importance by a specific ratio based on the calculated feature importance. They collected 2522 CT scan images, including 1495 COVID-19 images and 1027 CAP images. First, they used VB-Net to divide these images into infected regions and normal fields. Then, adaptive feature selection guided deep forest was adopted to learn the features from the COVID-19 patients' images. Finally, a diagnostic prediction value was generated at the last layer of the model. In the last layer, each forest produced a probability distribution p for the identification of COVID-19. For each subject, they used the following equation to ensemble the predicted value for COVID-19,
$$\hat{y} = \arg\max_{c} \frac{1}{N} \sum_{n=1}^{N} p_{n,c}$$
where $p_{n,c}$ is the probability that the subject belongs to category c (i.e., COVID-19 or CAP) provided by the n-th forest in the last layer. They evaluated the model in terms of accuracy, sensitivity, specificity, AUC, precision, and F1-score, and the results were 91.79%, 93.05%, 89.95%, 96.35%, 93.10%, and 93.07%, respectively. However, the model is applicable only to COVID-19 and CAP images. Horry et al. adopted VGG19 as the backbone architecture and collected CT, X-ray, and LUS images from several agencies. An image preprocessing stage was utilized to normalize these images and reduce unwanted signal noise, such as the non-lung area visible in X-rays, thereby reducing the impact of sampling bias on their experiments. LUS provided superior detection accuracy compared with X-rays and CT scans. The experimental results also showed that most deep learning networks were difficult to train well with limited data and provided poor consistency across the three imaging modes.
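The two aggregation steps of the deep forest described above, averaging each feature's importance over the N random forests and combining the last-layer class probabilities of the forests into one prediction, can be sketched as follows. This is an illustration under stated assumptions rather than the cited implementation: the function names, the `keep_ratio` parameter, the use of a plain mean followed by an arg-max for the final prediction, and all numbers are hypothetical.

```python
import math


def average_feature_importance(importances):
    """importances[n][x] is the importance of feature x in forest n.
    Returns the mean importance of every feature over all forests."""
    n_forests = len(importances)
    return [sum(column) / n_forests for column in zip(*importances)]


def select_features(avg_importance, keep_ratio):
    """Keep the top `keep_ratio` fraction of features (assumed rule) and
    return their indices in ascending order."""
    n_keep = max(1, math.ceil(len(avg_importance) * keep_ratio))
    ranked = sorted(range(len(avg_importance)),
                    key=lambda x: avg_importance[x], reverse=True)
    return sorted(ranked[:n_keep])


def ensemble_predict(forest_probs, classes):
    """forest_probs[n][c] is the probability of class c from forest n in the
    last layer. Averages over forests and returns (label, mean probabilities)."""
    n_forests = len(forest_probs)
    mean_probs = [sum(column) / n_forests for column in zip(*forest_probs)]
    best = max(range(len(classes)), key=lambda c: mean_probs[c])
    return classes[best], mean_probs


# Two forests, four features (hypothetical importances).
importances = [[0.4, 0.1, 0.3, 0.2],
               [0.2, 0.1, 0.5, 0.2]]
avg = average_feature_importance(importances)
kept = select_features(avg, keep_ratio=0.5)

# Four last-layer forests voting on one subject (hypothetical probabilities).
forest_probs = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.9, 0.1]]
label, mean_probs = ensemble_predict(forest_probs, ["COVID-19", "CAP"])
```

In this toy case, features 0 and 2 are kept, and the subject is assigned to COVID-19 because the mean probability of that class over the four forests is 0.65.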
Do et al. used five architectures as the model backbones, namely VGG16, VGG19, InceptionV3, InceptionResNetV2, and Xception. The results indicated that the VGG16-based model performed best with an accuracy of 97.00%, and the performance of the Xception-based and InceptionV3-based models was slightly lower. Chaudhary et al. decomposed chest X-ray images into subband images via the Fourier-Bessel series expansion-based dyadic decomposition method. These subband images were then fed as input to ResNet50, a network pre-trained on ImageNet. Their experimental results indicated that the proposed method increased classification accuracy, and ResNet50 achieved a classification accuracy of 98.66%. Karar et al. proposed a cascaded deep learning classifier. First, they used a series of binary classifiers to simplify the complex multi-label classification of X-ray images, and then they fine-tuned VGG and ResNet with a stochastic gradient descent optimizer. A total of 306 chest X-ray images were collected, including bacterial pneumonia, COVID-19, viral pneumonia, and healthy chest X-ray images. In the preprocessing step, perceptual adaptation of the image was applied to improve image quality. Their experiments indicated that the performance of the cascaded deep learning classifier was superior to that of other multi-label classifiers for COVID-19 and pneumonia in previous studies. Sixteen pre-trained networks were trained through transfer learning, namely SqueezeNet, GoogLeNet, InceptionV3, DenseNet201, MobileNetV2, ResNet18, ResNet50, ResNet101, Xception, InceptionResNetV2, ShuffleNet, NasNetMobile, NasNetLarge, AlexNet, VGG16, and VGG19. CT images were used in these experiments, with 80% of the images used for training and the rest for testing. The results showed that DenseNet201 was the deepest network and achieved the best performance in accuracy, sensitivity, and F1 score. Polat et al.
used TL to retrain three networks, namely ResNet, DenseNet, and VGG. The class activation mapping (CAM) method was applied to create an activation map that highlights key areas of the radiographs to improve causality and comprehensibility. The experimental results showed that the final model based on the DenseNet161 structure had the best performance and achieved a classification accuracy of 97.10%. A distant domain transfer learning diagnosis method was proposed by Niu et al., which consisted of two parts: a U-Net segmentation model and a distant feature fusion classification model. Distant domain transfer learning is a newly introduced transfer learning method that differs from the traditional TL method in that it mainly solves the negative transfer problem caused by the loose relationship between the source domain and the target domain. The proposed algorithm achieved 96.00% classification accuracy, which was 13% higher than that of "non-transfer" algorithms and 8% higher than that of existing transfer and distant transfer algorithms. Do proposed a bundled transfer learning approach to distinguish COVID-19 cases from pneumonia and healthy patients. Features extracted from each pre-trained model were collected and used to train new layers. Ibrahim et al. used the pre-trained AlexNet to classify X-rays and performed binary classification and multi-classification for COVID-19 pneumonia, non-COVID-19 viral pneumonia, bacterial pneumonia, and healthy patients. For the binary classification of COVID-19 pneumonia versus non-COVID-19 pneumonia on X-rays, the accuracy of the network was 99.62%, and the accuracy of the multi-classification was 93.42%. Dastider et al. used the pre-trained ResNet152V2 as the backbone architecture of ResCovNet. The dataset consisted of 7400 chest X-ray images, including COVID-19, viral pneumonia, bacterial pneumonia, mycoplasma pneumonia, and healthy chest X-ray images.
They used Otsu's thresholding in the preprocessing stage to enhance the characteristics fed to the classification network, and used the ReLU function as the activation function and the softmax function as the prediction function. The model obtained 88.00% accuracy in multi-classification. Canayaz proposed MH-COVIDNet, which applied four pre-trained networks (AlexNet, VGG19, GoogleNet, and ResNet) to complete feature extraction tasks. The binary particle swarm optimization algorithm and the binary gray wolf optimization algorithm were adopted to select the best potential features. The overall accuracy of this network was 99.38%.
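Otsu's thresholding, mentioned above as a preprocessing step, selects the gray level that maximizes the between-class variance of the resulting foreground and background. The following pure-Python sketch shows the computation on a flat list of pixel values; the function names and the toy pixel values are assumptions for illustration, and a production pipeline would normally rely on an image processing library.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the background class and pixels with
    value > t form the foreground class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue            # no background pixels yet
        if weight_fg == 0:
            break               # all pixels are background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance up to the constant factor 1 / total^2.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Dark background around 10 and bright structure around 200.
pixels = [10, 12, 11, 13, 200, 205, 210, 198]
threshold = otsu_threshold(pixels)
mask = binarize(pixels, threshold)
```

Because the threshold is derived from the histogram of each image, it adapts to images acquired with different exposure settings, which addresses the observation that a fixed threshold value varies from image to image.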
However, these studies only classify COVID-19 against other pneumonia patients or healthy cases and cannot distinguish the stage of COVID-19. Yu et al. used the TL method to diagnose the stage of COVID-19 cases. Taking advantage of pre-trained deep neural networks, InceptionV3, ResNet50, ResNet101, and DenseNet201 were exploited to extract features from CT scans. A total of 729 2D axial plane slices, with 246 severe cases and 483 non-severe cases, were employed. These features were then fed to classifiers, including linear discriminant, cubic SVM, linear SVM, K-NN, and AdaBoost decision tree. The experimental results demonstrated that DenseNet201 with cubic SVM achieved the best performance in COVID-19 severity screening, with the highest severity classification accuracy of 95.20% and 95.34% for tenfold cross-validation and leave-one-out, respectively. The literature on transfer learning is summarized in Table 5.
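The two-step pattern described above (deep features from a pre-trained network, then a classical classifier scored by leave-one-out) can be illustrated with a minimal sketch. It is not the pipeline of Yu et al.: the 2-D feature vectors are hypothetical stand-ins for high-dimensional deep features, K-NN is used only because it is short to write, and all function names are illustrative.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Predict the most common label among the k nearest training vectors."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return Counter(votes).most_common(1)[0][0]

def leave_one_out_accuracy(features, labels, k=3):
    """Hold out each sample once, classify it from the rest, and average the hits."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += knn_predict(train_x, train_y, features[i], k) == labels[i]
    return hits / len(features)

# Hypothetical 2-D "deep features"; a real pipeline would use the output of a
# pre-trained network such as DenseNet201 instead.
features = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
            [0.90, 0.80], [0.80, 0.90], [0.85, 0.75]]
labels = ["non-severe"] * 3 + ["severe"] * 3

print(leave_one_out_accuracy(features, labels, k=1))
```

Tenfold cross-validation follows the same idea, except that the data are split into ten folds and each fold is held out once instead of each single sample.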
3.2 COVID-19 Diagnosis Based on Convolutional Neural Networks Training from Scratch
CNNs play an important role in clinical diagnosis. In the diagnosis of COVID-19, CNNs trained from scratch have been favored by medical researchers. Wang et al. introduced a new network, Covid-Net, which is open source and available to the general public. Covid-Net combined a lightweight projection-expansion-projection-extension design with selective long-range connectivity. These choices enhanced representational ability while keeping computational complexity low. Leveraging a large number of long-range connections in densely connected deep neural networks would increase computational complexity. Therefore, long-range connections were used sparingly, and four convolution layers served as central hubs for long-range connections to much later layers in the network. Covid-Net was reported to be the first open-source network design for detecting COVID-19 from X-rays, which encouraged reproducibility. Waheed et al. designed CovidGAN, which was based on an auxiliary classifier generative adversarial network (ACGAN), to generate more images. The ACGAN is shown in Fig. 4.
In ACGAN, every generated sample is associated with a class label c in addition to the noise z, and the generator G uses both to produce images. The discriminator D then gives a probability distribution over both the sources and the class labels. Training the discriminator to also predict an image's class label allowed the generation of high-quality images while learning a representation independent of the class labels. They then built CovidGAN, a complete architecture with a generator and a discriminator. The structure of CovidGAN is shown in Fig. 5.
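Because the ACGAN discriminator outputs both a source distribution (real or generated) and a class distribution, its per-image loss is commonly written as the sum of a binary cross-entropy term and a categorical cross-entropy term. The sketch below illustrates that general idea only; the function names and the example probabilities are hypothetical and are not taken from the CovidGAN implementation.

```python
import math

def binary_cross_entropy(p_real, is_real):
    """Source loss: p_real is the discriminator's probability that the image is real."""
    p = p_real if is_real else 1.0 - p_real
    return -math.log(p)

def sparse_categorical_cross_entropy(class_probs, label):
    """Class loss: negative log-probability assigned to the integer class label."""
    return -math.log(class_probs[label])

def discriminator_loss(p_real, is_real, class_probs, label):
    """Sum of the source loss and the class loss for a single image."""
    return (binary_cross_entropy(p_real, is_real)
            + sparse_categorical_cross_entropy(class_probs, label))

# Hypothetical outputs for one real image whose true class index is 1.
loss = discriminator_loss(p_real=0.9, is_real=True, class_probs=[0.2, 0.8], label=1)
print(round(loss, 4))
```

The generator is trained against the same two heads: it is rewarded when its images are classified into the intended class and when they are mistaken for real images.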
CovidGAN utilized two loss functions, with binary cross-entropy in the first layer and sparse categorical cross-entropy in the second layer. The dataset of 1124 chest X-ray images was obtained from three publicly accessible datasets. The synthetic images produced by CovidGAN improved the classification accuracy to 95.00%, whereas the accuracy without these synthetic images was only 85.00%. A novel CNN, namely CoroDet, was proposed by Hussain et al., which achieved 99.10% and 91.20% classification accuracy in the binary classification (COVID-19 and healthy chest images) and the multi-classification (COVID-19, non-COVID bacterial pneumonia, non-COVID viral pneumonia, and healthy chest images), respectively. They used a flatten layer that transforms the entire pooled feature map matrix into a single column. Moreover, three activation functions were used in their network: the sigmoid function, the ReLU function, and the leaky ReLU function. Singh et al. discussed multi-objective differential evolution-based CNNs to classify COVID-19-infected patients from chest CT images. Their research designed a multi-objective fitness function as:
$$f(t) = S_n + S_p$$
Here, $S_n$ defines the sensitivity and $S_p$ defines the specificity. The sensitivity is evaluated from the confusion matrix and estimated as:
$$S_n = \frac{T_p}{T_p + F_n}$$
Here, $T_p$ and $F_n$ define the true-positive and false-negative values, respectively. $S_p$ defines the proportion of actual negatives that are correctly identified, and it is mathematically evaluated as:
$$S_p = \frac{T_n}{T_n + F_p}$$
Here, $T_n$ defines the true-negative values and $F_p$ defines the false-positive values. In their network, the ReLU activation function was utilized to learn complex functional mappings between the inputs and response variables. The proposed CNN was compared with an ANN, and extensive experimental results revealed that the proposed model outperformed competitive models. Mahmud et al. built a network based on residual units and shifter units. They stacked multiple networks to improve the accuracy and used gradient-weighted class activation mapping (Grad-CAM) to locate the infected region of COVID-19 cases. In binary classification, the network achieved an accuracy of 97.40%. Abbas et al. proposed a new model, called Decompose, Transfer, and Compose, which could handle data irregularities (e.g., overlapping classes) by studying the class boundaries using a class decomposition mechanism. First, the local features of chest X-ray images were extracted by the pre-trained networks, and the data structure was simplified via the class decomposition layer of Decompose, Transfer, and Compose. Next, a gradient descent method was used for optimization. Finally, a class composition layer was adopted to refine the final classification. The model achieved a classification accuracy of 93.10%. Using the optimized CNN, the image was tested in less than 5 s with an accuracy of 97.78%. Ahmed et al. proposed an end-to-end network based on a residual-structure network. The network consisted of a multilevel preprocessing filter block, a multilevel feature extractor, and a classification block. Global average pooling was applied to compress the feature space in the classification block. Because of the small number of COVID-19 chest X-ray images in the dataset, their experiments adopted the weighted categorical cross-entropy as the loss function of the classification network when dealing with a multi-classification problem. The calculation of the function is as follows:
$$L = -\frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{C} w_{ij}\, y_{ij} \log\left(p_{ij}\right)$$
where $y_{ij}$ represents the ground-truth value of class $j$ for image $i$, with a value range of 0 to 1, $p_{ij}$ represents the softmax prediction for image $i$ of class $j$, $w_{ij}$ represents the weight of image $i$ of class $j$, $N_j$ represents the number of images of class $j$, $M$ represents the number of images in a batch, and $C$ represents the total number of classes. They used two common datasets for training and testing, the CheXpert dataset and the COVIDx dataset. The experimental results showed that the classification accuracy of the model was 97.48%, and the sensitivity was 96.39%. Karthik et al. proposed a new CNN architecture, which could learn a unique convolutional filter pattern for each kind of pneumonia and could automatically learn the features that separate infected from healthy chest X-ray images. The architecture consisted of two structures, a channel-shuffled dual-branched CNN and a channel-shuffled dual-branched CNN with a distinctive filter learning paradigm. They used the weighted gradient of the target class filter to identify the filter set that was most responsive to a particular class. It was reported that the proposed architecture was the first attempt to learn custom filters in a single convolutional layer, and the proposed CNN achieved an accuracy of 99.80%. Siddhartha et al. proposed a method called COVIDLite, which combined a depth-wise separable convolutional neural network with white balance and contrast limited adaptive histogram equalization. The depth-wise separable convolutional neural network was applied to classify the images, and it divided the convolution operation into two independent operations: a depth-wise (spatial) convolution followed by a point-wise convolution. The contrast limited adaptive histogram equalization method was adopted to enhance the visibility of the X-ray images in the preprocessing step.
The depth-wise separable convolutional neural network, trained using sparse cross-entropy, was used for image classification with fewer parameters and a significantly smaller size, i.e., 8.4 MB without quantization. The method helped doctors shorten the diagnosis time and achieved good classification performance. It achieved an accuracy of 99.58% for binary classification and 96.43% for multi-class classification, and it outperformed various state-of-the-art methods. Keles et al. developed two new diagnostic networks named COV19-CNNet and COV19-ResNet. They collected 910 chest X-ray images, including COVID-19, viral pneumonia, and healthy chest X-ray images. Experiments showed that COV19-CNNet achieved a multi-classification accuracy of 94.28% and COV19-ResNet achieved a multi-classification accuracy of 97.61%. Two diagnostic COVID-19 architectures were proposed and compared by Aslan et al. One was an image segmentation architecture based on ANN. The other was a hybrid architecture based on bidirectional long short-term memory (LSTM). The results showed that the classification accuracy of the first architecture was 98.14%, and that of the second, hybrid architecture was 98.70%. Similarly, Pustokhin et al. proposed a new residual network-based Class Attention Layer with Bidirectional LSTM. The proposed model involved a series of processes, namely bilateral filtering-based preprocessing, feature extraction by the residual network-based Class Attention Layer with Bidirectional LSTM, and softmax-based classification. Once the bilateral filtering technique had produced the preprocessed image, feature extraction took place using three modules, namely ResNet-based feature extraction, Class Attention Learning, and Bidirectional LSTM modules. Finally, the softmax layer was applied to categorize the feature vectors into the corresponding classes.
The model achieved a multi-classification accuracy of 94.88% and a sensitivity of 93.28%. Nour et al. proposed a CNN based on deep feature extraction and Bayesian optimization. It was a novel serial network consisting of five convolutional layers. They used SVM, K-NN, and decision tree methods as classifiers. The dataset included chest X-rays from a public dataset, and the best classification accuracy of 98.97% was obtained by SVM. Since the variance of a single CNN classifier is usually too high, which leads to poor generalization in practical applications, Singh et al. built the COVIDScreen network. COVIDScreen was based on a pruning learning algorithm, which addressed the problems of generalization and complexity that arise with multiple CNN learners. A combination of learners was adopted to improve generalization; that is, four CNN learners and Naive Bayes learners were combined to form meta-learners. In addition, they used Grad-CAM visualization to provide interpretation and build trust in the medical artificial intelligence system, and the accuracy of the model was 98.67%. The literature on CNNs is summarized in Table 6.
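The sensitivity and specificity used in the multi-objective fitness function earlier in this section, and reported as metrics by several of the studies above, follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts; combining the two metrics as an unweighted sum is an assumption made here for illustration, and the function names are not from any cited work.

```python
def sensitivity(tp, fn):
    """Proportion of actual positives that are correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives that are correctly identified."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Proportion of all cases that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def fitness(tp, tn, fp, fn):
    """Assumed fitness: unweighted sum of sensitivity and specificity."""
    return sensitivity(tp, fn) + specificity(tn, fp)

# Hypothetical confusion-matrix counts for a binary COVID-19 classifier.
tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
```

On an imbalanced test set, accuracy alone can look high even when sensitivity is poor, which is one reason many of the surveyed studies report sensitivity and specificity alongside accuracy.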
3.3 COVID-19 Diagnosis Based on Ensemble Learning
Many deep learning models work well only under a common assumption: the training and testing data are drawn from the same feature space and the same distribution. When the distribution changes, most models need to be rebuilt from scratch with newly collected training data. However, in many practical applications, it is expensive or impossible to recollect the needed training data and rebuild the models. Ensemble learning (EL) is therefore a significant technique for enhancing model classification performance. Wang et al. used two pre-trained architectures, ResNet101 and ResNet152, for COVID-19 diagnosis via transfer learning and model integration. After training, the model achieved 96.1% classification accuracy on the testing dataset. Rajaraman et al. obtained optimal models by an iterative pruning method and combined the predictions of the best-performing pruned models through different ensemble strategies. Four datasets, the PEDIATRIC X-ray dataset, the RSNA X-ray dataset, the TWITTER COVID-19 X-ray dataset, and the MONTREAL COVID-19 X-ray dataset, were collected; 90% of these images were used for training and 10% for testing. Their experiments showed that the combination of iterative model pruning and ensemble learning could improve the prediction accuracy. Attallah et al. proposed a novel CAD system based on the fusion of multiple CNNs for detecting COVID-19. The CAD system employed four types of CNNs: AlexNet, GoogleNet, ResNet18, and ShuffleNet. First, they used end-to-end classification. Then, deep features were extracted separately from each network, and principal component analysis was applied to each deep feature set. Afterward, a certain number of principal components were selected from each deep feature set for fusion.
The results showed that the system could reduce the computational cost of the final model by nearly 32%. Wang et al. proposed a CAD framework based on two deep learning models: the discrimination-DL and the localization-DL. The discrimination-DL used the feature pyramid network as the backbone to compute a convolutional feature map of the input images, so that COVID-19 chest X-ray images could be recognized automatically. Because the dataset was imbalanced, they derived a multi-focal loss function from the class-imbalance focal loss function for binary classification to realize recognition, and the function was calculated as follows:
$$FL = -\sum_{i=1}^{N} \alpha \left(1 - p_i\right)^{\gamma} \log\left(p_i\right)$$
Here, $i$ is the class of the X-ray, $p_i$ is the prediction probability of class $i$, the weighting factor $\alpha$ is 0.25, the tunable focusing parameter $\gamma$ is 2, and $N$ is the total number of classes. They then combined the multi-focal loss function with the softmax function, which acts as an approximate maximum function. The softmax function is defined as:
$$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$$
where $x$ is a vector, and the individual values $x_i$ are the elements of the input vector and can take any real value. The term in the denominator is the normalization term, which ensures that the output values of the function sum to 1 and thus constitute a valid probability distribution. The recognized COVID-19 chest X-ray images were then fed as input to the localization-DL, which automatically localized the infection to the left lung, the right lung, or both lungs. The DL-based framework achieved high diagnostic accuracy compared with results from radiologists but lacked interpretability. Measured against the radiologists’ discrimination and localization results, the accuracy of COVID-19 discrimination using the discrimination-DL was 98.71%, while the accuracy of localization using the localization-DL was 93.03%. Oliveira et al. combined ResNet50, EfficientNetB7, MobileNetV2, DenseNet121, and MobileNet to build a model. They collected chest X-ray images from COVIDx and then preprocessed these images with augmentation techniques such as rotation, zoom, and vertical and horizontal transformations. The results indicated that the model achieved an accuracy of 92.00% in the multi-classification (COVID-19, pneumonia, and healthy chest X-ray images) and an accuracy of 93.50% when distinguishing between COVID-19 chest X-ray images and non-COVID chest X-ray images. Qjidaa et al. adopted an ensemble classification method for COVID-19 diagnosis. They trained VGG16, VGG19, DenseNet121, MobileNet, Xception, InceptionV3, and InceptionResNetV2, and each network produced a prediction. They then assembled the seven pre-trained networks’ predictions into a prediction vector and used majority voting to reach a final prediction. The results indicated that the ensemble learning technique could improve the classification accuracy, and the proposed classification method achieved the best performance with an accuracy of 99.00% and a precision of 98.60%.
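Majority voting over the predictions of several networks, as used in the ensemble just described, can be sketched in a few lines. This is a generic illustration with three hypothetical models rather than the seven networks of the cited study, and the tie-breaking rule (the first label seen wins) is an assumption of this sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(per_model_predictions):
    """per_model_predictions[m][i] is the label that model m assigns to image i."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]

# Hypothetical predictions of three models for four images.
model_a = ["covid", "healthy", "covid", "pneumonia"]
model_b = ["covid", "healthy", "healthy", "pneumonia"]
model_c = ["healthy", "healthy", "covid", "covid"]

print(ensemble_predict([model_a, model_b, model_c]))
```

With an odd number of voters and two classes a tie cannot occur; with more classes, a tie-breaking rule such as the one assumed here, or averaging the predicted probabilities instead, is needed.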
Gupta et al. proposed an integrated CNN, namely InstaCovNet19, which consisted of MobileNet, Xception, InceptionV3, ResNet101, and NASNet. These pre-trained networks were used as feature extractors. Since each network had its own feature extraction technique, this integrated method helped to improve the classification performance. The results showed that the classification accuracy of this model was 99.08% in the multi-classification task and 99.53% in the binary classification task. Kechagias-Stamatis et al. proposed a new structure-fusion deep learning network, which integrated the layers of GoogleNet and ResNet18. They evaluated the model in two experiments, binary classification and multi-classification. The classification capability of the structure-fusion deep learning network was evaluated on CT and X-ray datasets, with 99.30% and 100.00% classification accuracy, respectively. The results showed that the structure-fusion deep learning network performed better than either single network. The literature on ensemble learning is summarized in Table 7.
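The softmax function and the focal loss used by the discrimination-DL framework discussed in this section can be sketched as follows. The defaults alpha = 0.25 and gamma = 2 are the values quoted in the text. Evaluating the loss only on the probability of the true class is the standard focal-loss convention and is an assumption here, not a detail confirmed by the survey.

```python
import math

def softmax(logits):
    """Map real-valued scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(probs, true_class, alpha=0.25, gamma=2.0):
    """Focal loss for one sample: down-weights examples that are already easy."""
    p = probs[true_class]
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# Hypothetical logits for a three-class problem.
probs = softmax([2.0, 0.5, -1.0])
print(focal_loss(probs, true_class=0), focal_loss(probs, true_class=2))
```

The factor (1 - p)^gamma shrinks the contribution of well-classified samples, so training concentrates on hard and minority-class examples. With gamma = 0 and alpha = 1 the expression reduces to the ordinary cross-entropy.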
3.4 COVID-19 Diagnosis Based on Unsupervised Learning and Semi-Supervised Learning
The COVID-19 outbreak has placed tremendous pressure on radiologists, who must read large numbers of medical images. Medical practitioners are on the front lines of the epidemic, and studies of their mental health have shown that a significant number of them show symptoms of depression, anxiety, and insomnia. Although many models based on supervised learning achieve high classification accuracy, they require a large number of radiologist-labeled images for training. In an outbreak situation, however, clinicians have very limited time for tedious manual labeling, which may lead to a shortage of labeled images. Therefore, unsupervised learning and semi-supervised learning, which do not need a large number of labeled images, are becoming more and more popular.
Li et al. proposed a dual-track ranking method to train a model based on self-supervised learning. They extracted features from negative CT images and COVID-19 CT images and calculated the earth mover's distance between these two kinds of image features to create “difficulty” and “diversity” soft labels. Through the dual-track ranking method, they used only half of the negative samples for training, which reduced the training time and achieved superior classification performance. However, the method is only applicable to binary classification tasks, which is an area that needs to be improved in the future. Roy et al. proposed a novel network, derived from spatial transformer networks, for the analysis of LUS images. They addressed several limitations of previous research. First, they used a spatial transformer network to learn a semi-supervised localization policy and leveraged an ensemble of multiple state-of-the-art convolutional networks for image segmentation. Second, they predicted the presence of COVID-19 artifacts and a score connected to the disease severity via ordinal regression. The results showed that the spatial transformer network could improve localization accuracy, and the segmentation network could segment the LUS images from the background with high accuracy. Wang et al. put forward DeCoVNet, which is a 3D deep convolutional neural network with semi-supervised learning. The 3D deep convolutional neural network architecture is shown in Fig. 6.
The proposed network took a CT volume with its 3D lung mask as input and directly output the probabilities of COVID-positive and COVID-negative. First, the lung masks of the CT volumes were generated by U-Net. In the first stage, rich local visual information was retained. In the second stage, 3D feature maps were generated via two 3D residual blocks. In the third stage, the information in the CT volume was extracted, and the probability was output. The network was a weakly supervised learning network that utilized only a small number of labeled images for training. Hu et al. also proposed a new model with semi-supervised learning. A multi-view U-Net was used for lung segmentation, and the segmentation network could delineate the lung anatomy in COVID-19, CAP, and nosocomial pneumonia cases. According to the infected areas of COVID-19 and CAP cases, they proposed a multi-scale learning scheme to cope with variations in the size and location of the lesions and then applied spatial aggregation with a global max pooling operation to obtain categorical scores. Their experiments utilized representation learning on multiple feature levels and explained which features can be learned at each level. Calderón Ramírez et al. proposed a semi-supervised deep learning model based on MixMatch, in which they used labeled and unlabeled chest X-ray images. The set of labeled images and corresponding labels was $S_l = (X_l, Y_l)$, and the unlabeled set was $X_u$. The data were then regularized using the unlabeled dataset $X_u$, and the regularization function was calculated by the following equation:
$$\mathcal{L}(w) = \mathcal{L}_l(w) + \gamma\, \mathcal{L}_u(w)$$
where $w$ corresponds to the weights of the model to estimate, $\mathcal{L}_l$ and $\mathcal{L}_u$ correspond to the labeled and unlabeled loss terms, and $\gamma$ corresponds to the unsupervised term weight, which controls the influence of the unlabeled data during training. In their experiments, they combined regularization and MixMatch for COVID-19 diagnosis. The results showed that when the proportion of unlabeled data was high, the accuracy of the model improved by about 15%, which suggested that a semi-supervised framework could improve the performance of COVID-19 detection when the quantity of high-quality labeled data was small. King et al. established an unsupervised network based on self-organizing feature maps, which could show the influence weight of the image features in classification. Fang et al. extracted 77 features from the CT lesions. Unsupervised clustering and multiple cross-validations were then adopted to select key features as input to an SVM for classification. However, their experiments used CT images from only one hospital, which reduced generalization. Abdel-Basset et al. proposed an innovative semi-supervised few-shot segmentation method to segment infections efficiently from only a small number of annotated CT scans. They used an encoder-decoder architecture to extract high-level information. The architecture was mainly composed of a feature encoder module, a context enrichment module, and a feature decoder module. They used ResNet34 as the encoder backbone for feature extraction. They then used a smoothed atrous convolution block and a multi-scale pyramid pooling block to form the context enrichment module and proposed an adaptive recombination and recalculation module that allowed intensive knowledge exchange between paths. The literature on unsupervised learning and semi-supervised learning is summarized in Table 8.
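The semi-supervised objective discussed in this section, a labeled loss plus a weighted unlabeled loss, can be sketched as below. Cross-entropy for the labeled term and squared error against guessed soft labels for the unlabeled term follow the general MixMatch recipe. The function names, the toy probabilities, and the symbol `gamma` for the unsupervised weight are illustrative assumptions.

```python
import math

def cross_entropy(probs, label):
    """Labeled term for one image: negative log-probability of the true class."""
    return -math.log(probs[label])

def squared_error(probs, guessed):
    """Unlabeled term for one image: mean squared gap to the guessed soft label."""
    return sum((p - q) ** 2 for p, q in zip(probs, guessed)) / len(probs)

def semi_supervised_loss(labeled, unlabeled, gamma):
    """labeled: (probs, label) pairs; unlabeled: (probs, guessed_probs) pairs."""
    loss_l = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    loss_u = 0.0
    if unlabeled:
        loss_u = sum(squared_error(p, q) for p, q in unlabeled) / len(unlabeled)
    return loss_l + gamma * loss_u

# Hypothetical model outputs for one labeled and one unlabeled image.
labeled = [([0.9, 0.1], 0)]
unlabeled = [([0.6, 0.4], [1.0, 0.0])]
print(semi_supervised_loss(labeled, unlabeled, gamma=2.0))
```

Setting `gamma` to 0 recovers purely supervised training, while larger values let the unlabeled images exert more influence, which matters most when labeled data are scarce.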
3.5 COVID-19 Diagnosis Based on Graph Neural Networks
The data used in traditional machine learning is Euclidean data, whose most significant feature is a regular spatial structure. However, much real-world data does not follow this rule, so traditional deep learning cannot be applied directly to this type of data. In recent years, graph neural networks (GNNs) have been used to solve this problem. Their core idea is that each node is connected with its adjacent nodes, and a change in one node can cause corresponding changes in the adjacent nodes. For example, the risk of infection can be inferred using temporal and spatial information between people. GNNs include several types, such as the graph convolutional network (GCN), graph attention network (GAT), graph autoencoder, graph generative network, and so on.
Wang et al. proposed a framework, namely FGCNet, based on CNN and GCN. The CNN provided the individual image-level representation, and the GCN produced the relation-aware representation. In addition, they used the Grad-CAM method to provide a visual explanation of the medical diagnosis and increase the reliability of the framework. Most of the hyperparameter values in their study were set by trial and error. The stability factor was set to $10^{-5}$. The retention probability was set to 0.5. The pooling size was set to 2. The rank threshold was set to 2. The maximum shift factor was 25, and the mean and variance of noise injection were set to 0 and 0.01, respectively. The numbers of convolutional layers and fully connected layers were set to 7 and 2, respectively. The number of cluster centroids was set to 256. The number of neighbors in KNN was set to 7. The framework was compared with state-of-the-art methods, and the results indicated that it outperformed other methods with high classification accuracy. Sehanobish et al. proposed a baseline architecture that proved to be better than GAT and GCN. The architecture was composed of transformers and GAT layers, and it could produce new edges via unsupervised learning and self-supervised learning. Their experiments indicated that the baseline architecture could provide an explanation of the genes and cells in all phases of the COVID-19 cases. Yu et al. proposed a new model, namely ResGNet-C. In building ResGNet-C, two by-products, named NNet-C and ResNet101-C, were also produced, which showed high classification accuracy in COVID-19 detection. ResNet101-C could extract more representative features for the other two models, NNet-C and ResGNet-C. NNet-C, a one-layer neural network, was a simple classifier that took features extracted by ResNet101-C as input. The ResGNet-C framework is shown in Fig. 7.
Fully connected (FC)-128 was a transitional layer that prevented significant information loss, which could otherwise happen when inputting the features directly into the final FC-2 layer. In the graph construction, they assumed an edge when a node fell within the first k nearest neighbors of another node, based on the Euclidean distance, as shown in Fig. 8. Instead of iteratively updating the nodes in the graph, they predefined the number of nodes based on the batch size N, while updating the edges in the adjacency matrix.
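The k-nearest-neighbor graph construction described above can be sketched as follows: each feature vector in a batch becomes a node, and an edge is added whenever one node is among the k nearest neighbors of another under the Euclidean distance. The toy 2-D vectors and the function name are hypothetical; in the cited work the nodes would be deep features extracted by a CNN.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_adjacency(nodes, k):
    """Build a symmetric 0/1 adjacency matrix from k-nearest-neighbor relations."""
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other node by its distance to node i.
        ranked = sorted((euclidean(nodes[i], nodes[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            # An edge exists if j is among the k nearest neighbors of i.
            adj[i][j] = 1
            adj[j][i] = 1
    return adj

# Hypothetical batch of four 2-D feature vectors forming two tight pairs.
batch = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
for row in knn_adjacency(batch, k=1):
    print(row)
```

Because the number of nodes is fixed by the batch size, only the adjacency matrix has to be recomputed for each batch, which keeps the graph step inexpensive.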
The model was evaluated on 296 CT scans, and the results showed that ResGNet-C achieved a high classification accuracy of 96.62%. They also proposed a network called CGNet. The network consisted of three parts: feature extraction, graph-based feature reconstruction, and classification. They used pre-trained CNNs for feature extraction and then used the extracted features for feature reconstruction. Finally, they used a single-layer graph neural network, GNet, as the classifier, whose inputs were the nodes in the graph and the graph representation. The proposed network was tested on a public CT dataset, and the results showed that CGNet had good classification performance and achieved a classification accuracy of 99.00%. Shaban et al. proposed a hybrid diagnosis strategy that constructs a connected graph of features to represent the weight of each feature and the degree to which it combines with other features. The approach could sort the selected features by projecting them onto the proposed patient space and infer predictions from the ranking of the features and the degree to which each feature combines with its neighbors. The hybrid diagnosis strategy consisted of five steps: fuzzification, normalization, fuzzy rule induction, defuzzification, and inference decision making. Normalization was the multiplication of the output of the fuzzification step by the rank of the relevant features calculated in the feature ranking stage as:
$$RM_x = \mu_x \times R_x$$
where $\mu_x$ is the degree of membership corresponding to fuzzy set $x$, $R_x$ is the rank of the feature corresponding to fuzzy set $x$, and $RM_x$ is the ranked membership degree of fuzzy set $x$. Through the normalization step, the ranked membership value of each fuzzy set is normalized to obtain a value between 0.0 and 1.0. The normalized membership value for a fuzzy set $x$, denoted as $NM_x$, can be calculated by:
$$NM_x = \frac{RM_x}{\sum_{i} RM_i}$$
where $RM_x$ is the ranked membership value for the fuzzy set $x$, and $RM_i$ is the ranked membership value for the fuzzy set $i$. The center of gravity method was then used for defuzzification. They adopted numerical laboratory tests as the dataset and obtained an accurate diagnosis result in a shorter waiting time. Their experiments showed that the minimum error value of the method was 2.342%. The literature on GNNs is summarized in Table 9.
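The ranking and normalization steps of the fuzzy inference described above can be sketched as follows. Multiplying each membership degree by the feature rank follows the text. Dividing by the sum of the ranked values to obtain numbers between 0.0 and 1.0 is an assumption of this sketch, since the original normalization formula is not reproduced here; the fuzzy-set names and values are hypothetical.

```python
def ranked_membership(memberships, rank):
    """Multiply each fuzzy set's membership degree by the rank of the feature."""
    return {name: mu * rank for name, mu in memberships.items()}

def normalize(ranked):
    """Scale ranked memberships into [0, 1] (assumed: divide by their sum)."""
    total = sum(ranked.values())
    if total == 0:
        return {name: 0.0 for name in ranked}
    return {name: value / total for name, value in ranked.items()}

# Hypothetical membership degrees of one laboratory feature in three fuzzy sets.
memberships = {"low": 0.2, "medium": 0.6, "high": 0.2}
ranked = ranked_membership(memberships, rank=0.5)
normalized = normalize(ranked)
print(normalized)
```

Under this assumed normalization, the rank of a single feature cancels out within that feature, so its effect only appears when ranked values from several features are combined by the fuzzy rules.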
3.6 COVID-19 Diagnosis Based on Explainable Deep Neural Networks
Although deep learning networks have made great contributions to medical diagnosis, many networks are black boxes that do not provide credible information about how they reach their results. Therefore, explainable deep neural networks (xDNNs) are increasingly used by researchers. Explainable deep neural networks can provide explanations for the obtained results, which gives them more credibility in clinical application.
Angelov et al. proposed an explainable deep neural network for COVID-19 diagnosis. The proposed xDNN offered a new deep learning architecture that combined reasoning and learning in synergy. It was non-iterative and non-parametric, which explained its efficiency in terms of time and computational resources. The network was composed of a features layer, a density layer, a typicality layer, a prototypes layer, and a mega clouds layer. In the first layer, they adopted the pre-trained VGG-VD-16 DCNN as the backbone for feature extraction. The core of the xDNN was the prototypes layer, because it could provide explainable information. The xDNN performed well in classification and achieved a high classification accuracy. Karim et al. proposed a novel model, DeepCOVIDExplainer. First, the Perona-Malik filter, histogram equalization, and unsharp masking were adopted to preprocess chest X-ray images. Then Grad-CAM and layer-wise relevance propagation were utilized to highlight the regions relevant to the classification. In their experiments, the framework could recognize the COVID-19 cases with a positive predictive value of 89.61%. Brunese et al. utilized the pre-trained VGG16 to build an explainable model. They supported the reliability of the model predictions via the Grad-CAM algorithm. They collected 6523 chest X-ray images for training and testing, and their experiments showed that the model performed well in COVID-19 classification, with an average accuracy of 97.00%. Alshazly et al. proposed a new model, CheXNet, which consisted of 121 layers. The dataset was composed of SARS-CoV-2 CT scans and COVID-19 CT scans. The model employed seven pre-trained architectures based on transfer learning, which were SqueezeNet, Inception, ResNet, ResNeXt, Xception, ShuffleNet, and DenseNet.
The explainability of CheXNet came from t-distributed stochastic neighbor embedding (t-SNE) and Grad-CAM: they utilized t-SNE to project the learned features into two dimensions and then used Grad-CAM to present the infected areas of COVID-19 cases. CheXNet displayed the localization of the infection and provided a better explanation. Ahsan et al. proposed a framework that could accurately locate the infected area and provided an interpretable connection between inputs and prediction results by means of local interpretable model-agnostic explanations. The results showed that the framework achieved an accuracy of 82.94% on CT scans and 93.94% on X-ray images. Panwar et al. proposed an xDNN based on VGG19. The use of Grad-CAM increased the visualization and explainability of the xDNN. In addition, two experiments were conducted: pneumonia vs. COVID-19, and COVID-19 vs. healthy cases. In all experiments, they used a color visualization method based on grayscale images to clearly interpret the detected radiological images. The proposed xDNN could detect COVID-19 positive cases within 2 s. 5G networks are on the rise, and 5G is well known for its high speed. Hossain et al. proposed a model based on 5G network features, namely beyond 5G (B5G). The model could diagnose COVID-19 and predict the likelihood of infection within patients' social groups. The B5G architecture was divided into three layers: a stakeholder layer, an edge layer, and a cloud layer. B5G achieved a faster diagnosis speed than the 4G network and made a contribution to medical diagnosis.
Since CT scan images are time-consuming to analyze, Wu et al. proposed a Joint Classification and Segmentation system for real-time and interpretable diagnosis. They increased the interpretability of the system with an activation mapping method and used image blending techniques to help the classifier focus on the lesion area in COVID-19 cases, which reduced the probability of overfitting. Their experimental results showed that the sensitivity of the Joint Classification and Segmentation system on the test set was 95.00%. Khincha et al. proposed visual interpretation and textual interpretation. They trained and tested on the COVIDx dataset. The learning rate was $10^{-4}$ and the model was trained for 500 epochs. Their experimental results showed that the textual representation was more relevant to diagnosis. An explainable risk prediction system based on Additive Trees was built by Casiraghi et al. The system could support physicians in early COVID-19 risk assessment through a set of simple and human-interpretable decision rules. Multiple imputation techniques, random forest-based techniques, and maximum likelihood estimation methods were applied to handle the missing data in their experiments. The comparison showed that maximum likelihood estimation produced some noisy estimates during processing, and the random forest-based technique yielded better performance. In addition, the prediction performance of the explainable risk prediction system was compared with that of the generalized linear model, and the results showed that the proposed system was effective and robust. Jin et al. established an interpretable diagnosis system, which consisted of five parts: a segmentation network, a slice diagnosis network, a slice location network, a visualization network, and an image phenotypic analysis network. The last two networks were mainly utilized to provide an explainable region through Grad-CAM.
The interpretable system provided effective help for doctors' work, and it was tested on two public datasets with AUCs of 92.99% and 93.25%, respectively. Born et al. used lung ultrasound (LUS) to diagnose COVID-19 and collected 202 videos from different hospitals and medical facilities. They presented a VGG16-based model and used VGG-CAM for the spatiotemporal localization of lung biomarkers. The model was tested on an independent dataset and achieved a sensitivity of 80.60% and a specificity of 96.20%. The literature on xDNNs is summarized in Table 10.
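Several of the explainable models above rely on Grad-CAM to highlight infected regions. The following is a minimal NumPy sketch of the Grad-CAM computation itself, not the implementation of any surveyed paper. It assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from a network; the function name `grad_cam` and the toy inputs are hypothetical.

```python
import numpy as np


def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Return a Grad-CAM heatmap of shape (H, W).

    feature_maps: activations of a convolutional layer, shape (K, H, W).
    gradients:    gradients of the class score w.r.t. those activations,
                  same shape (assumed to be computed elsewhere).
    """
    # Channel importance: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, feature_maps, axes=1)      # shape (H, W)
    # ReLU keeps only regions that increase the class score.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] so the map can be overlaid on the image.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example with two 2x2 channels (hypothetical values).
maps = np.zeros((2, 2, 2))
maps[0, 0, 0] = 1.0   # channel 0 fires at the top-left pixel
maps[1, 1, 1] = 1.0   # channel 1 fires at the bottom-right pixel
grads = np.stack([np.full((2, 2), 2.0), np.full((2, 2), -1.0)])
heatmap = grad_cam(maps, grads)
```

In this toy case the first channel has a positive weight and the second a negative one, so only the top-left pixel survives the ReLU. In practice the heatmap is upsampled to the input resolution and blended with the X-ray or CT image.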
In addition to the above methods, some models are based on capsule networks or recurrent neural networks (RNNs), and new algorithms have been applied to optimize existing networks. RNNs are usually utilized to process time-series data and to predict what will happen in the future. A key structural property of RNNs is memory, which links contextual relationships across time steps. Thus, by tracking the patient's progress, RNNs can estimate the severity of the patient's condition and predict the future course of the disease, which helps doctors make better diagnoses.
The capsule neural network is a kind of neural network that constructs and abstracts sub-networks from the neural network. Each capsule focuses on several independent tasks while maintaining the spatial characteristics of the image. In addition, capsule neural networks can be trained with very little training data, which is an advantage over traditional neural networks. Toraman et al. proposed a new model based on a capsule network. CNNs use scalar activation functions, while capsule networks use vector activation functions. CNNs cannot capture the spatial relationship between image instances, whereas capsule networks can, and capsule networks require smaller datasets as well as fewer parameters. The accuracy of the proposed model for binary and multi-class classification was 97.24% and 84.22%, respectively. Heidarian et al. proposed a novel framework that consisted of a 2D capsule network. The framework began with a stack of four convolutional layers along with a batch-normalization layer and one max pooling layer. The output of the last convolutional layer was fed to the subsequent capsule layers to extract deeper and smaller feature maps. Two more capsule layers were subsequently added to the framework, where the amplitude of the last one represents the probability of the input image belonging to each target class. In the next step, they aggregated the slice-level features extracted by intermediate layers of the described network to move to the patient-level domain. In this regard, the capsule layer before the last one was used as the representative feature map of the slices. Their framework could be applied for feature extraction from CT images, and it did not need complex image labeling processes. The proposed framework achieved a classification accuracy of 90.80%.
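The vector activation and the use of a capsule's amplitude as a class probability can be illustrated with a short sketch. It assumes the squashing non-linearity of the original capsule network formulation, which the surveyed models may or may not use unchanged; the function names and the example vector are hypothetical.

```python
import numpy as np


def squash(s: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Vector activation: keep the direction of s, map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)          # close to 1 for long vectors
    return scale * s / np.sqrt(sq_norm + eps)  # eps avoids division by zero


def class_scores(capsules: np.ndarray) -> np.ndarray:
    """Length of each output capsule, read as the presence score of a class.

    capsules: shape (num_classes, capsule_dim), already squashed.
    """
    return np.linalg.norm(capsules, axis=-1)


# Hypothetical raw capsule outputs for two classes.
raw = np.array([[3.0, 4.0],    # long vector: strong evidence
                [0.1, 0.0]])   # short vector: weak evidence
scores = class_scores(squash(raw))
```

Unlike a scalar activation, the squashed vector keeps its orientation, which is how a capsule can encode pose information, while its length alone is used for classification.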
CNNs typically perform better with larger datasets than with smaller ones; however, most datasets contain only a few COVID-19 samples. To address this dataset imbalance problem, Sakib et al. proposed an adaptive data augmentation algorithm, referred to as data augmentation of radiograph images. The algorithm synthesized many chest X-ray images with a generative adversarial network (GAN), which uses two competing neural networks to create new virtual data instances that can pass as real data. The data augmentation method improved the classification accuracy of COVID-19 from 54.55% to 93.94%. Oh et al. built a patch-based network with a relatively small number of trainable parameters for COVID-19 diagnosis and obtained the final classification result by voting over the inference results at multiple patch positions. They designed a segmentation network based on FC-DenseNet103 to extract lung and heart contours from chest radiography images. The cross-entropy loss was used as the objective function for semantic segmentation, and it is calculated as follows:
$$L = -\sum_{s}\sum_{j} \lambda_s \, \mathbf{1}(y_j = s) \, \log p_\theta(x_j)$$
where $\mathbf{1}(\cdot)$ represents the indicator function, $p_\theta(x_j)$ is the softmax probability of the $j$th pixel in the X-ray image $x$, $s$ represents the category, $\lambda_s$ represents the weight of the category, and $y_j$ denotes the corresponding ground-truth label. The model obtained good classification results by analyzing the potential imaging biomarkers in X-rays. On the same dataset, the proposed method showed an overall accuracy of 91.90%, which was comparable to the 92.40% of COVID-Net. Furthermore, the proposed method provided significantly improved sensitivity to COVID-19 cases compared with COVID-Net. It was also notable that the proposed method used only about 10% of the parameters (11.6 M) of COVID-Net (116.6 M). Bridge et al. proposed an activation function for highly imbalanced datasets. The function was named the generalized extreme value activation function and was based on the generalized extreme value distribution in extreme value theory. The proposed activation function could be added to any CNN model, and their research chose InceptionV3 as the training model because of its high generalizability. The cumulative distribution function of the generalized extreme value distribution is given by
$$G(x) = \exp\left(-\left[1 + \xi \, \frac{x - \mu}{\sigma}\right]^{-1/\xi}\right), \quad 1 + \xi \, \frac{x - \mu}{\sigma} > 0,$$
where $\mu$ is the location parameter, $\sigma > 0$ is the scale parameter, and $\xi$ is the shape parameter; for $\xi = 0$ the expression is understood as its limit, $G(x) = \exp\left(-\exp\left(-\frac{x - \mu}{\sigma}\right)\right)$.
The generalized extreme value function unifies three extreme value distributions: when the shape parameter $\xi = 0$, it becomes the Gumbel distribution; when $\xi > 0$, it becomes the Fréchet distribution; and when $\xi < 0$, it becomes the Weibull distribution. Moreover, the research compared the generalized extreme value function with the traditional sigmoid function. The sigmoid function is given by:
$$f(x) = \frac{1}{1 + e^{-x}}$$
where $x$ is the input data and $f(x)$ represents the classification result. The function is well suited to binary classification. They conducted three experiments: COVID-19 vs. healthy cases, COVID-19 vs. pneumonia vs. healthy cases, and COVID-19 vs. non-COVID-19. The results showed no difference in performance between the sigmoid function and the proposed generalized extreme value activation function when the data was balanced or when the imbalance ratio was 1:10 or 1:25. At a ratio of 1:50, however, the proposed generalized extreme value function performed better than the sigmoid function in terms of AUC and sensitivity. Rasheed et al. used two classifiers for COVID-19 diagnosis: logistic regression and a CNN. To reduce overfitting, they used a GAN to expand the dataset. In addition, a dimensionality reduction step based on principal component analysis was applied to improve the classification accuracy. Both the CNN and logistic regression showed encouraging results for COVID-19 patient identification. The logistic regression and CNN models showed 95.20%–97.60% overall accuracy without principal component analysis and 97.60%–100.00% with principal component analysis for positive case identification, respectively.
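To make the contrast between the two activation functions concrete, the sketch below evaluates the sigmoid and the cumulative distribution function of the generalized extreme value (GEV) distribution side by side. It assumes that the GEV activation is the GEV cumulative distribution function with fixed location `mu`, scale `sigma`, and shape `xi`; how the surveyed work parameterizes or learns these values is not specified here, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Symmetric logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def gev_cdf(x: float, mu: float = 0.0, sigma: float = 1.0, xi: float = 0.0) -> float:
    """CDF of the generalized extreme value distribution.

    xi = 0 gives the Gumbel case, xi > 0 the Frechet case,
    and xi < 0 the (reversed) Weibull case.
    """
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit of the general expression.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


# The sigmoid is symmetric around 0; the GEV CDF is not.
for value in (-2.0, 0.0, 2.0):
    print(value, round(sigmoid(value), 4), round(gev_cdf(value), 4))
```

At an input of 0 the sigmoid returns 0.5, whereas the Gumbel case of the GEV function returns $e^{-1} \approx 0.368$. This asymmetry is the property that an activation designed for imbalanced data can exploit.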
Researchers have proposed many hyperparameter optimization algorithms because the selection of hyperparameters plays a crucial role in classification. For instance, Ezzat et al. proposed an optimization approach based on the gravitational search algorithm and built a new network based on DenseNet121. The gravitational search algorithm determined the optimal hyperparameters for training the network architecture. The results showed that their network achieved a classification accuracy of 98.38%. Ucar et al. proposed COVIDiagnosis-Net, which was built on SqueezeNet, used fewer parameters, consumed less training time, and obtained its optimal hyperparameters by Bayesian optimization. The proposed network achieved a multi-class classification accuracy of 98.30% and a binary classification accuracy of 100.00%.
Zhang et al. replaced average and maximum pooling with random pooling. They combined the convolutional layer with the batch normalization layer, and combined the leaky layer with the fully connected layer to obtain the fully connected block. Their experimental results showed that random pooling performed better than average and maximum pooling, and their network achieved a classification accuracy of 93.64% in classifying COVID-19 and healthy chest X-ray images. A hybrid 2D and 3D network, named Dual spatial and channel Attention Bidirectional ConvLSTM Net, was proposed by Zhang et al. In this network, U-Net was applied to process the in-plane context and LSTM was leveraged to integrate the cross-plane context. Their network showed excellent performance in image segmentation, and its AUC for disease progression prediction reached 93.00%. Javor et al. developed a deep learning-derived machine learning model with the same architecture and hyperparameters as the original model to identify the image sources, which allowed them to evaluate a possible bias arising from recognition of the various image sources. The original model was ResNet50. The proposed model was tested on an independent dataset with an accuracy of 95.60%. Victor et al. compared the performance of a CNN and ResNet. The results showed that good learning performance could be obtained by training ResNet from scratch without the aid of transfer learning. Amyar et al. proposed an automatic COVID-19 classification and segmentation tool based on multi-task learning to segment COVID-19 lesions in chest CT images. The proposed tool was evaluated using a dataset of 1369 patients. The results showed that the area under the receiver operating characteristic curve for classification was higher than 97.00%. Goel et al. proposed a new architecture that was composed of optimized feature extraction and classification components.
The Grey Wolf Optimizer algorithm was used to optimize the hyperparameters for training the CNN layers. The proposed model was tested and compared with different classification strategies on an openly accessible dataset of COVID-19, pneumonia, and healthy chest images. The optimized CNN model achieved accuracy, sensitivity, specificity, precision, and F1 score values of 97.78%, 97.75%, 96.25%, 92.88%, and 95.25%, respectively, which were better than those of state-of-the-art models. The literature on other optimizations is summarized in Table 11.
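The evaluation measures quoted throughout this section (accuracy, sensitivity, specificity, precision, and F1 score) all follow from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from any surveyed study.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics with COVID-19 taken as the positive class.

    tp/fn: COVID-19 cases predicted positive/negative.
    tn/fp: non-COVID-19 cases predicted negative/positive.
    """
    sensitivity = safe_div(tp, tp + fn)   # recall on positive cases
    specificity = safe_div(tn, tn + fp)   # recall on negative cases
    precision = safe_div(tp, tp + fp)     # positive predictive value
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

With these counts the accuracy is 0.85 and the precision is 0.90, while the sensitivity is only about 0.82. On imbalanced COVID-19 datasets, accuracy alone can therefore hide a substantial number of missed positive cases.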
4 Limitations and Conclusion
In this paper, the diagnostic methods for COVID-19 are reviewed and summarized. Through the comparison of diagnostic methods, the following conclusions can be drawn. (i) Transfer learning and ensemble learning play an important role in medical diagnosis: they can achieve high classification accuracy on small datasets and save training time. Unsupervised learning is rarely used; although it saves the time needed to label the dataset, its classification performance for COVID-19 diagnosis is not very good. (ii) The reviewed studies make clear that CT scans and X-ray images are the main data sources. Many experiments face the problem that datasets are difficult to collect. This paper summarizes several open datasets, such as those hosted on GitHub, the CheXpert dataset, and the COVIDx dataset, which are shared under privacy-protection constraints. For unbalanced datasets, traditional data augmentation methods, such as rotation and translation, can be used to increase the amount of data, and new algorithms can be proposed to augment the dataset. (iii) In image preprocessing, the data may come from different medical institutions or machines that use different protocols, so the imaging methods also differ. Image preprocessing therefore benefits the subsequent feature extraction step and can improve classification performance. Feature extraction is mainly focused on texture analysis and intensity features, and the most widely used traditional classification method is SVM. (iv) GNNs and xDNNs are becoming more and more popular. GNNs can analyze infection in a population according to the relationships between nodes. With a GNN, the transmission group of the epidemic can be found effectively and in time, so that quarantine and other measures can be carried out. xDNNs can provide visual explanatory information for the classification results, whereas other networks behave like a black box.
Using xDNNs can improve the interpretability of the network. The advantages and disadvantages of the seven deep learning methods are compared in Table 12.
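The traditional augmentation operations mentioned in point (ii), rotation and translation, can be sketched in a few lines of NumPy. This is a minimal illustration restricted to 90-degree rotations and integer-pixel shifts with zero filling; practical pipelines usually apply small arbitrary angles and sub-pixel shifts through an image library. The function names and the default shifts are hypothetical.

```python
import numpy as np


def translate(image: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the image is shifted completely out of the frame
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def augment(image: np.ndarray, shifts=((0, 2), (2, 0))) -> list:
    """Return rotated and translated copies of one grayscale image."""
    variants = [np.rot90(image, k) for k in (1, 2, 3)]           # 90, 180, 270 degrees
    variants += [translate(image, dy, dx) for dy, dx in shifts]  # shifted copies
    return variants


# A hypothetical 4x4 "image"; each call yields five additional samples.
sample = np.arange(16, dtype=float).reshape(4, 4)
augmented = augment(sample)
```

Applying such transformations only to the minority COVID-19 class is one simple way to reduce the class imbalance discussed above, although the transformed images remain strongly correlated with the originals.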
However, labeled COVID-19 images are difficult to obtain, which results in unbalanced datasets that reduce the generalization ability of the models. Thus, future research on COVID-19 diagnosis can focus on the following aspects. (i) Establishing suitable benchmark databases of COVID-19 images, which can be consolidated and improved and which allow various techniques to be compared. (ii) Developing new techniques based on deep neural networks such as CNNs, RNNs, GNNs, and xDNNs, which have the potential to improve classification accuracy and to drive better diagnosis.
In general, this paper summarizes COVID-19 diagnostic methods, including traditional methods and seven deep learning methods, and proposes solutions to some problems encountered in the reviewed experiments. In addition, the seven deep learning methods are presented in tables, which show the classification results achieved in each experiment and the advantages and disadvantages of each method. We hope that this paper contributes to the diagnosis of COVID-19.
Funding Statement: The paper is supported by Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|