VGG-CovidNet: Bi-Branched Dilated Convolutional Neural Network for Chest X-Ray-Based COVID-19 Predictions

The coronavirus disease 2019 (COVID-19) pandemic has had a devastating impact on the health and welfare of the global population. A key measure to combat COVID-19 has been the effective screening of infected patients, and chest radiography is a vital screening tool. Initial studies have shown abnormalities in the chest radiographs of COVID-19 patients. Because of its historical effectiveness in providing clinical insights into lung diseases, chest X-ray (CXR) imaging, a leading diagnostic technique, has been promoted by several ongoing projects to combat this disease. This study introduces VGG-CovidNet, a bi-branched dilated convolutional neural network (CNN) architecture for detecting COVID-19 cases from CXR images. The front end of VGG-CovidNet consists of the first 10 layers of VGG-16, in which the number of convolutional layers per block is reduced to two to minimize latency during training. The two back-end branches consist of dilated convolutional layers that reduce the model's computational complexity while retaining the spatial information of the feature maps. Experimental results on the COVIDx dataset show that the proposed architecture outperforms the compared state-of-the-art architectures in accuracy and sensitivity, achieving 96.5% accuracy, 96.0% sensitivity, and 98% positive predictive value (PPV).


Introduction
Coronavirus disease 2019 (COVID-19) has spread exponentially and has become a pandemic. Its transmission process is not yet fully understood. The virus often causes few or no symptoms; however, in 2%-8% of infected individuals, rapidly progressing and frequently fatal pneumonia can occur [1][2][3]. The precise prevalence, mortality, and transmission routes are still not completely understood, partly because of the distinct issues posed by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. These issues include peak infectiousness at or immediately before symptom onset and a poorly understood multi-organ pathophysiology.
Rapid triaging: Patients with suspected COVID-19 infection can be rapidly triaged through CXR images. This can be performed alongside viral testing (whose results take time) to relieve the burden of large numbers of patients, particularly in highly affected regions (e.g., Spain and Italy) where hospitals have exceeded capacity. CXRs can also be used alone when viral testing is not possible because of low supplies. In addition, CXR imaging can be highly effective for triaging in regions (e.g., New York City) where patients are instructed to stay at home until they experience more advanced symptoms, because abnormalities are frequently observed in patients who present at clinics with suspected COVID-19 [13].
Availability and accessibility: CXR imaging is readily available and accessible in many clinics and imaging centers because most healthcare systems consider it standard equipment. CT imaging is less available than CXR imaging, particularly in developing countries, because of its high acquisition and maintenance costs.
Portability: Imaging can be performed in isolation rooms because of the availability of portable CXR systems. This considerably decreases the risk of COVID-19 transmission in rooms with fixed imaging systems or during patient transport to these rooms [13].
Therefore, radiographic assessments can be performed rapidly, and chest radiographic systems, including portable equipment, are widely available in modern healthcare systems. Hence, computer-assisted diagnostic systems that improve the accuracy of radiologists' interpretations of CXR images when determining the COVID-19 status of cases are highly desirable. Several such models have been proposed and implemented, but most have limitations, such as high computational overhead, small training or testing sets, the need for a separate training session, and a large number of parameters to train. The development of accurate and efficient diagnostic methods is therefore important.
Consequently, the present study proposes VGG-CovidNet, a bi-branched dilated convolutional neural network (CNN) for COVID-19 detection from CXR images.
The motivation is the growing need for solutions to combat the pandemic. A review of the literature indicates that COVIDx is the most extensive open-access benchmark dataset in terms of the number of publicly available COVID-19 positive cases. VGG-CovidNet leverages the transfer-learning capability of the first 10 layers of VGG-16 and uses dilated convolution to capture the spatial information of the feature maps while reducing model complexity, and its predictive effectiveness was investigated on this dataset.
With the rapid increase in the number of suspected cases, artificial intelligence methods may play a role in identifying and characterizing COVID-19 through imaging, and CXR offers a clear and rapid view for this process. Deep learning on large multinational CXR datasets could provide automated, reproducible biomarkers to classify and quantify COVID-19. This study aimed to develop and evaluate a CNN-based model that classifies CXR images into three classes, with a focus on the high-accuracy detection of normal, COVID-19, and non-COVID-19 cases. These categories were chosen so that clinicians can prioritize patients for COVID-19 testing and determine appropriate care according to the cause of infection, because COVID-19 and non-COVID-19 infections are treated differently. The main contributions of this work are as follows:
1. A model based on dilated convolution and a branching strategy is proposed. The dilated convolution captures spatial information from the feature maps without pooling layers, which reduce spatial resolution, and without the deconvolutional layers needed to recover that resolution, which increase complexity.
2. Given the nature of the problem, the first 10 layers of the VGG-16 model [14] are used as the front end of the architecture because of the model's flexibility and transfer-learning capability. These layers extract high-level features and pass their outputs to the two branches, which further extract and incorporate low-level features during training.
This paper is organized as follows. Section 2 discusses methods for detecting COVID-19 and other diseases. Section 3 describes the dataset used for infection detection. Section 4 discusses the proposed architecture and the benefits of dilated convolution. Section 5 assesses the efficiency of the architecture using evaluation metrics and compares it with state-of-the-art methods. Finally, Section 6 draws conclusions and discusses future directions.

Related Work
Studies have examined the potential of CXR and CT scans in the detection of COVID-19 [15][16][17]. Most have sought to develop automatic COVID-19 classification systems, typically based on CNNs [18]. Xu et al. [19] first used a pre-trained three-dimensional CNN to extract potentially infected regions from CT scans. These samples were then fed into another CNN and classified with 86.7% accuracy into three categories: non-infection, influenza viral pneumonia, and COVID-19. Wang et al. [20] first extracted candidate regions using a threshold-based strategy. Next, for every case, two or three randomly selected regions were used to create a dataset, which was used to fine-tune a pre-trained CNN. Finally, features were extracted from the CNN and fed into a group of classifiers to predict COVID-19 with 88% accuracy. The authors in [21] used CT scans to identify positive COVID-19 cases by feeding each slice separately into the model and combining all the outputs with a max-pooling function, achieving 90% sensitivity. In [22], a CNN model was first pre-trained on the ImageNet dataset [23] and then fine-tuned on a CXR image dataset to classify samples as normal, bacterial, COVID-19 viral infection, or non-COVID-19 viral infection with 83.5% accuracy. Sethy et al. [24] conducted similar research in which they trained different CNN models on CXR images and then used an SVM classifier to detect positive COVID-19 cases with 95.38% accuracy. In [25], capsule networks (COVID-CAPS) were proposed. This model, which includes capsule and convolutional layers, attained an accuracy of 95.7%, a sensitivity of 90%, a specificity of 95.8%, and an area under the curve (AUC) of 0.97.
CXRs have been widely used to identify COVID-19 positive cases [22,24,26,27]. Chen et al. [28] found bilateral pneumonia in the CXRs of a sample of 99 patients. Interestingly, Yoon et al. [26] noted the link between CXRs and CT images in a study of nine COVID-19 positive cases. Other studies have focused on tailored deep-learning (DL) models, such as ResNet50 [24] and COVID-Net [22]. ResNet50 was tested on 25 positive COVID-19 cases, and COVID-Net on just 31 positive cases. Zhang et al. [27] employed a classic deep-learning model on 100 COVID-19 samples to detect positive cases, achieving a high accuracy of 90%. To detect COVID-19 positive cases from CXRs, Ozturk et al. [29] developed a deep neural network (DNN)-based method that achieved 98.08% accuracy on 125 positive cases; the model also achieved 87.02% accuracy in the multi-class scenario. Narin et al. [16] used three DNN architectures (Inception-ResNetV2, ResNet50, and InceptionV3) to identify COVID-19 positive cases from a total of 50 samples, achieving accuracies of 97%, 98%, and 87%, respectively. Using a sample of 155 COVID-19 positive cases, Mangal et al. [30] proposed a DL-based system, CovidAID, to identify positive cases from CXRs, achieving 90.5% accuracy and 100% sensitivity.
To detect COVID-19 cases from CT images with weak labels, Wang et al. [31] proposed a DL-based method that was trained on 499 volumes and tested on an additional 131 volumes, achieving specificity and sensitivity values of 0.911 and 0.907, respectively. Farooq et al. [17] proposed a DL-based approach to distinguish COVID-19 cases on the basis of CXRs; using eight COVID-19 positive cases, the study achieved 96.23% accuracy. Hall et al. [32] used a DL-based method to identify COVID-19 cases from CXRs. On 135 COVID-19 positive cases, the authors achieved 89.2% accuracy, with a true positive rate of 0.8039 and an AUC of 0.95. An ensemble-based method tested on a group of 33 CXRs achieved 91.24% accuracy and an AUC of 0.7879. The authors in [33] used CXRs and CT scans to test a proposed CNN-tailored DNN, which achieved 96.28% accuracy (false-negative rate = 0.0208, AUC = 0.9808). The authors in [34] combined traditional machine learning and DL to facilitate COVID-19 detection. They claimed that this was the first approach to combine both types of learning to detect COVID-19 infection by learning highly discriminative feature representations of CXR images, and they reported that, among the DL models, ResNet50 achieved the highest accuracy of 98.8%. In [35], the authors proposed a parallel architecture, namely COVID-CheXNet, to discriminate between healthy and COVID-19 infected individuals with a high degree of confidence. The study combined two pre-trained DL models, a ResNet34 and a high-resolution network model. Reporting an accuracy of 99.99%, the authors claimed that the proposed system correctly diagnosed the COVID-19 patients.
Some images from the COVIDx dataset are shown in Fig. 1. All the combined images from the source archives were divided into three categories: normal, non-COVID-19, and COVID-19. The data distribution by infection type is presented in Fig. 2, which illustrates the paucity of COVID-19 patient data in publicly available repositories. Fig. 2 also indicates the need for additional COVID-19 data to improve the results and to reduce the false-negative and false-positive rates in the testing phase. Tab. 1 presents the distribution of the images in the training and testing sets.

Methods
This study presents a hybrid bi-branched CNN model based on dilated convolution to classify CXR images into three classes: COVID-19, non-COVID-19, and normal. The first 10 layers of the VGG-16 architecture were used as the front end of the proposed model, VGG-CovidNet. The back end is based on dilated kernels, which are explained in the following sections. The main idea of the proposed solution is to deploy a deeper CNN that produces high-level features without losing detailed information in the images. In addition, the network has large receptive fields and classifies the CXRs into their respective categories without a substantial increase in network complexity. This section describes the proposed architecture and presents the training method.

Pre-Processing
Initially, the input images were converted to JPEG format and resized to 480 × 480 × 3. The images were then normalized to the range 0 to 1 using Eq. (1). Next, a power-law transformation was applied in accordance with Eq. (2) with γ = 0.6 to make the dark areas of the images more prominent and reveal additional information about the corresponding disease:

$$I_{out} = c \, I_{in}^{\gamma} \qquad (2)$$

Here, $I_{out}$ and $I_{in}$ are the output and input images, respectively, with c = 1 and γ = 0.6.
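The preprocessing steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: Eq. (1) is not reproduced in this text, so min-max scaling to [0, 1] is assumed for it; the nearest-neighbour resize and the replication of grayscale CXRs to three channels are also assumptions, the JPEG conversion is omitted, and all function names are hypothetical. Only the power-law step follows the standard form I_out = c · I_in^γ with c = 1 and γ = 0.6.

```python
import numpy as np


def min_max_normalize(img: np.ndarray) -> np.ndarray:
    """Scale pixel values to [0, 1] (assumed form of Eq. (1))."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def power_law(img: np.ndarray, c: float = 1.0, gamma: float = 0.6) -> np.ndarray:
    """Power-law transform I_out = c * I_in ** gamma on a [0, 1] image.

    With gamma < 1, dark values are raised more than bright ones
    (e.g. 0.1 ** 0.6 is about 0.25), which brightens dark regions.
    """
    return c * np.power(img, gamma)


def resize_nearest(img: np.ndarray, size: int = 480) -> np.ndarray:
    """Nearest-neighbour resize to size x size (interpolation method is an assumption)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess(img: np.ndarray) -> np.ndarray:
    """Resize to 480 x 480 x 3, normalize to [0, 1], then apply the power law."""
    if img.ndim == 2:  # grayscale CXR: replicate into three channels
        img = np.stack([img] * 3, axis=-1)
    img = resize_nearest(img, 480)
    return power_law(min_max_normalize(img), c=1.0, gamma=0.6)
```

Because normalization happens before the power law, the transform maps [0, 1] onto [0, 1] and only redistributes intensities toward the bright end.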

Architecture of the Proposed VGG-CovidNet Model
The proposed model combines VGG-16 [14] with additional CNN layers, including dilated convolutional layers, to classify CXR images into their respective classes. VGG-16 was chosen as the front-end architecture, following [41][42][43], because of its strong transfer-learning ability and the flexibility to easily attach a back end for the desired task. Used without a back end, VGG-16 only performs supporting tasks without boosting performance, which results in very low accuracy. In the present study, the fully connected layers of VGG-16 (the classification part) are removed, and VGG-CovidNet is built by appending new convolutional layers to the remaining VGG-16 convolutional layers. The output of the front-end network is one-eighth of the original input size, and stacking more convolutional and pooling layers would shrink the output further and lose more image information, which in turn degrades classification performance. Inspired by [41][42][43], the study therefore uses dilated convolutional layers as the back end to extract deeper saliency information, which is crucial for classification tasks (Fig. 3).

Dilated Convolution
In the proposed model, one of the crucial components is the dilated convolution inspired by [44]. The dilated convolution can be formulated as in Eq. (3):

$$O(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} I(m + s \cdot i,\; n + s \cdot j)\, K(i, j) \qquad (3)$$

Here, O(m, n) is the output of the dilated convolution at position (m, n), I(·) is the input, and K(i, j) is the kernel, of length M and width N. The dilation rate s (referred to as the stride in this paper) sets the spacing between kernel elements; for s = 1, the dilated convolution reduces to a standard convolution. As an alternative to pooling layers, dilated convolution has yielded significant accuracy improvements in segmentation tasks [39][40][41]. Pooling is used to maintain invariance and to avoid overfitting in the learning model. However, it reduces the spatial resolution of the input, which causes the spatial information of the feature map to be lost. This information loss can be mitigated with deconvolutional layers, but at the cost of increased complexity and latency, which may not be acceptable in every case. Dilated convolution avoids the disadvantages of both pooling and deconvolutional layers. Dilated convolutional layers use sparse kernels in place of pooling and deconvolutional layers, enlarging the receptive field to avoid information loss while keeping complexity under control. They also reduce the number of operations by avoiding down-sampling followed by up-sampling. Thus, dilated convolution allows multiscale contextual information to be aggregated flexibly while maintaining the same resolution. Dilated convolution is illustrated in Fig. 4, which shows how a small p × p kernel is enlarged to an effective size of $(p + (p-1)(s-1)) \times (p + (p-1)(s-1))$.
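To make the indexing concrete, the following NumPy sketch implements a single-channel "valid" dilated cross-correlation of the form O(m, n) = Σ_i Σ_j I(m + s·i, n + s·j) K(i, j) (0-based indices, no padding), together with the effective kernel size p + (p − 1)(s − 1). It is an illustration of the operation, not the layer used in VGG-CovidNet; the function names are hypothetical.

```python
import numpy as np


def effective_kernel_size(p: int, s: int) -> int:
    """Side length covered by a p x p kernel dilated with rate s."""
    return p + (p - 1) * (s - 1)


def dilated_conv2d(image: np.ndarray, kernel: np.ndarray, s: int = 1) -> np.ndarray:
    """'Valid' dilated cross-correlation with 0-based indices:

    O(m, n) = sum_i sum_j I(m + s*i, n + s*j) * K(i, j)
    """
    kh, kw = kernel.shape
    eh = effective_kernel_size(kh, s)  # vertical extent of the dilated kernel
    ew = effective_kernel_size(kw, s)  # horizontal extent of the dilated kernel
    out_h = image.shape[0] - eh + 1
    out_w = image.shape[1] - ew + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # each kernel tap reads the input on a grid whose taps are s pixels apart
            out += kernel[i, j] * image[s * i : s * i + out_h, s * j : s * j + out_w]
    return out
```

For example, a 3 × 3 kernel with s = 2 reads a 5 × 5 neighbourhood but still has only nine weights, which is how dilation enlarges the receptive field without pooling or extra parameters; with s = 1 the function is an ordinary valid correlation.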

Network Configuration
The proposed VGG-CovidNet architecture uses a shared front end consisting of the first 10 layers of the VGG-16 model. The back end consists of two branches with different dilation rates, so each branch captures spatial information that the other misses. Finally, the outputs of both branches are combined to enhance classification accuracy. Moreover, the number of pooling layers taken from VGG-16 is reduced from three to two to limit their detrimental effect on output accuracy.
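A minimal PyTorch sketch of a bi-branched dilated design of this kind is shown below. The exact layer configuration is not fully specified here, so the following are all assumptions: the front-end layout (two 3 × 3 convolutions per block and two max-pooling layers), three dilated layers per branch (six back-end layers in total), dilation rates of 2 and 3, the branch widths, channel-wise concatenation as the fusion step, and global average pooling before the classifier. Class and function names are hypothetical. The 20% dropout follows the hyperparameter section, and the freeze_front_end helper mirrors the suggestion in the results of keeping the front-end layers frozen during retraining.

```python
import torch
import torch.nn as nn

# Hypothetical front-end layout: ints are 3x3 conv widths, "M" is 2x2 max pooling.
FRONT_END_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 512, 512]


def make_front_end(cfg, in_channels: int = 3) -> nn.Sequential:
    layers = []
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


def make_branch(in_channels: int, widths, dilation: int) -> nn.Sequential:
    layers = []
    for out_channels in widths:
        # padding = dilation keeps the spatial size of a 3x3 dilated conv unchanged
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3,
                             padding=dilation, dilation=dilation),
                   nn.ReLU(inplace=True)]
        in_channels = out_channels
    return nn.Sequential(*layers)


class BiBranchDilatedNet(nn.Module):
    def __init__(self, num_classes: int = 3, dilations=(2, 3),
                 widths=(256, 128, 64), dropout: float = 0.2):
        super().__init__()
        self.front_end = make_front_end(FRONT_END_CFG)
        self.branch_a = make_branch(512, widths, dilations[0])
        self.branch_b = make_branch(512, widths, dilations[1])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * widths[-1], num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.front_end(x)
        # both branches read the same front-end features; fuse by channel concatenation
        fused = torch.cat([self.branch_a(feats), self.branch_b(feats)], dim=1)
        fused = self.pool(fused).flatten(1)
        return self.classifier(self.dropout(fused))  # logits; softmax is applied in the loss

    def freeze_front_end(self) -> None:
        """Keep the VGG-16-style front end fixed, e.g. when retraining on new data."""
        for p in self.front_end.parameters():
            p.requires_grad = False
```

For a 480 × 480 input, this front end produces 512-channel feature maps at 120 × 120 (two 2 × 2 poolings), and both dilated branches keep that resolution because their padding equals the dilation rate. The model returns logits, which would typically be paired with nn.CrossEntropyLoss.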

Implementation Details
To train VGG-CovidNet, the procedure of [22] was followed. However, instead of being pre-trained on the ImageNet dataset [45], VGG-CovidNet was trained from scratch with the Adam optimizer. The hyperparameters were set as follows: learning rate = 2e−4, number of epochs = 500, and batch size = 64. Data augmentation was performed using the following augmentation types: intensity shift, horizontal flip, translation, zoom, and rotation. VGG-CovidNet was implemented in the PyTorch framework [46]. The model was trained on a system with 64 GB of memory, a quad-core processor, and a GeForce GTX 1080 Ti graphics card with an 11 GB frame buffer. Training took approximately 161 h, and testing took 3.5 h.
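The training settings above can be captured in a small configuration object, and three of the listed augmentations are simple enough to sketch directly in NumPy. This sketch rests on assumptions: the intensity-shift and translation ranges, the 50% flip probability, and all names are hypothetical, and zoom and rotation are omitted because they require interpolation. In PyTorch, the optimizer itself would typically be created with torch.optim.Adam(model.parameters(), lr=config.learning_rate).

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported for VGG-CovidNet training (field names are hypothetical)."""
    learning_rate: float = 2e-4
    epochs: int = 500
    batch_size: int = 64


def translate(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift an H x W x C image by (dy, dx) pixels, filling uncovered pixels with zeros."""
    h, w = img.shape[:2]
    dy, dx = int(np.clip(dy, -h, h)), int(np.clip(dx, -w, w))  # avoid negative slice bounds
    out = np.zeros_like(img)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment(img: np.ndarray, rng: np.random.Generator,
            max_shift: float = 0.1, max_translate: int = 20) -> np.ndarray:
    """Randomly apply intensity shift, horizontal flip, and translation to a [0, 1] image."""
    out = img + rng.uniform(-max_shift, max_shift)  # global intensity shift
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (reverse the width axis)
    dy, dx = rng.integers(-max_translate, max_translate + 1, size=2)
    out = translate(out, int(dy), int(dx))
    return np.clip(out, 0.0, 1.0)  # keep values in the normalized range
```

Zero filling in translate avoids the wrap-around artefacts that np.roll would introduce at the image borders.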

Hyper-Parameter Adjustment
Hyperparameters are variables that define the network structure and the training procedure. To prevent the weight values from fluctuating and to steer them toward a minimum, the learning rate was set to 2e−4. The dropout rate at the last layers was set to 20% because a higher value would lead to underfitting, whereas a lower value would have minimal effect. On the basis of the network's results during training, the number of layers after VGG-16 was set to 6. The activation function is also an important factor in network architecture. Therefore, the rectified linear unit (ReLU) activation function, which sets negative values to zero, was used in the hidden layers, and a softmax output with cross-entropy loss was used in the last layer for multi-class predictions.
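The activation, dropout, and output choices described above can be written out in a few lines of NumPy. This is a sketch under stated assumptions: the softmax output is assumed to be paired with a cross-entropy loss, dropout uses the common inverted-dropout scaling, and the function names are hypothetical.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator,
            training: bool = True) -> np.ndarray:
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x  # dropout is disabled at test time
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)  # keeps the expected activation unchanged


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the class axis."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability of the true class."""
    probs = softmax(logits)
    n = logits.shape[0]
    return float(-np.log(probs[np.arange(n), labels]).mean())
```

With three classes and uniform logits, each class receives probability 1/3 and the loss equals ln 3, which is a useful sanity check at the start of training.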

Experimental Results
To evaluate the effectiveness of the proposed architecture, its classification performance was analyzed and compared with that of several DNN architectures.

Network Configuration
In this section, the VGG-CovidNet architecture is analyzed in terms of the balance between classification performance and network complexity. For comparison, the performance of the following DNN architectures was analyzed and evaluated in depth:

VGG-19 [14]:
A DNN architecture with low architectural diversity that does not use residual design principles.

ResNet-50 [47]:
A residual DNN architecture with moderate architectural diversity and minimal long-range connectivity; it does not leverage lightweight PEPX design patterns.

COVID-Net [22]:
A multistage deep convolutional neural network design. In the first stage, established design principles were adopted to build a reliable, high-performance, trainable network prototype. In the second stage, a projection-expansion-projection design pattern was used to provide better representational capacity while retaining computational efficiency.

Analysis of Results
To analyze the quantitative performance of the proposed model, the test accuracy, positive predictive value (PPV), and sensitivity were recorded for each infection type on the COVIDx dataset. In addition to these performance measures, the architectural complexity (number of parameters) and the computational complexity (number of multiply-accumulate (MAC) operations) were also measured (Tab. 1). The full 16-layer VGG-16 [14] has 138 M parameters and 15.4 G MACs because of its many convolutional layers and their large channel counts. In the present study, these values were reduced to 13.24 M and 14.3 G, respectively, by decreasing the number of convolutions in each block from three to two and by using only the first 10 layers of VGG-16. Dilated convolution also contributed to these lower values because it enlarges the receptive field without adding parameters or MACs. The proposed model achieves an accuracy of 96.5%, which is higher than that of all the other architectures compared (Tab. 2). However, its architectural and computational complexity are slightly higher than those of [22]. Unlike the model in [22], though, this model does not require continuous retraining whenever new data are collected. Tab. 2 indicates that the complexity of the proposed model results from its VGG-16-based transfer-learning front end. Therefore, it will incur higher computational costs than the other models during retraining. However, if the first 10 layers of the model are kept frozen, the computational cost of retraining becomes reasonable.
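Parameter and MAC counts of this kind follow from simple per-layer formulas. The Python sketch below counts them for standard convolutional and fully connected layers and applies them to the full VGG-16 at a 224 × 224 input; bias additions are not counted as MACs, which is a common convention and an assumption here. It does not reproduce the VGG-CovidNet figures, whose exact layer list is not given in this text, and the function names are hypothetical.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights of a k x k convolution (the dilation rate does not change this)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """One multiply-accumulate per kernel weight per output position (bias adds ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def vgg16_conv_specs(side: int = 224):
    """(in_channels, out_channels, output_side) for the 13 conv layers of VGG-16."""
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    specs, c_in = [], 3
    for v in cfg:
        if v == "M":
            side //= 2  # 2x2 max pooling halves the spatial size
        else:
            specs.append((c_in, v, side))  # 'same' padding keeps the size
            c_in = v
    return specs


def vgg16_totals():
    """Total parameters and MACs of the full VGG-16 (13 conv + 3 fully connected layers)."""
    params = macs = 0
    for c_in, c_out, side in vgg16_conv_specs():
        params += conv2d_params(c_in, c_out, 3)
        macs += conv2d_macs(c_in, c_out, 3, side, side)
    for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]:
        params += n_in * n_out + n_out
        macs += n_in * n_out
    return params, macs
```

Running vgg16_totals() gives 138,357,544 parameters and about 15.47 G MACs, in line with the 138 M and 15.4 G quoted for VGG-16. Neither formula contains the dilation rate: a dilated 3 × 3 layer with "same" padding has the same parameter and MAC count as an ordinary one, so dilation enlarges the receptive field at no extra cost.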
To gain deeper insight into the proposed model, the sensitivity and PPV were computed (Tabs. 3 and 4, respectively). Figs. 5 and 6 compare the PPV and sensitivity of the proposed model with those of other models. Both measures were derived from the confusion matrix obtained in the testing phase (Fig. 7), which is compared with the confusion matrix for COVID-Net in Fig. 8. The proposed model achieves a sensitivity of 96% for COVID-19 cases, the highest among the compared methods. This high sensitivity suggests that fewer COVID-19 positive cases are missed than with the other state-of-the-art methods (Tab. 3). The PPV is comparable to that of [22], indicating that few cases are incorrectly classified as COVID-19. As shown in Fig. 7, of the four misclassified COVID-19 patients, two were classified as non-COVID-19 and two as normal. In sum, the results indicate that the proposed model outperforms the state-of-the-art methods in terms of accuracy and sensitivity on the COVIDx dataset.
The comparison of VGG-CovidNet with the latest COVID-19 detection models is shown in Tab. 5. The proposed model outperforms these models, which is attributed to its ability to identify features that distinguish COVID-19 from the other classes. Most of these models used the same dataset (COVIDx) and ImageNet. The DeepCOVIDExplainer method employed two ensemble strategies, and ECOVNet used an EfficientNet base model with ImageNet pre-trained weights. Similar to the present work, PDCOVIDNet is a dilated convolution-based COVID-19 detection network, and its accuracy is comparable to that of the proposed method. However, PDCOVIDNet was trained and tested on a limited dataset. The other ensemble models have limited applicability because of the computational cost of training multiple DL models for ensemble prediction.
COVID-Net used many parameters that produced a high computational overhead; therefore, it is not very practicable.
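The sensitivity and PPV values discussed above are read off a confusion matrix. The NumPy sketch below shows the computation for the three classes; the function name is hypothetical, and the counts used to exercise it are made up for illustration and are not the values in Fig. 7.

```python
import numpy as np

# Assumed row/column order of the confusion matrix.
CLASSES = ("normal", "non-COVID-19", "COVID-19")


def per_class_metrics(cm):
    """Return per-class sensitivity, per-class PPV, and overall accuracy.

    cm[i, j] counts test images whose true class is i and predicted class is j,
    with rows and columns ordered as in CLASSES.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                   # correctly classified images per class
    sensitivity = tp / cm.sum(axis=1)  # TP / (TP + FN): divide by row totals
    ppv = tp / cm.sum(axis=0)          # TP / (TP + FP): divide by column totals
    accuracy = tp.sum() / cm.sum()
    return sensitivity, ppv, accuracy
```

Because rows are true classes and columns are predictions, a missed COVID-19 case lowers the COVID-19 sensitivity (its row), whereas a non-COVID-19 image predicted as COVID-19 lowers the COVID-19 PPV (its column).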

Limitations
The proposed study has a few limitations, such as model complexity and the unavailability of a large dataset. The small number of images in the dataset limited system performance, resulting in a high false-positive rate, which could be reduced as additional COVID-19 images become available. The proposed model is more complex than the network proposed in [22], as can be observed from the number of parameters shown in Tab. 2. The complexity should be reduced without degrading performance, although there is a trade-off between the two. In any case, the model will need to be retrained as more images become available to further improve performance.

Conclusion
This paper has introduced VGG-CovidNet, a bi-branched architecture based on dilated convolution for classifying CXR images into infection classes. The front end of VGG-CovidNet consists of the first 10 layers of the VGG-16 model, in which the number of convolutions per block is reduced from three to two to lower computational complexity. The back end of VGG-CovidNet consists of six convolutional layers arranged in parallel branches with different dilation rates (strides) of the convolutional kernel. Dilated convolution made it possible to capture the spatial information of the feature maps while reducing the model's complexity. A benchmark dataset (COVIDx) was used to train and test the proposed model. VGG-CovidNet was quantitatively evaluated in terms of accuracy, sensitivity, and PPV. Comparisons with state-of-the-art models indicate that VGG-CovidNet yields promising results, with 96.5% accuracy, 96.0% sensitivity, and 98% PPV.

Conflicts of Interest:
The authors declare that they have no competing interests.