|Computer Systems Science & Engineering |
Prediction Model for Coronavirus Pandemic Using Deep Learning
1Department of Information Systems, College of Computer and Information Sciences, Jouf University, KSA
2Department of Computer Science, College of Computer and Information Sciences, Jouf University, KSA
*Corresponding Author: Mamoona Humayun. Email: firstname.lastname@example.org
Received: 08 April 2021; Accepted: 14 May 2021
Abstract: The recent global outbreak of COVID-19 has badly damaged health systems, human health, the economy, and daily life around the world. No country was ready to face this emerging health challenge. Health professionals were unable to predict its rise and next move, or the future curve and impact on lives should a similar pandemic occur. This created prolonged chaos globally, and the world is still struggling to find a suitable solution. Better use of advanced technologies, such as artificial intelligence (AI) and deep learning (DL), may aid healthcare practitioners in making reliable COVID-19 diagnoses. This research provides a prediction model that uses AI and DL to improve the diagnostic process by reducing unreliable interpretation of chest CT scans, allowing clinicians to accurately discriminate between patients who are sick with COVID-19 or pneumonia, and empowering health professionals to recognize the chest CT scans of healthy people. The efforts of the Saudi government in managing and controlling COVID-19 are remarkable; however, there is a need to improve the diagnostic process for better perception. We used a dataset from Saudi regions to build a prediction model that helps distinguish COVID-19 cases from regular cases in CT scans. The proposed methodology was compared with current models and found to be more accurate (93 percent) than the existing methods.
Keywords: Artificial Intelligence (AI); deep learning (DL); COVID-19; pandemic
1 Introduction
The COVID-19 pandemic first appeared in December 2019, when many people in China were affected by pneumonia. Within a short period, the disease spread to more than 170 countries, and a large number of fatalities were reported. By March 2021, more than 120 million people worldwide had been infected and about 2.7 million of them had died. All countries are striving to cope with this disease by following various practices, but none can yet control its spread completely. The WHO has declared it a pandemic [1–3]; however, this is not the only pandemic the world has faced: many outbreaks occurred in the past, and more may be expected in the future. When such outbreaks occur, appropriate vaccines, drugs, and infrastructure remain unavailable during certain stages. Mitigating such outbreaks with existing capacity is therefore a challenge [4,5]. Researchers all around the world are working to combat this pandemic, and well-known corporations such as Pfizer, Novavax, AstraZeneca, Johnson & Johnson, and others have developed vaccines against it. Nevertheless, there is no assurance that the pandemic can be entirely stopped. Further, inaccurate diagnostics and perception remain a challenge, and there is a need to take advantage of the latest ICT technologies for accurate perception and diagnostics. Many researchers are using AI and ML techniques for COVID-19 diagnostics from chest X-ray images [7–9].
AI and DL are important fields on which the modern era increasingly depends. DL is a subset of machine learning (ML) within AI that is mainly inspired by the structure and function of the brain, in the form of artificial neural networks (ANNs). DL is now used in various computer vision tasks and is a natural candidate for accurate diagnostics of COVID-19 or any upcoming pandemic [10–12]. A few studies have implemented these techniques and proven them successful; e.g., Alibaba developed a solution based on AI and ML to assist China in fighting COVID-19. It is claimed to have been introduced in different regions of China with 98 percent accuracy, and it aids in forecasting the scale, peak, and length of the epidemic. Similarly, a DL-based image analytics solution can be used to fight different forms of pneumonia. Further, the vaccination process can also be accelerated using DL and AI techniques [13,14].
For the last several months, a great deal of research has been carried out to provide solutions for early and effective identification and mitigation of COVID-19. But what about the losses suffered by countries across the globe? In the future, solutions are needed that can predict such pandemics accurately so that early measures can be taken. To this end, this paper provides a solution that helps in accurate diagnostics of COVID-19 using AI and DL techniques. Existing COVID-19 data are analyzed with these techniques to forecast when and where the disease is present, so that the affected patients can be notified to make the required arrangements. Fig. 1 illustrates the working of our project.
The remainder of the paper is organized as follows. Section 2 summarizes existing studies to give an up-to-date picture of how ML and DL are used for vision and diagnostics. The proposed technique is described in detail in Section 3. The study's findings, as well as comparisons with other research, are presented in Section 4. These observations are discussed in Section 5. Section 6 concludes the paper, and Section 7 provides directions for future work.
2 Literature Review
The current pandemic has not only crashed the economies of various countries; it has also severely affected the morale and strength of nations. The aftermath of COVID-19 is likely to have a huge impact on the world. A great deal of research is under way on an urgent basis to provide solutions for managing COVID-19; however, the problem is still not solved. Below we review some recent studies to give a state-of-the-art view.
Keeping in view the drastic effects of COVID-19, one study provides a detailed overview of AI and Big Data (BG) by identifying their applications in the current pandemic situation. This study also highlights the issues and challenges associated with existing COVID-19 solutions and provides recommendations for effective control of COVID-19. The paper offers new insights into the use of AI and BG in the COVID-19 situation through various real-life examples of AI and BG in outbreak prediction and mitigation. However, the study only explores existing solutions and makes no novel contribution.
Another study discusses the importance of ML, cloud computing, and mathematical modeling for proactive prediction of epidemic growth. A case study is presented to highlight the severity of COVID-19 globally. A Weibull model based on iterative weighting is proposed and shown to predict better than the baseline. The study claims that the existing baseline model gives an over-optimistic view of the COVID-19 scenario, which can lead to inefficient decision-making and may affect the overall health situation. The study also provides various future directions and claims to lay the groundwork for further practical applications.
Another research study provides a detailed survey of existing forecasting techniques along with their pros and cons. Two types of datasets are used, i.e., big data from the WHO and data shared through social media. Various forecasting techniques and parameters are also presented, along with the associated challenges. Finally, the study provides a set of recommendations that might be followed by victims of COVID-19. The paper gives a good overview of forecasting techniques and associated challenges; however, no validated solution is provided for the prediction of COVID-19 and upcoming pandemics.
In another study, an X-ray screening method is proposed that uses DL artificial neural networks that are efficient and accurate in terms of processing time and memory. A dataset of 13569 X-ray images was used for analysis, divided into three categories: healthy, COVID-19 patients, and non-COVID pneumonia patients. Models were trained using various approaches, and finally 231 images from the three classes were used to assess the quality of the proposed method. The results show that the proposed approach produces high-quality models with 93.9% accuracy, a positive predictive value of 100%, and a COVID-19 sensitivity of 96.8% while using the smallest number of parameters. However, the proposed method still needs to be applied to a large and heterogeneous dataset for further validation.
Another work presents a systematic study of various COVID-19 mitigation approaches along with their potential and associated challenges. The authors argue that deep transfer learning (DTL) is more suitable than DL in the absence of a large dataset. Further, DTL is better suited for use with resource-constrained 5G technologies. After providing an overview of existing AI and DL techniques for mitigating COVID-19, the authors outline a pipeline model of DTL for outbreak mitigation. This survey is helpful for researchers and practitioners of DTL and Edge Computing (EC) in developing tools and applications for mitigating COVID-19 and any future pandemic.
In another study, DL techniques were used to establish an early screening model that uses pulmonary CT images to differentiate COVID-19 cases from Influenza-A viral pneumonia (IAVP) and healthy cases. In total, 618 CT samples were obtained: 219 from 110 COVID-19 patients, 224 from 224 IAVP patients, and 175 from 175 healthy cases. These CT samples were provided by three COVID-19-designated hospitals in Zhejiang Province, China. The candidate infection regions were first segmented out of the pulmonary CT image set using a 3D DL model. Using a location-attention classification model, the segmented images were then classified into the COVID-19, IAVP, and irrelevant-to-infection classes, along with the corresponding confidence scores. Finally, the Noisy-OR Bayesian function was used to compute the infection type and overall confidence score for each CT case. According to the experimental results on the benchmark dataset, the overall accuracy for all CT cases combined was 86.7 percent. The DL models built in that study were found to be useful for early screening of COVID-19 patients.
A novel DL algorithm for automatic segmentation of various COVID-19 infection regions was proposed in another article. A soft attention mechanism was used to enhance the model's ability to differentiate several COVID-19 signs, and aggregated residual transformations were used to learn a robust and expressive feature representation. Using a publicly available CT image dataset, the performance of the proposed algorithm was compared with that of competing methods. Experiments show that the proposed algorithm performs extremely well for automated segmentation of COVID-19 chest CT images. With this DL-based segmentation technique, the researchers lay the basis for a systematic diagnosis of COVID-19 lung infection in CT images.
Another paper proposes an Adaptive Feature Selection guided Deep Forest (AFS-DF) for COVID-19 classification based on chest CT images. First, location-specific features are extracted from the CT images. Then, a deep forest model is used to learn a high-level representation of these features, which suits the comparatively small scale of the data. To reduce feature redundancy, a feature selection technique based on the trained deep forest model is proposed, with feature selection adaptively incorporated into the COVID-19 classification model. The proposed AFS-DF was validated on a COVID-19 dataset with 1495 COVID-19 patients and 1027 community-acquired pneumonia (CAP) patients. The method achieves accuracy, sensitivity, specificity, AUC, precision, and F1-score of 91.79 percent, 93.05 percent, 89.95 percent, 96.35 percent, 93.10 percent, and 93.07 percent, respectively. Experiments on the COVID-19 dataset indicate that the proposed AFS-DF outperforms four commonly used machine learning models in COVID-19 vs. CAP classification.
The above discussion shows that considerable research effort is going on globally for early detection and mitigation of COVID-19, and several models have been presented to predict cases from different aspects for different regions of the world. However, to the best of our knowledge, very few studies address the Saudi region with models that can diagnose patients from chest X-ray data. Forecasting a pandemic can help in designing management strategies and solutions that reduce its disastrous impact globally. Since this pandemic emerged suddenly and the world is still not fully prepared to push it back, timely information or prediction can be more important than vigilance policies in handling the pandemic situation.
3 Proposed Methodology
The goal of this research was to strengthen the diagnostic procedure by reducing inaccurate interpretation of chest CT scans, helping clinicians to easily differentiate between patients who are sick with COVID-19 or pneumonia, and empowering health professionals to recognize the chest CT scans of healthy people. The steps of the proposed methodology are described in Fig. 2.
Tab. 1 shows the details of the dataset used in our experiment, which represents the Saudi region and was collected after the pandemic began. We used a substantial number of chest X-ray images and applied augmentation to further enlarge the dataset, in order to obtain better results and to control overfitting and underfitting.
This research used a total of 5856 chest CT scans, comprising 4273 CT scans of patients infected with COVID-19 and 1583 CT scans of healthy individuals with no detectable COVID-19. The dataset is divided into three subsets, namely training, validation, and testing, as shown in Fig. 3. The dataset is separated in this way to avoid model overfitting and to test the proposed model properly. More training data lets the model learn from more examples, validation data guides model selection and tuning, and test data shows how well the model generalizes to unseen data.
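A three-way split of this kind can be sketched as follows. The class counts come from the text, but the 70/15/15 ratios, the random seed, and the function name `stratified_split` are assumptions for illustration only; the actual subset sizes are those shown in Fig. 3.

```python
import random

def stratified_split(n_covid=4273, n_normal=1583,
                     ratios=(0.70, 0.15, 0.15), seed=0):
    """Split sample ids per class so each subset keeps the class balance.

    The ratios are hypothetical; only the class counts come from the paper.
    """
    rng = random.Random(seed)
    subsets = {"train": [], "val": [], "test": []}
    for label, count in (("covid", n_covid), ("normal", n_normal)):
        ids = [(label, i) for i in range(count)]
        rng.shuffle(ids)
        n_train = int(count * ratios[0])
        n_val = int(count * ratios[1])
        subsets["train"] += ids[:n_train]
        subsets["val"] += ids[n_train:n_train + n_val]
        subsets["test"] += ids[n_train + n_val:]  # remainder goes to test
    return subsets

splits = stratified_split()
sizes = {name: len(items) for name, items in splits.items()}
```

Splitting each class separately keeps the roughly 73/27 ratio of COVID-19 to normal scans in every subset, so validation and test scores are not distorted by a different class balance.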
Fig. 4 shows CT scans of COVID-19 infected patients and healthy individuals, randomly selected from the dataset for visualization. Examining such varied representations helps health professionals spend less time on identification. A convolutional neural network (CNN) is used to extract features and classify CT scans as COVID-19 affected or normal.
Data preprocessing comprises dataset preparation and data augmentation, which diversifies the training data to improve the model's performance and ability to generalize. In the Keras DL library, image augmentation is provided by the ImageDataGenerator class. The augmentation attributes used are vertical flip, horizontal flip, height shift range, width shift range, and zoom range. For preprocessing, we load the images, resize them, convert them to grayscale, normalize them, and reshape them to the appropriate dimensions. Although the feature values are bounded, feature engineering is still necessary to obtain and normalize them, which helps the classification model classify more accurately. Fig. 5 depicts the pixel distribution of a CT scan.
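The augmentation step might be configured roughly as below. The five augmentation attributes are the ones named in the text; their numeric values, the `rescale` normalization, the target size, the batch size, and the helper name `build_train_generator` are assumptions, since the source does not report them.

```python
# Hypothetical augmentation settings: the paper names these attributes
# but does not report the values used.
AUGMENTATION = {
    "rescale": 1.0 / 255,        # normalize pixel intensities to [0, 1]
    "horizontal_flip": True,
    "vertical_flip": True,
    "height_shift_range": 0.1,   # fraction of image height
    "width_shift_range": 0.1,    # fraction of image width
    "zoom_range": 0.2,
}

def build_train_generator(directory, target_size=(224, 224), batch_size=32):
    """Return an augmented batch iterator over a class-per-folder directory."""
    # Imported lazily so the settings can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION)
    return datagen.flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode="categorical",
    )
```

Augmentation of this kind is normally applied to the training subset only, so that validation and test images remain unmodified.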
Our proposed model is an attention-based DL framework that uses transfer learning with VGG16 and weights pre-trained on a large benchmark; it is adapted from prior work. VGG16 is a CNN model that achieves 92.7% top-5 accuracy on the ImageNet dataset.
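A minimal transfer-learning sketch in Keras is given below. It is not the authors' exact architecture: the attention component is omitted because the source does not describe it, and the dense head sizes, the 224 × 224 × 3 input shape, the optimizer, and the loss are assumptions, chosen to be compatible with the parameter counts listed in Section 3.1.

```python
HEAD_UNITS = (256, 128, 64)   # hypothetical dense head, not given in the source
NUM_CLASSES = 2               # COVID-19 vs. normal

def build_model(input_shape=(224, 224, 3)):
    """VGG16 convolutional base with frozen ImageNet weights plus a dense head."""
    # Imported lazily; needs TensorFlow and downloads the ImageNet weights.
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False            # freeze the pre-trained parameters

    x = layers.Flatten()(base.output)
    for units in HEAD_UNITS:
        x = layers.Dense(units, activation="relu")(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",                  # assumed optimizer
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model().summary()` prints the trainable and non-trainable parameter totals, which can be compared with the reported figures.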
The essence of a CNN is its layered structure, in which convolution extracts local features (e.g. edges) from the input data. In a convolutional layer, every node is connected to a small, spatially local subset of the input through shared weights. Convolutional layers are followed by max-pooling layers, which speed up convergence and reduce the size of the feature maps by keeping the highest value within a local region [24,25]. Fig. 6 presents the architecture of the VGG deep learning model, which improves on AlexNet by replacing large kernel-sized filters with sequences of multiple 3×3 kernel-sized filters. The many non-linear layers increase the model depth, allowing it to learn more complex features at a lower cost. Combinations of convolutional blocks and pooling layers are followed by a set of fully connected layers, in which each node is connected to all the activations of the preceding stage.
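The lower cost of stacked 3×3 filters can be made concrete with a short worked comparison. The symbol $C$ (the number of input and output channels, taken to be equal) is an assumption introduced for illustration, and bias terms are ignored.

```latex
\begin{aligned}
\text{two stacked } 3\times 3 \text{ layers:}\quad
  & 2\,(3\cdot 3\cdot C^{2}) = 18\,C^{2} \text{ weights, receptive field } 5\times 5,\\
\text{one } 5\times 5 \text{ layer:}\quad
  & 5\cdot 5\cdot C^{2} = 25\,C^{2} \text{ weights},\\
\text{three stacked } 3\times 3 \text{ layers:}\quad
  & 3\,(3\cdot 3\cdot C^{2}) = 27\,C^{2} \text{ weights, receptive field } 7\times 7,\\
\text{one } 7\times 7 \text{ layer:}\quad
  & 7\cdot 7\cdot C^{2} = 49\,C^{2} \text{ weights}.
\end{aligned}
```

The stacked form therefore covers the same receptive field with fewer weights, and it inserts an additional non-linearity after each 3×3 layer.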
Fig. 7 shows the classification probabilities for normal and COVID-19 infected patients; these probabilities can be used to compute performance. The behavior of the proposed approach is influenced by its deep structure and large number of training parameters, so it is necessary to watch for local optima and over-fitting. The pre-trained parameters of the transferred network are frozen, while the parameters of the higher, task-specific layers are learned and fine-tuned so that the derived target features are more distinctive. During training, we monitored the loss and accuracy scores while choosing the number of epochs and the batch size for optimization. The model achieved an accuracy of 0.93.
The accuracy trend on both datasets can be read from the accuracy plot. The framework has not over-fitted the training dataset, as it shows similar scores on both datasets. The proposed model also shows comparable loss on the training and testing datasets.
Analyzing the training loss vs. test loss, as shown in Fig. 9, and the training accuracy vs. test accuracy, as shown in Fig. 8, over a series of epochs is an invaluable way to see whether the model has been adequately trained. This is necessary to ensure that the model is neither under-trained nor over-trained to the extent that it begins to memorize the training data.
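One common way to turn this visual check into a rule is patience-based early stopping, sketched below. This is an illustrative technique, not something the source states was used; the function name `best_epoch`, the patience value, and the loss curve are all made up for the example.

```python
def best_epoch(val_losses, patience=3):
    """Return the index of the epoch with the lowest held-out loss seen
    before the loss fails to improve for `patience` consecutive epochs."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # held-out loss has stopped improving
    return best_idx

# Made-up loss curve for illustration only (not data from the paper).
example_val_losses = [0.62, 0.41, 0.33, 0.29, 0.30, 0.31, 0.35, 0.40]
stop_at = best_epoch(example_val_losses)
```

A held-out loss that keeps rising while the training loss keeps falling is the usual sign that the model has started to memorize the training data.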
3.1 Proposed Model Parameters
Total parameters: 21,178,754
Trainable parameters: 6,464,066
Non-trainable parameters: 14,714,688
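These three counts can be checked with a short calculation. The convolutional stack below is the standard VGG16 layout with 3×3 kernels; the dense head (256, 128, 64, and 2 units on flattened 7 × 7 × 512 features) is an assumption, one decomposition that reproduces the reported trainable count, and the source does not confirm it.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Standard VGG16 convolutional stack: (input channels, output channels).
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
frozen = sum(conv_params(i, o) for i, o in VGG16_CONVS)

# Hypothetical head: flattened 7 x 7 x 512 features -> 256 -> 128 -> 64 -> 2.
HEAD = [7 * 7 * 512, 256, 128, 64, 2]
trainable = sum(dense_params(a, b) for a, b in zip(HEAD, HEAD[1:]))

total = frozen + trainable
```

Under these assumptions the frozen base accounts for the non-trainable parameters and the dense head for the trainable ones, which is consistent with a frozen pre-trained VGG16 base.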
4 Experiment and Results
Performance metrics have been used to test and validate our proposed framework on the COVID-19 dataset. Primarily, the following measurement criteria were used to evaluate the proposed framework.
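$$P = \frac{TP}{TP + FP}$$
where $P$ refers to precision, $TP$ refers to true positives, and $FP$ represents false positives.
$$R = \frac{TP}{TP + FN}$$
where $R$ refers to recall and $FN$ represents false negatives.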
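As a concrete illustration of these criteria, the sketch below computes precision and recall from raw counts, together with the standard definitions of accuracy and F1 score, which are also reported in the results. The function name and the counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Because the dataset is imbalanced (4273 COVID-19 scans vs. 1583 normal scans), precision and recall per class are more informative than accuracy alone.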
Fig. 10 presents the results of our model graphically. Model results are also shown in Tab. 2.
The results of the proposed model were compared with a few recent studies to assess its performance. The comparison is shown in Tab. 3, which indicates that the proposed scheme is efficient in terms of diagnostic accuracy. In addition, the precision, recall, and F1 score values for the normal and COVID-19 classes are also satisfactory and show improvement.
The confusion matrix measures the performance of a classification model over the target classes. The matrix compares the actual target values with those predicted by the model. This gives a balanced perspective on how well the classification model is doing, as shown in Fig. 11.
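A confusion matrix for the two classes can be built as in the following sketch. The label names, the row and column convention, and the example predictions are assumptions for illustration; they are not the values shown in Fig. 11.

```python
def confusion_matrix(y_true, y_pred, labels=("normal", "covid")):
    """Count predictions; rows are actual classes, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Made-up labels for illustration only.
y_true = ["covid", "covid", "normal", "normal", "covid", "normal"]
y_pred = ["covid", "normal", "normal", "covid", "covid", "normal"]
cm = confusion_matrix(y_true, y_pred)
```

The diagonal entries count correct predictions, and the off-diagonal entries separate missed COVID-19 cases from healthy scans wrongly flagged as COVID-19.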
The findings obtained by the CNN model are consistent with recently proposed DL approaches for the automatic diagnosis of COVID-19 from chest X-ray images. The proposed methodology achieves greater efficiency than existing frameworks. Multi-class classification of COVID-19 is more critical and complex because COVID-19 shares common features with various forms of pneumonia. As a result, performance in these situations is relatively lower and can be further improved with a more efficient DL model.
Fig. 12 shows predicted CT scan results, which confirm that the model classifies the data accurately between normal/healthy and COVID-19 infected patients.
5 Discussion
In this research, we introduced a DL approach capable of detecting COVID-19 from CT scans. We examined the performance of a DL framework for automated COVID-19 detection on a dataset of CT scan images from the Saudi region. The suggested DL framework achieves better COVID-19 classification accuracy in this region, and its effectiveness improves as the number of CT scans increases.
The proposed methodology is based on an attention-based DL framework that uses transfer learning with VGG16. During preprocessing, data augmentation is performed to avoid overfitting and to help the model generalize. Such neural architectures are well suited to transfer learning for achieving a better classification score.
Compared with similar experiments, this one benefits from a transfer learning strategy with a pre-trained CNN. Our framework can better differentiate COVID-19 cases from CT scans, as well as provide guidance on patient severity to help with triage and therapy. Experimental results reveal that the proposed methodology achieves better accuracy, precision, recall, and F1 score on the Saudi region dataset.
Moreover, from a clinical point of view, the joint sensitivity is helpful because it provides a reliable estimate of the proportion of infected patients, which is an important factor that physicians consider when determining the seriousness of a COVID-19 case. Based on these descriptive and analytical findings, we expect that our approach could be used in large-scale clinical studies that would support future decision making. Our long-term goal is to broaden the experimentation space by gathering more training data and applying the established methodology to datasets from other regions.
6 Conclusion
COVID-19 is one of the critical challenges the world is facing these days. Because of its high transmissibility and multiple phases, it has killed millions of people all around the world, according to recent Worldometers data. Despite great efforts and the availability of various vaccines, the problem persists. Saudi Arabia has taken timely measures to cope with this problem and has flattened its fatality curve to some extent, but the kingdom is still not free from this pandemic. Accurate perception and diagnostics is a key challenge that needs to be addressed for the management and control of COVID-19. As a contribution to research, we took a dataset of 5856 CT scans from the Saudi Arabia region. The selected dataset consists of 4273 CT scans of COVID-19 infected patients and 1583 CT scans of healthy people. The data were divided into three parts, namely training, testing, and validation. A CNN is used to extract features and classify CT scans as COVID-19 affected or normal. An efficient diagnostic model was proposed in this study and validated using performance metrics. The accuracy, precision, and recall values were calculated for each part of the dataset, and the results were compared with existing studies as well. The results show that the proposed model provides an efficient way of diagnosing COVID-19 from existing CT scans, with a better accuracy of 93%. A limitation of our work is that evaluating and benchmarking the AI strategies used in COVID-19 classification is a complex, multi-attribute problem; various aspects of assessment and benchmarking, such as testing deficiencies and problems, remain to be investigated.
7 Future Work
In the future, new methods will be investigated, or multiple criteria will be consolidated into the proposed model. Drawing on other COVID-19 practices, categorization strategies can aid in refining and generalizing the model, as can training on other COVID-19 datasets.
Acknowledgement: The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research grant no (DSR2020-04-1533)
Funding Statement: The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research grant no (DSR2020-04-1533).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|