Computer Modeling in Engineering & Sciences
ANC: Attention Network for COVID-19 Explainable Diagnosis Based on Convolutional Block Attention Module
1Jiangsu Key Laboratory of Advanced Manufacturing Technology, Huaiyin Institute of Technology, Huai’an, 223003, China
2Department of Medical Imaging, The Fourth People’s Hospital of Huai’an, Huai’an, 223002, China
3School of Informatics, University of Leicester, Leicester, LE1 7RH, UK
*Corresponding Authors: Yudong Zhang. Email: firstname.lastname@example.org; Xin Zhang. Email: email@example.com
Received: 15 January 2021; Accepted: 24 February 2021
Abstract: Aim: To diagnose COVID-19 more efficiently and more accurately, this study proposed a novel attention network for COVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed to avoid overfitting. Then, the convolutional block attention module (CBAM) was integrated into our model, whose structure was fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracies of our ANC method on the two datasets are , and , respectively. Conclusions: The proposed ANC method is superior to 9 state-of-the-art approaches.
Keywords: Deep learning; convolutional block attention module; attention mechanism; COVID-19; explainable diagnosis
The COVID-19 (also known as coronavirus) pandemic is an ongoing infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). As of 7/Feb/2021, there are over 106.22 million confirmed cases and over 2.31 million deaths attributed to COVID-19 (see Fig. 1). The main symptoms of COVID-19 are a fever, a new and continuous cough, and a loss of or change to the sense of taste and smell.
In the UK, the approved vaccines were developed by Pfizer/BioNTech, Oxford/AstraZeneca, and Moderna. The Joint Committee on Vaccination and Immunisation (JCVI) determines the order in which people will be offered the vaccine. Currently, people aged over 80, people living or working in care homes, and health care providers are being offered the vaccine.
Two COVID-19 diagnosis methods are available. The first is viral testing, which tests for the existence of viral RNA fragments. Its shortcomings are twofold: (i) the swab may be contaminated, and (ii) it can take from several hours to several days to obtain the results. The other method is chest imaging. Chest computed tomography (CCT) is one of the best chest imaging techniques. CCT is operator-independent. Besides, it provides the highest sensitivity compared to ultrasound and X-ray. It provides a real 3D volumetric image of the chest region.
Nevertheless, manual labelling by human experts is time-consuming, tedious, labor-intensive, and easily influenced by human factors (e.g., fatigue or lethargy). In contrast to manual labelling, computer vision techniques are now achieving promising results in the automatic labelling of COVID-19 and other medical images with the help of artificial intelligence (AI).
For non-COVID-19 images, Lu proposed a bat algorithm-based extreme learning machine (BA-ELM) approach and applied it to pathological brain detection. Lu presented a radial basis function neural network (RBFNN) to recognize pathological brain types. Fulton et al. employed ResNet-50 for identifying Alzheimer's disease. Guo et al. utilized ResNet-18 to identify thyroid ultrasound plane images. Although these methods were not directly developed for COVID-19 diagnosis, they are chosen and used as comparison methods in this study.
For COVID-19 images, Yu proposed the GoogleNet-COD model. The authors first replaced the last two layers with four new layers, which included a dropout layer, two fully connected layers, and the output layer. Satapathy proposed a five-layer deep convolutional neural network with stochastic pooling (DCNN-SP). Yao combined wavelet entropy (WE) and biogeography-based optimization (BBO) techniques. Wu used wavelet Renyi entropy (WRE) to extract features from chest CT images. Li et al. presented a COVID-19 detection neural network (COVNet). Akram et al. proposed a four-step procedure to handle COVID-19: (i) data collection & normalization; (ii) feature extraction; (iii) feature selection; and (iv) feature classification. Khan et al. used a one-class kernel extreme learning machine to predict COVID-19 pneumonia. Khan et al. used DenseNet and the firefly algorithm for the classification of positive COVID-19 CT scans.
To further improve COVID-19 diagnosis performance, this paper proposes a novel AI model, the attention network for COVID-19 (ANC), which provides an explainable diagnosis. Here, "attention" means it can tell the neural network which regions to focus on. Compared to ordinary neural networks, the advantages of ANC are fourfold:
i) A novel 18-way data augmentation was proposed to avoid overfitting;
ii) The convolutional block attention module (CBAM) was integrated so that our model can infer attention maps;
iii) Our model was fine-tuned, and its performance was better than that of 9 state-of-the-art approaches;
iv) Grad-CAM was used to provide an explainable heatmap so the users can understand our model.
Two datasets are used in this study. The first dataset contains 148 COVID-19 images and 148 healthy control (HC) images. The second dataset contains 320 COVID-19 images and 320 HC images. Tab. 1 provides the descriptions of these two COVID-19 CCT datasets, where a + b stands for a COVID-19 subjects/images and b HC subjects/images.
Preprocessing was applied to all the images. Let R0 and R stand for the raw dataset (Dataset-1 or Dataset-2) and the final preprocessed set, and let R1, R2, and R3 denote three temporary sets. The flowchart of our pre-processing is displayed in Fig. 2.
The raw dataset is $R_0 = \{r_0(k)\},\; k = 1, \ldots, |R_0|$, where $|R_0|$ means the number of images. The size of each raw image is $W \times H \times 3$ (width, height, and the three RGB channels). Although the raw images appear grayscale, they are stored in RGB format on the hospitals' servers.
First, we need to grayscale all the raw images. The grayscale transformation is described as
$$r_1(k) = 0.299\,R\{r_0(k)\} + 0.587\,G\{r_0(k)\} + 0.114\,B\{r_0(k)\},$$
which utilizes the standard RGB-to-grayscale transformation, where $R\{\cdot\}$, $G\{\cdot\}$, and $B\{\cdot\}$ extract the red, green, and blue channels, respectively.
Second, histogram stretching (HS) was used to improve the contrast of all grayscaled images. For the k-th image $r_1(k)$, suppose its lower bound (LB) and upper bound (UB) grayscale values are $\mathrm{LB}(k)$ and $\mathrm{UB}(k)$. They can be obtained as
$$\mathrm{LB}(k) = \min_{w,h} r_1(k \mid w, h), \qquad \mathrm{UB}(k) = \max_{w,h} r_1(k \mid w, h),$$
where $(w, h)$ are the indexes of the width and height dimensions, respectively, and $W$ and $H$ stand for the width and height of image $r_1$, respectively ($1 \le w \le W$, $1 \le h \le H$).
The new HS-enhanced image is calculated as
$$r_2(k \mid w, h) = \frac{r_1(k \mid w, h) - \mathrm{LB}(k)}{\mathrm{rg}(k)} \times (r_{\max} - r_{\min}) + r_{\min},$$
where $\mathrm{rg}(k)$ is the grayscale range of the image $r_1(k)$. It is defined as
$$\mathrm{rg}(k) = \mathrm{UB}(k) - \mathrm{LB}(k).$$
The HS-enhanced image will occupy the full grayscale range $[r_{\min}, r_{\max}]$, where $r_{\min}$ and $r_{\max}$ stand for the minimum and maximum gray values, respectively.
Third, we cropped out the text at the right region and the check-up bed at the bottom region. The crop values $[c_t, c_l, c_b, c_r]$ are the numbers of pixels to be cropped from the four directions: top, left, bottom, and right, respectively. The cropped image can be written as
$$r_3(k) = r_2(k \mid c_t + 1 : H - c_b,\; c_l + 1 : W - c_r),$$
where $W$ and $H$ stand for the width and height of the image before cropping, respectively. The range $a : b$ stands for the range from the integer $a$ to the integer $b$.
Finally, downsampling was carried out to further reduce the image size and remove redundant information. Suppose the final size is $[W_F, H_F]$; the final image is obtained as
$$r(k) = f_{\mathrm{ds}}\left[r_3(k), (W_F, H_F)\right],$$
where $f_{\mathrm{ds}}$ is the downsampling function that resizes an image to the target size $[W_F, H_F]$.
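To make the four preprocessing steps concrete, the sketch below chains them with NumPy and Pillow. It is a minimal illustration rather than the authors' implementation: the 200-pixel crop and the [0, 255] gray range follow Tab. 4, while the 256 × 256 output size, the bilinear resampling, and all function names are assumptions.

```python
import numpy as np
from PIL import Image

def preprocess(path, crop=200, out_size=(256, 256), r_min=0, r_max=255):
    """Grayscale -> histogram stretching -> crop -> downsample (illustrative sketch)."""
    r0 = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    # Step 1: standard RGB-to-grayscale transformation
    r1 = 0.299 * r0[..., 0] + 0.587 * r0[..., 1] + 0.114 * r0[..., 2]
    # Step 2: histogram stretching so the image occupies the full range [r_min, r_max]
    lb, ub = r1.min(), r1.max()
    r2 = (r1 - lb) / (ub - lb) * (r_max - r_min) + r_min
    # Step 3: crop `crop` pixels from the top, bottom, left, and right
    r3 = r2[crop:-crop, crop:-crop]
    # Step 4: downsample to the final size (bilinear interpolation assumed)
    r = np.asarray(Image.fromarray(r3.astype(np.uint8)).resize(out_size, Image.BILINEAR))
    return r
```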
Figs. 3a and 3b show the raw and pre-processed images of a COVID-19 case, while Figs. 3c and 3d illustrate the raw and pre-processed images of an HC case. As can be observed from Fig. 3, the pre-processed images have better contrast, remove the irrelevant information, are down-sampled to a smaller size, and take less storage than the raw images. Tab. 2 itemizes the abbreviation list, which will help readers understand the following sections.
3.1 18-Way Data Augmentation
Data augmentation (DA) is an important tool applied to the training set to avoid overfitting of classifiers and to overcome the small-dataset problem. Recently, Wang proposed a novel 14-way data augmentation (DA), which applied seven different DA techniques to the preprocessed image $r(k)$ and to its horizontally mirrored image, respectively.
This study enhances the 14-way DA method to an 18-way DA by adding two new DA methods, salt-and-pepper noise (SAPN) and speckle noise (SN), applied to both the preprocessed image and its mirror. In our future studies, we will investigate including more types of DA. Fig. 4 displays an illustrative example of the two new DAs.
For a given image $r(k)$, the SAPN-altered image $r_{\mathrm{SAPN}}(k)$ is defined pixel-wise, with its values set as
$$P\!\left[r_{\mathrm{SAPN}}(k \mid w,h) = r_{\min}\right] = \frac{d}{2}, \quad P\!\left[r_{\mathrm{SAPN}}(k \mid w,h) = r_{\max}\right] = \frac{d}{2}, \quad P\!\left[r_{\mathrm{SAPN}}(k \mid w,h) = r(k \mid w,h)\right] = 1 - d,$$
where $d$ stands for the noise density and $P$ denotes the probability function. $r_{\min}$ and $r_{\max}$ correspond to the black and white colors, respectively; their definitions can be seen in Section 2.
On the other side, the SN-altered image $r_{\mathrm{SN}}(k)$ is defined as
$$r_{\mathrm{SN}}(k) = r(k) + n \odot r(k),$$
where $n$ is uniformly distributed random noise, of which the mean and variance are symbolized as $\mu_n$ and $\sigma_n^2$, respectively, and $\odot$ denotes element-wise multiplication.
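As a concrete illustration of the two new DA operations, the NumPy sketch below injects salt-and-pepper noise and speckle noise into one preprocessed image. The default noise density of 0.05 is an assumption (the paper's value is listed in Tab. 4), whereas the zero mean and 0.05 variance for the speckle noise follow Section 4.1; the uniform-noise half-width is derived from the requested variance.

```python
import numpy as np

def salt_and_pepper(r, density=0.05, r_min=0, r_max=255, rng=None):
    """SAPN: with probability d/2 set a pixel to black, with d/2 to white, else keep it."""
    rng = rng or np.random.default_rng()
    out = r.astype(np.float64).copy()
    u = rng.random(r.shape)
    out[u < density / 2] = r_min                            # pepper (black)
    out[(u >= density / 2) & (u < density)] = r_max         # salt (white)
    return out

def speckle(r, mean=0.0, var=0.05, rng=None):
    """SN: r_SN = r + n * r, with n uniformly distributed (given mean and variance)."""
    rng = rng or np.random.default_rng()
    half_width = np.sqrt(3.0 * var)          # Var of U(mean-w, mean+w) equals w^2 / 3
    n = rng.uniform(mean - half_width, mean + half_width, size=r.shape)
    return r.astype(np.float64) * (1.0 + n)
```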
Let $N_a$ stand for the number of DA techniques applied to the preprocessed image $r(k)$, and $N_b$ stand for the number of newly generated images for each DA. Thus, our 18-way DA algorithm (here $N_a = 9$, so $2 N_a = 18$) is a four-step algorithm depicted below, with a code sketch after the steps:
First, $N_a$ geometric/photometric/noise-injection DA transforms are applied to the preprocessed image $r(k)$, as shown in Fig. 5. We use $f_i,\; i = 1, \ldots, N_a$, to denote each DA operation. Note that each DA operation will yield $N_b$ new images. So, for a given image $r(k)$, we will obtain $N_a$ different datasets $f_i[r(k)],\; i = 1, \ldots, N_a$, and each dataset contains $N_b$ new images.
Second, the horizontal mirror image is generated as
$$r_m(k) = h_M[r(k)],$$
where $h_M$ means the horizontal mirror function.
Third, all the $N_a$ different DA methods are carried out on the mirror image $r_m(k)$, generating $N_a$ different datasets $f_i[r_m(k)],\; i = 1, \ldots, N_a$.
Fourth, the raw image $r(k)$, the mirrored image $r_m(k)$, all the above $N_a$-way DA results of the preprocessed image $f_i[r(k)]$, and the $N_a$-way DA results of the horizontally mirrored image $f_i[r_m(k)]$ are fused together using the concatenation function $h_{\mathrm{CON}}$.
The final combined dataset is defined as
$$R_{\mathrm{DA}}(k) = h_{\mathrm{CON}}\left\{ r(k),\; r_m(k),\; f_1[r(k)], \ldots, f_{N_a}[r(k)],\; f_1[r_m(k)], \ldots, f_{N_a}[r_m(k)] \right\}.$$
Therefore, one image will generate
$$2 \times N_a \times N_b + 2$$
images (including the original image $r(k)$). Algorithm 1 shows the pseudocode of this proposed 18-way data augmentation on one image $r(k)$.
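The four steps can be summarised by the short sketch below, which mirrors Algorithm 1 only at a high level. Here `da_ops` stands for the Na DA operations (the seven techniques from the 14-way DA plus SAPN and SN sketched above), and the default Nb of 30 images per operation is purely an assumption for illustration.

```python
import numpy as np

def augment_18_way(r, da_ops, n_b=30):
    """Return 2 * Na * Nb + 2 images generated from one preprocessed image r."""
    r_m = np.fliplr(r)                      # Step 2: horizontal mirror h_M
    out = [r, r_m]                          # Step 4 keeps the original and its mirror
    for base in (r, r_m):                   # Steps 1 and 3: apply every DA operation
        for op in da_ops:                   # each stochastic op yields Nb new images
            out.extend(op(base) for _ in range(n_b))
    return out

# Example usage with the two noise operations sketched earlier in this section:
# images = augment_18_way(r, [salt_and_pepper, speckle], n_b=30)
```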
3.2 Convolutional Block Attention Module
Deep learning (DL) has achieved success in prediction/classification tasks. There are many DL structures, such as the deep neural network, deep belief network, convolutional neural network (CNN), recurrent neural network, graph neural network, etc. Among all these DL structures, the CNN is particularly suitable for analyzing visual images.
To further improve the performance of CNNs, much research has been carried out with respect to the depth, width, or cardinality of CNNs. Recently, Woo et al. proposed a novel convolutional block attention module (CBAM), which improves the traditional convolutional block (CB) by integrating an attention mechanism. There are many successful applications of CBAM. For example, Mo et al. combined self-attention and CBAM, and proposed a light-weight dual-path attention block. Chen et al. proposed a 3D spatiotemporal convolutional neural network with CBAM for micro-expression recognition.
Fig. 6a displays the structure of a traditional CB. The output of the previous block is sent to n repetitions of a convolution layer, batch normalization (BN), and rectified linear unit (ReLU) layer, followed by a pooling layer. The output is called the activation map (AM), symbolized as $T \in \mathbb{R}^{C \times H \times W}$, where $C$, $H$, and $W$ stand for the sizes of the channel, height, and width dimensions, respectively.
In contrast to Fig. 6a, Fig. 6b displays the structure of CBAM, in which two modules, the channel attention module (CAM) and the spatial attention module (SAM), are added to refine the activation map $T$. The CBAM applies a 1D CAM $M_c$ and a 2D SAM $M_s$ in sequence to the input $T$. Thus, we have the channel-refined activation map
$$T' = M_c(T) \otimes T,$$
and the final refined AM
$$T'' = M_s(T') \otimes T',$$
where $\otimes$ stands for element-wise multiplication. This refined AM $T''$ will be sent to the next block.
Note that if the two operands do not have the same dimensions, the values are broadcast (copied) such that the spatial attention values are broadcast along the channel dimension, and the channel attention values are broadcast along the spatial dimensions.
3.3 Channel Attention Module
We define the CAM here. Both max pooling (MP) $z_{\mathrm{mp}}$ and average pooling (AP) $z_{\mathrm{ap}}$ are utilized, generating two features $T_{\max}$ and $T_{\mathrm{avg}}$, as shown in Fig. 7.
Both $T_{\mathrm{avg}}$ and $T_{\max}$ are then forwarded to a shared multi-layer perceptron (MLP) to generate the output features, which are merged using element-wise summation $\oplus$. The merged sum is finally sent to the sigmoid function $\sigma$. Mathematically,
$$M_c(T) = \sigma\left\{ \mathrm{MLP}\left[z_{\mathrm{ap}}(T)\right] \oplus \mathrm{MLP}\left[z_{\mathrm{mp}}(T)\right] \right\}.$$
To reduce the parameter cost, the hidden size of the MLP is set to $C/r$, where $r$ is defined as the reduction ratio. Suppose $W_0 \in \mathbb{R}^{(C/r) \times C}$ and $W_1 \in \mathbb{R}^{C \times (C/r)}$ stand for the MLP weights, respectively; then we can rephrase the above equation as
$$M_c(T) = \sigma\left[ W_1 W_0 T_{\mathrm{avg}} \oplus W_1 W_0 T_{\max} \right].$$
Note that $W_0$ and $W_1$ are shared by both $T_{\mathrm{avg}}$ and $T_{\max}$. Fig. 7 shows the flowchart of the CAM. Note that the squeeze-and-excitation (SE) method is similar to the CAM.
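A minimal PyTorch sketch of the CAM described above is given below. The module name, the default reduction ratio r = 16, and the single hidden ReLU in the shared MLP are assumptions for illustration rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: shared MLP applied to average- and max-pooled features, merged by summation."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared weights W0 and W1
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, t):                               # t: (B, C, H, W)
        avg = self.mlp(t.mean(dim=(2, 3)))              # MLP[z_ap(T)]
        mx = self.mlp(t.amax(dim=(2, 3)))               # MLP[z_mp(T)]
        m_c = torch.sigmoid(avg + mx)                   # element-wise sum, then sigmoid
        return m_c.view(t.size(0), -1, 1, 1)            # (B, C, 1, 1), broadcast over H, W
```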
3.4 Spatial Attention Module
Now we define the spatial attention module (SAM), as shown in Fig. 8. The SAM is a complementary step to the previous channel attention module. The average pooling $z_{\mathrm{ap}}$ and the max pooling $z_{\mathrm{mp}}$ are applied again, this time along the channel axis, to the channel-refined activation map $T'$, and we get
$$T'_{\mathrm{avg}} = z_{\mathrm{ap}}(T'), \qquad T'_{\max} = z_{\mathrm{mp}}(T').$$
Both $T'_{\mathrm{avg}}$ and $T'_{\max}$ are two-dimensional AMs: $T'_{\mathrm{avg}}, T'_{\max} \in \mathbb{R}^{1 \times H \times W}$. They are concatenated together along the channel dimension as $[T'_{\mathrm{avg}}; T'_{\max}]$. The concatenated activation map is then passed into a standard convolution $z_{\mathrm{conv}}$, followed by the sigmoid function $\sigma$. In all, we have
$$M_s(T') = \sigma\left\{ z_{\mathrm{conv}}\left( [T'_{\mathrm{avg}}; T'_{\max}] \right) \right\}.$$
$M_s(T')$ is then element-wisely multiplied by $T'$ to obtain the final refined AM $T''$, as defined in Section 3.2. The flowchart of the SAM is drawn in Fig. 8.
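Continuing the sketch from Section 3.3, the SAM and the full CBAM wrapper could look as follows. The 7 × 7 convolution kernel is the size used in the original CBAM paper and is an assumption here, since the present text only mentions "a standard convolution".

```python
class SpatialAttention(nn.Module):
    """SAM: channel-wise average and max maps, concatenated and convolved."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, t):                                    # t: channel-refined AM T'
        avg = t.mean(dim=1, keepdim=True)                    # z_ap(T'): (B, 1, H, W)
        mx = t.amax(dim=1, keepdim=True)                     # z_mp(T'): (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))   # M_s(T')

class CBAM(nn.Module):
    """Refine an activation map T with channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.cam = ChannelAttention(channels, reduction)
        self.sam = SpatialAttention(kernel_size)

    def forward(self, t):
        t1 = self.cam(t) * t        # T'  = M_c(T) (*) T, broadcast along H and W
        t2 = self.sam(t1) * t1      # T'' = M_s(T') (*) T', broadcast along C
        return t2
```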
3.5 Proposed Attention Network for COVID-19
In this study, we propose a novel attention network for COVID-19 (ANC) based on CBAM (see Fig. 6b). The structure of this proposed ANC is determined by the trial-and-error method. The variable n in each block varies, and we found that the best values lie in the range from 1 to 3. We tested values larger than 3, which increase the computational burden, but the performance does not increase.
The structure of the proposed ANC is shown in Tab. 3; it is a 15-layer deep CNN. The input I is the preprocessed image. The first block, "CBAM-1", contains 3 repetitions of 32 filters. The AM after CBAM-1 is symbolized as T1, and the sizes of the AMs after CBAM-2, CBAM-3, CBAM-4, and CBAM-5 can be observed in Tab. 3. The output of CBAM-5, T5, is flattened to a vector symbolized as FL.
Here, T5 is used to generate the heatmap by the gradient-weighted class activation mapping (Grad-CAM) method, which provides explanations (i.e., heatmaps) showing which regions the ANC model pays more attention to and how our ANC model makes its prediction.
The vector FL is then sent to three consecutive fully-connected layers (FCLs). The numbers of neurons of these three FCLs are set to 200, 50, and 2, respectively. The outputs of the three FCLs are symbolized as F1, F2, and F3, respectively, as shown in Fig. 9. Finally, F3 is sent through a softmax layer to output the probabilities of all classes.
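Putting the pieces together, the sketch below outlines one plausible way to assemble the ANC from five CBAM blocks (using the illustrative CBAM class from Section 3.4) and the three FCLs with 200, 50, and 2 neurons. The channel counts of CBAM-2 to CBAM-5, the 3 × 3 filter size, the repetition counts, the max-pooling choice, and the 256 × 256 input size are assumptions for illustration; the authoritative layer sizes are those in Tab. 3.

```python
def cbam_block(in_ch, out_ch, n):
    """One ANC block: n repetitions of Conv-BN-ReLU, then pooling, then CBAM."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
    layers += [nn.MaxPool2d(2), CBAM(out_ch)]
    return nn.Sequential(*layers)

class ANC(nn.Module):
    """Five CBAM blocks producing T1..T5, then FL and three fully connected layers."""
    def __init__(self, in_size=256, channels=(32, 64, 128, 256, 512), reps=(3, 3, 3, 3, 3)):
        super().__init__()
        blocks, prev = [], 1                               # grayscale input: 1 channel
        for ch, n in zip(channels, reps):
            blocks.append(cbam_block(prev, ch, n))
            prev = ch
        self.features = nn.Sequential(*blocks)             # output of the last block is T5
        side = in_size // 2 ** len(channels)               # spatial size after 5 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),                                                 # FL
            nn.Linear(prev * side * side, 200), nn.ReLU(inplace=True),    # F1
            nn.Linear(200, 50), nn.ReLU(inplace=True),                    # F2
            nn.Linear(50, 2),                                             # F3
        )

    def forward(self, x):                                  # softmax is applied by the loss
        return self.classifier(self.features(x))
```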
3.6 Implementation and Evaluation
F-fold cross-validation is employed on both datasets. Suppose the confusion matrix over the r-th run and f-th fold is defined as
$$C(r, f) = \begin{bmatrix} c_{11}(r,f) & c_{12}(r,f) \\ c_{21}(r,f) & c_{22}(r,f) \end{bmatrix},$$
where $c_{11}$, $c_{12}$, $c_{21}$, and $c_{22}$ stand for TP, FN, FP, and TN, respectively. Here $f$ represents the index of the trial, $1 \le f \le F$, and $r$ stands for the index of the run, $1 \le r \le R$. At the f-th trial, the f-th fold is used as the test set, and all the remaining $F - 1$ folds are used for training.
Note that $C(r, f)$ is calculated based on each test fold, and the matrices are then summarized across all F trials, as shown in Fig. 10. Afterwards, we get the confusion matrix of the r-th run,
$$C(r) = \sum_{f=1}^{F} C(r, f).$$
Seven indicators $Q_1(r), \ldots, Q_7(r)$ can be computed based on the confusion matrix $C(r)$ of the r-th run.
The first four indicators are sensitivity, specificity, precision, and accuracy. We have
$$Q_1(r) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad Q_2(r) = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}, \quad Q_3(r) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \quad Q_4(r) = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP} + \mathrm{TN}}.$$
$Q_5(r)$ is the F1 score,
$$Q_5(r) = \frac{2 \times Q_3(r) \times Q_1(r)}{Q_3(r) + Q_1(r)}.$$
$Q_6(r)$ is the Matthews correlation coefficient (MCC),
$$Q_6(r) = \frac{\mathrm{TP} \times \mathrm{TN} - \mathrm{FP} \times \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}},$$
and $Q_7(r)$ is the Fowlkes–Mallows index (FMI),
$$Q_7(r) = \sqrt{Q_3(r) \times Q_1(r)}.$$
There are two indicators, accuracy $Q_4$ and MCC $Q_6$, that utilize all four basic measures (TP, TN, FP, FN). Considering that the range of $Q_4$ is $[0, 1]$ while the range of $Q_6$ is $[-1, 1]$, we finally choose the MCC $Q_6$ as the most important indicator.
The above procedure is one run of F-fold cross-validation. In the experiment, we repeat the F-fold cross-validation for R runs. The mean and standard deviation (MSD) of all seven indicators are calculated over all R runs:
$$v_1(i) = \frac{1}{R} \sum_{r=1}^{R} Q_i(r), \qquad v_2(i) = \sqrt{\frac{1}{R - 1} \sum_{r=1}^{R} \left[ Q_i(r) - v_1(i) \right]^2},$$
where $v_1$ stands for the mean value and $v_2$ stands for the standard deviation. The MSDs are reported in the format $v_1 \pm v_2$. The pseudocode of our evaluation is displayed in Algorithm 2. In the experiment, the evaluations of both datasets will be reported.
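The seven indicators and the MSD summary can be computed directly from the summed confusion matrices, as in the minimal NumPy sketch below (function names are illustrative).

```python
import numpy as np

def indicators(tp, fn, fp, tn):
    """Sensitivity, specificity, precision, accuracy, F1, MCC, and FMI for one run."""
    sen = tp / (tp + fn)
    spc = tn / (tn + fp)
    prc = tp / (tp + fp)
    acc = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * prc * sen / (prc + sen)
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = np.sqrt(prc * sen)
    return np.array([sen, spc, prc, acc, f1, mcc, fmi])

def mean_std_over_runs(per_run):
    """MSD over R runs: per_run has shape (R, 7); report each indicator as mean ± std."""
    per_run = np.asarray(per_run, dtype=float)
    return per_run.mean(axis=0), per_run.std(axis=0, ddof=1)
```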
4 Experiments, Results, and Discussions
4.1 Parameter Setting
Tab. 4 shows the parameter settings. Here the crop values of top, bottom, left, and right are all set to 200 pixels. The size of the preprocessed image is set to . The minimum and maximum gray values are set to 0 and 255, respectively. The noise density of SAPN is set to . The mean and variance of the uniformly distributed random noise in SN are set to 0 and 0.05, respectively. Both the number of folds F and the number of runs R are set to 10.
4.2 Results of Proposed 18-Way DA
Taking Fig. 3b as an example, Fig. 11 displays the $N_a$ DA results of this original image. Due to the page limit, the horizontally mirrored image and its corresponding $N_a$ results are not displayed in Fig. 11. As calculated in Section 3.1, one image will generate $2 N_a N_b + 2$ images.
4.3 Statistics of Proposed ANC
First, the statistical results on the first and second datasets are shown in Tabs. 5 and 6, respectively. We can observe that for Dataset-1, the sensitivity is , the specificity is , the precision is , the accuracy is , the F1 score is , the MCC is , and the FMI is . For Dataset-2, the performances are slightly lower than those of Dataset-1: the sensitivity is , the specificity is , the precision is , the accuracy is , the F1 score is , the MCC is , and the FMI is . Fig. 12 displays the receiver operating characteristic (ROC) curves on datasets D1 and D2. The areas under the curve (AUC) of the proposed ANC model on the two datasets are 0.9951 and 0.9911, respectively.
4.4 Effect of Attention Mechanism
This ablation study removed the attention blocks (CAM and SAM) from our ANC model; "NA" in Tab. 7 means no attention. Tab. 7 shows that removing the attention blocks reduces the performance significantly. For example, the MCC values of ANC-NA and ANC on the first dataset are % and %, respectively. For the second dataset, the MCC values of ANC-NA and ANC are % and %, respectively.
Fig. 13 shows the error bars of the different models, where D1 and D2 stand for the 1st and 2nd datasets, respectively, as introduced in Section 2. It clearly indicates that the attention mechanism helps improve the classification performance on both datasets.
4.5 Comparison to State-of-the-Art Approaches
We compare our method, ANC, with 9 state-of-the-art methods: ELM-BA, RBFNN, ResNet-50, ResNet-18, GoogleNet-COD, DCNN-SP, WE-BBO, WRE, and COVNet. All the methods were evaluated via 10 runs of 10-fold cross-validation. The MSD results of the ten runs of 10-fold cross-validation are displayed in Tab. 8, where the definitions of the abbreviations can be found in Section 1. The proposed ANC outperforms all the other 9 comparison baseline methods in terms of all indicators.
Fig. 14 shows the 3D bar plot of our ANC method and the 9 state-of-the-art algorithms. All the algorithms are sorted in terms of the MCC indicator. As shown there, the best method is our ANC; the second best and the third best are DCNN-SP and COVNet, respectively. The reasons why our ANC model gives the best results are threefold: (i) the proposed 18-way DA helps resist overfitting; (ii) CBAM helps our ANC model integrate the attention mechanism; and (iii) the structure of ANC is fine-tuned (see Tab. 3).
4.6 Explainable Heatmap
Fig. 15 shows the manual delineation and heatmap results of Fig. 3. Figs. 15a and 15c show the delineations, where the healthy control image has no lesion (Fig. 15c), so the radiologist did not delineate any lesion regions in Fig. 15c. Figs. 15b and 15d show the corresponding heatmaps, from which we can observe that the proposed ANC model finds the COVID-19 related regions correctly.
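For readers who wish to reproduce such heatmaps, the sketch below shows a generic Grad-CAM computation on the last CBAM block's activation map T5, assuming the illustrative ANC class from Section 3.5 (so that `model.features` outputs T5); it is not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class):
    """Grad-CAM heatmap for `target_class`, taken from the last CBAM block's output."""
    acts, grads = [], []
    h_a = model.features.register_forward_hook(lambda m, i, o: acts.append(o))
    h_g = model.features.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(image.unsqueeze(0))                     # image: (1, H, W) tensor
    logits[0, target_class].backward()
    h_a.remove(); h_g.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)      # channel-wise gradient weights
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))   # weighted sum of channels
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / cam.max()).squeeze().detach()            # normalised heatmap in [0, 1]
```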
This paper proposed a novel attention network for COVID-19 (ANC) model. The classification results of the ANC model are better than those of 9 state-of-the-art approaches in terms of seven indicators: sensitivity, specificity, precision, accuracy, F1 score, MCC, and FMI.
There are two limitations of our proposed method: (i) the dataset is relatively small, and we expect to expand it to more than 1k images; (ii) graph neural network technologies have not yet been integrated, and we will attempt to integrate them into our model in future studies.
Funding Statement: This paper is partially supported by Open Fund for Jiangsu Key Laboratory of Advanced Manufacturing Technology (HGAMTL-1703); Guangxi Key Laboratory of Trusted Software (kx201901); Fundamental Research Funds for the Central Universities (CDLS-2020-03); Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education; Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Hope Foundation for Cancer Research, UK (RM60G0680); British Heart Foundation Accelerator Award, UK.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.