Computers, Materials & Continua
Deep Learning and Improved Particle Swarm Optimization Based Multimodal Brain Tumor Classification
1Department of Computer Science, HITEC University, Taxila, 47040, Pakistan
2College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia
3Department of Computer Science and Engineering, Soonchunhyang University, Asan, Korea
4School of Architecture Building and Civil Engineering, Loughborough University, Loughborough, LE11 3TU, UK
5Department of Robotics, SMME NUST, Islamabad, Pakistan
*Corresponding Author: Yunyoung Nam. Email: email@example.com
Received: 08 November 2020; Accepted: 05 February 2021
Abstract: Background: A brain tumor reflects abnormal cell growth. Challenges: Surgery, radiation therapy, and chemotherapy are used to treat brain tumors, but these procedures are painful and costly. Magnetic resonance imaging (MRI) is a non-invasive modality for diagnosing tumors, but scans must be interpreted by an expert radiologist. Methodology: We used deep learning and improved particle swarm optimization (IPSO) to automate brain tumor classification. MRI scan contrast is enhanced by ant colony optimization (ACO); the scans are then used to further train a pretrained deep learning model, via transfer learning (TL), and to extract features from two dense layers. We fused the features of both layers into a single, more informative vector. An IPSO algorithm selected the optimal features, which were classified using a support vector machine. Results: We analyzed high- and low-grade glioma images from the BRATS 2018 dataset; the identification accuracies were 99.9% and 99.3%, respectively. Impact: The accuracy of our method is significantly higher than that of existing techniques; thus, it will help radiologists to make diagnoses, by providing a “second opinion.”
Keywords: Brain tumor; contrast enhancement; deep learning; feature selection; classification
1 Introduction
Brain tumors are the 10th most common type of cancer worldwide [1,2], and glioma is the most prevalent brain tumor. A low-grade glioma (LGG) can be cured if diagnosed early; high-grade gliomas (HGGs) are malignant. Generally, an LGG does not spread. The World Health Organization grades benign tumors as I and II and malignant tumors as III and IV. Symptoms include difficulty speaking, short-term memory loss, frequent headaches, blurred vision, and seizures; these vary by tumor size and location. Magnetic resonance imaging (MRI) is used to visualize brain tumors. However, accurate classification is not possible with a single MRI sequence; multiple MRI sequences (T1, T1 with contrast enhancement, T2, and FLAIR) are required. In the United States alone, approximately 22,850 patients are diagnosed with brain tumors annually; the number in 2019 was 23,890 (13,590 males and 10,300 females), including 18,020 deaths (10,190 males and 7,830 females). MRI is much more efficient than computed tomography; the amount of radiation is lower, while the contrast is higher. Analysis of MRI scans is difficult; an automated approach is required. The typical analytical steps include preprocessing, feature extraction and reduction, and classification. Some researchers have used image segmentation for tumor detection, while others have focused on feature extraction for classification based on tumor intensity and shape [8,9]. Feature extraction is an essential step in disease classification: the tumor is identified from feature properties such as intensity and shape. More recently, deep learning has given more impressive results for medical infection classification and has become invaluable for detecting and classifying tumors. Several pretrained models are available, and the features they extract are classified using supervised learning algorithms such as Softmax, support vector machine (SVM), naïve Bayes, and K-nearest neighbor (KNN).
In medical imaging, deep learning performs strongly in both disease detection and classification. The conditions studied include brain tumors, skin cancers, lung cancers, stomach conditions, retinal injuries, and blood diseases, among others [19–21]. Brain tumor analysis remains challenging; several techniques are available but none of them is 100% accurate [23,24]. Most techniques are based on machine learning, which facilitates early tumor detection. Convolutional neural networks (CNNs), K-means algorithms, decision-level fusion, machine learning-based evaluation, and deep learning approaches have all been used. Tanzila et al. accurately detected tumors using feature fusion and deep learning. A grab-cut method was used for segmentation. The geometry of a transfer learning (TL) model was fine-tuned to identify features, and a serial-based method was used to fuse them. All features were optimized by entropy. The tumor detection accuracy was 98.78% for BRATS 2015, 99.63% for BRATS 2016, and 99.67% for BRATS. Schadeva et al. improved segmentation and brain tumor classification accuracy using an active contour model that focused on the area of interest; features were extracted, reduced by principal component analysis, and classified using an automated neural network. The classification accuracy was 91%. Mohsen et al. used deep learning for brain tumor classification. MRI scans were segmented using the fuzzy c-means approach, and discrete wavelet transformation was applied to extract features. A deep neural network performed the classification with an accuracy of 96.97%. The linear discriminant analysis (LDA) accuracy was 95.45% and that of sequential minimal optimization (SMO) was 93.94%. The deep learning network resembled a CNN, but required less hardware and was much faster.
Problem Statement: The major challenges in brain tumor classification are as follows: (i) manual evaluation is difficult and time-consuming; (ii) tumor resolution is low and irrelevant features may be highlighted; (iii) redundant features cause classification errors; and (iv) tumors of grades I–IV look relatively similar. To resolve these issues, we present an automated classification method using deep learning and an improved particle swarm optimization (IPSO) algorithm.
Contributions: The major contributions of this study are as follows: (i) MRI scan contrast is improved using an evolutionary approach, i.e., ant colony optimization (ACO); (ii) a pretrained VGG-19 model is fine-tuned via TL; (iii) features are extracted from two different dense layers and fused into one matrix; and (iv) the IPSO is combined with a bisection method for optimal feature selection.
The remainder of this manuscript is organized as follows. The ACO, improvement of the original image contrast, TL-based fine-tuning, serial feature fusion, and IPSO are discussed in Section 2, the HGG and LGG results are presented in Section 3, and the conclusions are provided in Section 4.
2 Proposed Methodology
We used deep learning for multimodal classification of brain tumors. The contrast of the original images was improved by ACO, and the images were used to train a CNN. TL of brain images was used to enhance a pretrained model. Features computed by different layers were aggregated, and the IPSO was used to select optimal features that were then classified using a one-against-all multiclass SVM (MSVM) classifier. The overall architecture is shown in Fig. 1.
2.1 Contrast Enhancement
Contrast enhancement is very important because unenhanced images exhibit low contrast, noise, and very poor illumination. Several enhancement techniques are available; we used an ACO-based approach.
Initial Ant Distribution—The number of ants is calculated as:
where l is the length of the image, w is the width, and AN is the number of ants randomly placed in the image (one ant).
Decision-based on Probability—The probability that ant n moves from pixel (e, f) to pixel (g, h) is given by:
Here, all pixel locations are written . is the pheromone level. the visibility, and is calculated as follows:
The probability equation shows that -plus reflects the stepwise directional fluctuation:
where is the weight function. Together with the function above, the weight function ensures that sharp turns by ants are less frequent than gentle ones, which we refer to as “probabilistic forward bias.”
Rule of Transition—Mathematically, the rule of transition is expressed as:
where ij is the pixel location, from which ants can travel to pixel . If , an ant can visit the next pixel [see Eq. (2)].
Updating Pheromone Levels—An ant can move from pixel ij to pixel , as stated above, and the pheromone trajectory is given by:
A new trajectory is obtained after each iteration, as follows:
where is the proportion of pheromone that evaporates and is the initial pheromone concentration . Applying the above steps to all image pixels yields an enhanced image (Fig. 2).
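The probabilistic move and the pheromone decay described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: it follows the standard ant-system rule, in which each candidate pixel is weighted by its pheromone level raised to a power alpha times its visibility raised to a power beta, and it decays every pixel toward its initial pheromone level. The function names, the dictionaries keyed by (row, column), and the default values of alpha, beta, and rate are all hypothetical.

```python
import random


def transition_probabilities(pheromone, visibility, neighbors, alpha=1.0, beta=1.0):
    """Standard ant-system rule (assumed here): weight = pheromone^alpha * visibility^beta."""
    weights = [(pheromone[p] ** alpha) * (visibility[p] ** beta) for p in neighbors]
    total = sum(weights)
    if total == 0.0:
        # No pheromone or visibility information yet: fall back to a uniform choice.
        return [1.0 / len(neighbors)] * len(neighbors)
    return [w / total for w in weights]


def choose_next_pixel(pheromone, visibility, neighbors, rng):
    """Draw the next pixel for one ant according to the transition probabilities."""
    probs = transition_probabilities(pheromone, visibility, neighbors)
    return rng.choices(neighbors, weights=probs, k=1)[0]


def evaporate(pheromone, initial, rate=0.05):
    """Decay each pixel's pheromone toward its initial level by the given proportion."""
    return {p: (1.0 - rate) * tau + rate * initial[p] for p, tau in pheromone.items()}
```

With two neighboring pixels of equal visibility and pheromone levels of 1 and 3, this rule gives move probabilities of 0.25 and 0.75, so the ant favors the pixel with the stronger trail without ignoring the other one.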
2.2 Convolutional Neural Network
A CNN is a type of deep neural network that can be used for image recognition, image classification, and object detection. A CNN requires minimal preprocessing. During training and testing, images pass through convolutional (kernel) layers, are pooled, and are then passed to fully connected layers; this is followed by Softmax classification. Probability values range from 0 to 1. Several pretrained CNN models are available, including VGGNet and AlexNet. VGGNet has valuable medical applications. We used a pretrained VGG-19 model, which includes 16 convolutional layers (local features), 3 fully connected layers, and max-pooling and ReLU layers (Fig. 3).
VGG-19 contains N fully connected layers, where N = 1–3. The PN units of the Nth layers are NRW = 224, Nc = 224 and Nch = 3. The dataset is represented by , and the training sample by . Each is a real number :
where is the first weight matrix, is the Relu activation function, RW the number of rows, c the number of columns, and ch the number of channels. is the bias vector and is the weight of the first layer, defined as:
The output of the first layer becomes the input of the second layer; this step is repeated as follows:
Here, by way of example, and are the second and third weight matrices, respectively. and . represents the last fully connected layer used for high-level feature extraction. Mathematically:
where is the cross-entropy function, B is the total number of classes cl, and ob and p the predicted probabilities.
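As an illustration of how an input passes through a fully connected layer, a ReLU activation, and a Softmax output scored with cross-entropy, the following plain-Python sketch may help. It is not the VGG-19 implementation; the function names and the list-of-rows weight layout are assumptions made for this example.

```python
import math


def relu(values):
    """ReLU activation: negative values are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    """One fully connected layer; weights holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(observed, predicted):
    """Cross-entropy between a one-hot observed label and predicted probabilities."""
    return -sum(o * math.log(p) for o, p in zip(observed, predicted) if o > 0)
```

Chaining the calls as `softmax(dense(relu(dense(x, W1, b1)), W2, b2))` mirrors the pattern in which the output of one fully connected layer becomes the input of the next.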
2.4 Transfer Learning
TL occurs when a system acquires knowledge and skills by solving a specific problem, and then uses that knowledge to solve another problem. We used TL to further train a pretrained model and improve its performance. The input was , and the original learning task can be described as: . The target was and the new learning task can be written as where n ≪ m and are the training data labels (Fig. 4).
Feature Extraction and Fusion: After TL, activation is required for feature extraction. We extracted features from FC layers 6 and 7. The feature vector of FC layer 6 had dimensions of , and that of FC layer 7 4,096. Mathematically, the vectors are expressed as and ; both and . We then fused the vectors into a single matrix to derive optimal tumor data. This can be done using serial, parallel, and correlational techniques. We used the lengths of extracted features and no features were discarded. Mathematically, the fused matrix can be expressed as:
where is a fused matrix with dimensions of N is the number of images used for training and testing. k1 and k2 both have a value of 4,096. The fused vector includes a few irrelevant/redundant features, which were removed by IPSO.
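Serial fusion of the two layers amounts to placing the FC layer 7 features of each image after its FC layer 6 features, so that no feature is discarded. The sketch below assumes that each layer's output is available as a list of per-image feature lists; the function name `fuse_serial` is hypothetical.

```python
def fuse_serial(fc6_features, fc7_features):
    """Concatenate, image by image, the features of two layers into one longer vector."""
    if len(fc6_features) != len(fc7_features):
        raise ValueError("both layers must describe the same N images")
    return [list(a) + list(b) for a, b in zip(fc6_features, fc7_features)]
```

For N images with 4,096 features per layer, each fused row holds 4,096 + 4,096 = 8,192 values.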
2.5 Feature Selection and Classification
It is important to select appropriate features for classification, because irrelevant features reduce classification accuracy and increase the computational time . However, it is not easy to identify the most important features because of their complex interactions. A good feature vector is required; in this study we used the IPSO algorithm. The original PSO  was a global search algorithm using evolutionary computation. PSO, as a population-based algorithm inspired by flocks of birds and schools of fish, is more effective than a general algorithm  in terms of convergence speed. Particles are initially placed randomly, and their velocities and positions are iteratively updated. The current and updated particle locations are referred to as and , respectively. The IPSO reduces the number of iterations required by including a “stop” condition based on a bisection method (BsM). The selected values are approximated and the algorithm is then terminated; the accuracy of each iteration is approximately the same as the previous one. Assuming that the position of the nth particle is and the velocity is , the local best particle is and the global best particle is . The updated position of the ith particle is calculated as:
where , , S is the number of iterations, N is the size of the swarm, R1 and R2 are random numbers [0, 1], are acceleration coefficients, and x is the inertial weight. A linear value of x that varies with time is calculated as:
Here, T is the maximum iteration time, xmax is the upper limit, and xmin is the lower limit. During feature selection, every solution is a subset of features. Each set of particles is denoted as a binary vector, and every particle has a specific position. The Mth feature is defined by the Mth position. Features are selected by the IPSO, which begins with a random solution and then moves toward the best global solution (represented by a new subset of features). Each feature is linked to a dataset that occupies a search space. If the Mth position is 1, the Mth feature is considered informative, while if the Mth position is 0, the Mth feature is not informative. If the Mth position is −1, the Mth feature is not added to the set.
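One iteration of particle swarm feature selection can be sketched as follows. The sketch assumes the conventional velocity update with an inertia weight that decreases linearly from an upper to a lower limit, together with the widely used binary variant in which the sigmoid of the velocity gives the probability of keeping a feature. The function names, the defaults (c1 = c2 = 2, limits of 0.9 and 0.4), and the velocity clamp of ±6 are common choices assumed for illustration, not values reported in this study.

```python
import math
import random


def inertia_weight(step, max_steps, x_max=0.9, x_min=0.4):
    """Linearly decrease the inertia weight from x_max to x_min over max_steps."""
    return x_max - (x_max - x_min) * step / max_steps


def update_particle(position, velocity, local_best, global_best, weight, rng, c1=2.0, c2=2.0):
    """Update one binary particle; position[m] == 1 means feature m is selected."""
    new_position, new_velocity = [], []
    for z, v, p, g in zip(position, velocity, local_best, global_best):
        r1, r2 = rng.random(), rng.random()
        v_next = weight * v + c1 * r1 * (p - z) + c2 * r2 * (g - z)
        v_next = max(-6.0, min(6.0, v_next))  # clamp so the sigmoid cannot saturate
        keep_probability = 1.0 / (1.0 + math.exp(-v_next))
        new_velocity.append(v_next)
        new_position.append(1 if rng.random() < keep_probability else 0)
    return new_position, new_velocity
```

A large inertia weight early on favors exploration of new feature subsets, while a smaller weight later lets the swarm settle around the best subset found so far.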
Fitness Function: Each solution yielded by the selection algorithm was tested in terms of fitness within every generation. If accuracy improved, the current solution was the best one. The solution with maximum fitness is the best one overall. We used the fine KNN classifier and BsM. The starting accuracy was 90.0 (), and the final accuracy is expressed as t. The midpoint of and t was computed and the root was found. If the root was equal to zero, the algorithm terminated; otherwise, the next iteration started and the root between t and t + 1 was found. If the interval was not zero, the midpoint of t and t + 1 was determined, and the following criteria were checked:
Thus, the values were updated until the results of two successive iterations became very similar. We initially set 100 iterations, but the algorithm stopped after 10 to 20 iterations, yielding a vector containing approximately 40% of all features; these were finally classified using the one-against-all SVM.
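The effect of the stopping rule, ending the search once two successive accuracies are nearly equal instead of running all 100 iterations, can be illustrated with a simplified sketch. It does not reproduce the bisection computation itself; the function name `select_until_stable` and the tolerance value are assumptions made for this example.

```python
def select_until_stable(accuracies, max_iterations=100, tolerance=0.05):
    """Return (iteration, accuracy) at the first point where successive accuracies agree.

    accuracies is any iterable that yields the fitness of each iteration in turn.
    """
    previous = None
    step = 0
    accuracy = None
    for step, accuracy in enumerate(accuracies, start=1):
        if previous is not None and abs(accuracy - previous) <= tolerance:
            break  # two successive iterations are very similar: stop early
        if step == max_iterations:
            break  # iteration budget exhausted
        previous = accuracy
    if step == 0:
        raise ValueError("no accuracies were supplied")
    return step, accuracy
```

For the accuracy sequence 90.0, 95.0, 98.0, 98.02, 99.0, the loop stops at the fourth iteration. The later value of 99.0 is never evaluated, so a better feature subset found after the stop is missed.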
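Consider an N-class problem with B training samples, $\{(s_1, t_1), \ldots, (s_B, t_B)\}$, where $s_j \in \mathbb{R}^n$ is an n-dimensional feature vector and $t_j \in \{1, \ldots, N\}$ is its class label. The method builds N binary SVM classifiers, each of which separates one class from all the others. Training of the i-th SVM uses all samples of class i with positive labels and the remaining samples with negative labels:
$$y_j = \begin{cases} 1, & \text{if } t_j = i \\ -1, & \text{otherwise} \end{cases}$$
Sample s is classified into the class $i^*$, the decision value $d^*$ of which is the highest during classification:
$$i^* = \arg\max_{i = 1, \ldots, N} d_i(s), \qquad d^* = d_{i^*}(s)$$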
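The one-against-all scheme can be sketched in Python as follows. The sketch assumes that each binary classifier has already been trained and exposes a linear decision function given by a weight vector and a bias; the function names and the dictionary layout are hypothetical.

```python
def one_vs_all_labels(targets, positive_class):
    """Relabel the training targets for one binary classifier: +1 for its class, -1 otherwise."""
    return [1 if t == positive_class else -1 for t in targets]


def linear_decision(weights, bias, sample):
    """Decision value of one binary classifier (a linear form is assumed here)."""
    return sum(w * x for w, x in zip(weights, sample)) + bias


def classify(sample, classifiers):
    """classifiers maps class id -> (weights, bias); return the class with the largest value."""
    scores = {cls: linear_decision(w, b, sample) for cls, (w, b) in classifiers.items()}
    return max(scores, key=scores.get)
```

Because every classifier scores the same sample, the predicted class is the one whose classifier is most confident, even when several of them return positive values.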
3 Experimental Results and Comparison
We analyzed the BRATS 2018 dataset, which contains HGG and LGG data. In total, 70% of the data were used for training and 30% for testing (Fig. 5). We evaluated multiple classifiers in terms of accuracy, sensitivity, precision, the F1-score, the area under the curve (AUC), the false-positive rate (FPR), and computational time. All simulations were run in MATLAB 2019a (MathWorks, Natick, MA, USA) using a Core i7 processor, 16 GB of RAM, and an 8 GB graphics card.
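For reference, accuracy, sensitivity, precision, the F1-score, and the FPR can all be derived from the four counts of a binary confusion matrix using their standard definitions. The sketch below assumes that the counts of true positives, false positives, true negatives, and false negatives are already known and that no denominator is zero; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall or true-positive rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),  # false-positive rate
    }
```

For example, with 90 true positives, 10 false positives, 90 true negatives, and 10 false negatives, the accuracy, sensitivity, precision, and F1-score are all 0.9 and the FPR is 0.1.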
3.1 Testing Results of HGG Images Data
We first classified the HGG images, using 30% of all images for testing. The results obtained via fusion of the original feature vectors are shown in Tab. 1. The highest accuracy was 99.9%, for the MSVM, with a sensitivity of 99.25%, precision of 99.50%, F1-score of 99.3%, FPR of 0.00, and AUC of 1.00. The other accuracies were as follows: fine tree, 89.20%; linear SVM, 98.70%; coarse Gaussian, 95.80%; fine KNN, 99.70%; medium KNN, 97.70%; cubic KNN, 97.0%; weighted KNN, 99.20%; ensemble-boosted tree, 96.40%; and ensemble-bagged tree, 98.0%. Thus, the MSVM performed best. The confusion matrix is shown in Fig. 6; the accuracy rate always exceeded 99%. The computational times are listed in Tab. 1. The medium KNN had the shortest computational time, at 28.52 s, but the accuracy was only 97.75%. The receiver operating characteristic (ROC) curves are shown in Fig. 7.
The results for the optimized HGG features are listed in Tab. 2. The highest accuracy was 99.9%, for the MSVM. The other accuracies were as follows: fine tree, 85.20%; linear SVM, 98.75%; coarse Gaussian, 95.50%; fine KNN, 99.60%; medium KNN, 97.30%; cubic KNN, 97.50%; weighted KNN, 99.20%; ensemble-boosted tree, 93.30%; and ensemble-bagged tree, 97.60%. Thus, the MSVM showed the best performance; the confusion matrix is shown in Fig. 8. The computational times are listed in Tab. 2. The coarse Gaussian SVM had the shortest computational time (6.17 s), but the accuracy was only 95.70%, i.e., lower than that of the MSVM. The ROC curves are shown in Fig. 9.
3.2 Testing Results of LGG Images Data
The original feature vectors for the LGG images were fused (Tab. 3). The highest accuracy (99.1%) was achieved by the MSVM, with a sensitivity of 99.00%, precision of 99.00%, F1-score of 99.00%, FPR of 0.002, and AUC of 1.00. The other accuracies were as follows: fine tree, 78.30%; SVM, 93.40%; coarse Gaussian, 82.60%; fine KNN, 98.00%; medium KNN, 91.90%; cubic KNN, 91.90%; weighted KNN, 96.50%; ensemble-boosted tree, 87.10%; and ensemble-bagged tree, 94.10%. The confusion matrix is shown in Fig. 10; the accuracy rate always exceeded 99%. The computational times are listed in Tab. 3 (last column). The fine KNN had the shortest computational time (27.56 s), but the accuracy was only 98.00%, i.e., lower than that of the MSVM. The longest computational time was 356.66 s. The MSVM ROC curves are provided in Fig. 11.
The results for the optimized LGG features are listed in Tab. 4. The MSVM showed the best classification performance, with an accuracy of 99.3%, sensitivity of 99.25%, precision of 99.25%, F1-score of 99.25%, FPR of 0.000, and AUC of 1.00. Its computational time was 11.92 s; however, the shortest computational time overall was 6.25 s. The other accuracies were as follows: fine tree, 78.00%; linear SVM, 93.30%; coarse Gaussian, 85.40%; fine KNN, 98.20%; medium KNN, 93.30%; cubic KNN, 93.20%; weighted KNN, 97.30%; ensemble-boosted tree, 83.90%; and ensemble-bagged tree, 93.90%. The confusion matrix is illustrated in Fig. 12; the accuracy rate always exceeded 99%. The MSVM ROC curves are shown in Fig. 13. The use of optimal selected features improved classification accuracy and significantly reduced computational times.
3.3 Comparison with Existing Techniques
We also compared the proposed method with existing techniques to validate it (Tab. 5). The best accuracy previously achieved on the BRATS 2018 dataset was 98%; the authors of that study used a long short-term memory (LSTM) approach. Amin et al. achieved the second-best accuracy, 93.85%. In more recent work, Khan et al. achieved an accuracy of 92.5% using a deep learning framework. Our proposed method is also deep learning-based. We tested it on both HGG and LGG brain images and achieved accuracies of 99.9% and 99.3%, respectively. The main strength of this work is the selection of optimal features using an improved PSO algorithm. Labeled results produced by the proposed method are shown in Fig. 14.
4 Conclusion
We propose a new automated technique for brain tumor classification using deep learning and the IPSO algorithm. The contrast of the original MRI scans is enhanced using the ACO approach so that a better CNN model can be learned. This step not only enhances the tumor region but also allows more relevant features to be extracted. The fusion of features from two layers then improves on the original classification accuracy. However, the fusion process also adds a few redundant features, which prevents the target accuracy from being reached. Therefore, the IPSO algorithm is proposed to improve the system’s accuracy and minimize computational time. We conclude that the optimal features give better classification accuracy and decrease the system prediction time. The major limitation of this work is the proposed stopping criterion: features found after the stopping condition is met might perform better. In future, we aim to improve this stopping criterion and to perform experiments on the BRATS 2019 dataset as well.
Funding Statement: This research was supported by Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.