Neurological disorders such as Alzheimer’s disease (AD) are very challenging to treat due to the sensitivity of the brain, the technical challenges of surgery, and high costs. The complexity of brain structures makes it difficult to distinguish between the various brain tissues and to categorize AD using conventional classification methods. Furthermore, conventional approaches are time-consuming and not always precise. Hence, a suitable classification framework built on brain imaging may produce more accurate findings for the early diagnosis of AD. Therefore, in this paper, an effective hybrid Xception and Fractalnet-based deep learning framework is implemented to classify the stages of AD into five classes. Initially, a network based on Unet++ is built to segment the tissues of the brain. Then, using the segmented tissue components as input, the Xception-based deep learning technique is employed to extract high-level features. Finally, the optimized Fractalnet framework is used to categorize the disease condition using the acquired features. The proposed strategy is tested on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, on which it accurately segments brain tissues with a dice similarity coefficient (DSC) of 98.45%. Additionally, for the multiclass classification of AD, the suggested technique obtains an accuracy of 99.06%. Moreover, an ANOVA statistical analysis is used to evaluate whether the differences between groups are significant. The findings show that the suggested model outperforms various state-of-the-art methods in terms of several performance metrics.
Alzheimer’s disease (AD) is a neurological disorder that causes the brain to deteriorate: brain cells die, leading to cognitive impairment and memory loss. It is one of the most frequent types of dementia and has a significant detrimental influence on the social and personal lives of people [
Chemicals, head injuries, and genetic and environmental factors are some of the most important causes of AD. Behavior and mood instability, communication and recognition problems, learning issues, and memory loss are common signs of AD [
The hippocampus and cerebral cortex are reduced in size in the brains of AD patients, whereas the ventricles are enlarged. When the hippocampus shrinks, episodic and spatial memory are damaged, and the connectivity between the body and brain also decreases [
Although AD diagnosis is a multiclass classification issue, the majority of existing research is focused on binary classification, in which an individual either has AD or does not. However, the more significant part of diagnosis is identifying the stage of the disease [
Various studies have employed a variety of machine learning algorithms to categorize Alzheimer’s disease using neuroimaging data. However, conventional machine learning algorithms necessitate manual feature extraction before categorization. Approaches based on user-defined features have several drawbacks, one of which is the failure to select features unique to the problem [
To counter the drawbacks of traditional methods in classification and feature extraction, this paper proposes an effective hybrid Xception-Fractalnet-based deep learning technique for the Alzheimer’s disease classification system. It classifies AD into five classes, namely Cognitively Normal (CN), Late Mild Cognitive Impairment (LMCI), Early Mild Cognitive Impairment (EMCI), Mild Cognitive Impairment (MCI), and Alzheimer’s Disease (AD). Moreover, an effective Unet++-based segmentation network is applied to improve the classification system’s performance and identify AD at an early stage. Compared to other deep learning techniques, the fractal architecture contains different levels of node interaction. Complex tasks are handled efficiently by this network due to its constant connections among the various layers. Further, Unet++ contains dense and nested skip connections, which provide an accurate semantic segmentation map. Therefore, the proposed hybrid approach achieves superior results to existing techniques in AD segmentation and classification.
The important contributions of this paper are listed in the following aspects:
A new deep learning-based framework is suggested to solve the direct five-class classification task in AD, which is much more difficult than the traditional single or multiple binary classifications. Depthwise separable convolution in the Xception architecture is used to enhance local features and retrieve discriminative features of critically altered microstructures in the brain. The suggested model considerably improves the accuracy of Alzheimer’s disease MRI diagnosis by incorporating hyperparameter optimization through a metaheuristic approach. An extensive analysis of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset revealed a significant performance improvement compared to benchmark techniques.
The remaining part of this paper is organized as follows: In Section 2, the related work on AD classification is discussed. The motivation for the work is presented in Section 3. In Section 4, the procedure and the key technology of the proposed approach are described. The evaluation metrics and experimental results of the approach are shown in Section 5. Limitations and future scope are discussed in Section 6, and the conclusion is drawn in the last section.
In recent years, many new frameworks have been developed that support tissue segmentation and AD categorization. Some of them are discussed in this section.
This part contains papers that (1) address a deep learning or machine learning approach, (2) analyze works on the categorization of AD and brain tissue segmentation, and (3) analyze performance assessment metrics. To find peer-reviewed academic publications, we combined several search terms, such as “multi-class classification of AD,” “classification of AD with deep learning,” “AD tissue segmentation,” and “Alzheimer’s disease classification.” We have concentrated on recent technological advancements made in 2021 and 2022. Databases such as SpringerLink, ScienceDirect, IEEE Xplore, and Scopus have been our primary focus. These online databases were selected as they provide complete conference papers and peer-reviewed articles in the area of AD classification with deep learning and machine learning techniques. Relevant papers were discovered by analyzing the title and abstract. For both forward and backward searching, Google Scholar was employed.
Li et al. [
For brain tissue segmentation, the U-net technique was proposed by Basnet et al. [
Another approach based on U-net was presented by Long et al. [
Bhuvaneswari et al. [
Xu et al. [
Basheera et al. [
For AD classification, AbdulAzeem et al. [
Turkson et al. [
Amini et al. [
To recognize and classify the stages of AD, Al-Adhaileh [
The Transfer learning-based deep CNN frameworks were discussed by Srinivas et al. [
Yang et al. [
Alam et al. [
Devnath et al. [
AD categorization using MRI image data is difficult, as MRI data contains high variance in intensity, texture, and contour, as well as discontinuity between tumor cells and the normal area in the brain image. To diagnose the tumor effectively, automatic segmentation and classification are needed, as manual tumor region detection takes a lot of time. Moreover, learning low-dimensional representations from the extracted features while keeping the structural details of AD data and minimizing the impact of noise is difficult. Although many solutions have been put forward in the past to address the segmentation and classification issues, a fast and effective solution is still required to enhance segmentation and classification performance. This has led us to suggest a deep learning-based hybrid strategy using MRI imaging modalities to get the best results in brain tissue segmentation and tumor classification.
The working procedure of the proposed approach is detailed in this section. The algorithm of the proposed framework is based on four basic stages: the first stage is data pre-processing and augmentation, the second stage is input image segmentation, and the third and fourth stages are feature extraction and dementia classification. Initially, the input data are acquired from the ADNI dataset, followed by the implementation of pre-processing methods to eliminate noise and artifacts from the data. The pre-processed image is then fed into the Unet++-based architecture for the segmentation of white matter, grey matter, hippocampus, and cerebrospinal fluid. Afterward, feature extraction and classification of AD are performed by the Xception and Fractalnet-based deep learning techniques. This technique classifies AD into five classes: Cognitively Normal (CN), Late Mild Cognitive Impairment (LMCI), Early Mild Cognitive Impairment (EMCI), Mild Cognitive Impairment (MCI), and Alzheimer’s Disease (AD). The system architecture of this proposed work is shown in
In image processing, pre-processing is a significant phase for smoothing, noise removal, and enhancement of images. In this paper, the skull stripping technique was implemented in the pre-processing stage to remove the skull from the brain image, as it was not part of the region of interest; skull stripping thus helped in obtaining better results. Afterward, the quantum matched-filter technique (QMFT) was applied to remove low-level noise from the image. In this procedure, the essential and specific image features were separated by an active contour. As a result, unwanted information such as noise was removed, and local thresholds could simultaneously identify both small- and large-scale features by reading all columns and rows diagonally and linearly; QMFT is thus used for noise reduction in MRI images. In addition, various data augmentations such as shearing, flipping, rotation (45°), and brightness improvement were performed, which increased the number of images in the database.
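As an illustration, the flipping and brightness augmentations above can be sketched with plain NumPy (the 45° rotation and shearing would typically use an image library such as scipy.ndimage and are omitted here); the function name and the assumed [0, 1] intensity range are illustrative, not from the paper:

```python
import numpy as np

def augment(image, brightness_delta=0.1):
    """Generate simple augmented variants of a 2-D MRI slice.

    Flipping and brightness adjustment are shown with NumPy alone;
    shearing and 45-degree rotation would be added analogously.
    """
    return {
        "flip_lr": np.fliplr(image),                         # horizontal flip
        "flip_ud": np.flipud(image),                         # vertical flip
        # brightness improvement, clipped to the valid intensity range
        "bright": np.clip(image + brightness_delta, 0.0, 1.0),
    }

slice_ = np.random.default_rng(0).random((8, 8))  # stand-in MRI slice
variants = augment(slice_)
# every augmented image keeps the original spatial shape
assert all(v.shape == slice_.shape for v in variants.values())
```

Each variant is stored alongside the original, which is how the augmentation step multiplies the size of the training database.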
For AD classification, the segmentation of the cerebrospinal fluid, hippocampus, white matter, and grey matter regions of the brain is the most significant part. For this purpose, the Unet++ network was adapted and trained to segregate the above-mentioned brain regions from pre-processed images. To improve the segmentation process, dense blocks and convolutional layers were provided between the decoder and encoder. Compared to the U-net model, the Unet++ architecture has some additions, such as deep supervision, dense skip connections, and redesigned skip pathways. This architecture contains convolution units, skip connections among convolution units, and up-sampling and down-sampling modules. Each node in this network receives skip connections from all convolution units at the same level.
Initially, the pre-processed image was given as the input to the Unet++. The input image was then convolved by a convolution layer in the encoder path to obtain the feature map. This feature map passed through the skip routes and was sent to the corresponding convolution in the decoder path. Three convolution layers form a dense convolution block in the skip pathway between the corresponding encoder and decoder nodes. Here, each convolution layer is preceded by a concatenation layer that combines the up-sampled output of the lower dense block with the outputs of the previous convolution layers of the same dense block. The loss of semantic information between the two pathways is minimized by this structure. Down-sampling was done in the encoder path using a maximum pooling operation with a kernel size of 2 and a stride of 2; with this window and stride arrangement, the feature map is halved in size. The features were effectively extracted from the image using the down-sampling operation in the encoder path, and up-sampling was employed in the decoder pipeline to double the size of the feature map. Lastly, the segmentation mask was generated from the final feature maps.
Let us assume that the output of node (i, j) is denoted by Yi,j, where i and j index the downsampling and convolution layers, respectively. The downsampling layers appear in the encoder path, and the convolution layers appear in the skip pathway. The generation of feature maps by Yi,j is described as
Here, the activation function associated with the convolution operation is represented by A(.), the up-sampling layer is denoted by u(⋅), and [⋅] represents the concatenation operation. Nodes at level j = 0 receive only one input, from the preceding layer of the encoder, whereas nodes at level j > 0 receive j + 1 inputs from the up-sampling layer and the skip connections. It is important to note that the activation function is the scaled exponential linear unit (SeLU) rather than ReLU, which allows for stronger regularization and more robust learning.
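A minimal NumPy sketch of this input-gathering rule and of the SeLU activation follows; the helper names are illustrative assumptions, not identifiers from the paper:

```python
import numpy as np

# SeLU constants from Klambauer et al. (2017)
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit, used instead of ReLU."""
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def node_inputs(j, same_level_outputs, upsampled):
    """Inputs of a Unet++ node at skip-pathway position j.

    Nodes at j = 0 take a single encoder input; nodes at j > 0
    concatenate the j outputs of the preceding nodes on the same
    level with the up-sampled output of the lower dense block,
    giving j + 1 inputs in total.
    """
    if j == 0:
        return same_level_outputs[:1]
    return same_level_outputs[:j] + [upsampled]

fmap = np.ones((4, 4))
assert len(node_inputs(0, [fmap], fmap)) == 1          # single encoder input
assert len(node_inputs(3, [fmap] * 3, fmap)) == 4      # j + 1 inputs
```

In the real network the gathered maps are concatenated along the channel axis and passed through the convolution before SeLU is applied.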
In this model, the deep supervision method was used to force the outputs of the decoder blocks to produce a valid segmentation map. Furthermore, the loss function of the Unet++ training process was the categorical cross-entropy loss:
Here, zi is the proportion of samples belonging to class i, and k is the number of classes.
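For concreteness, the categorical cross-entropy over the five classes can be computed as follows; this is a minimal NumPy sketch, and the example probabilities are invented for the demonstration:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i z_i * log(p_i), averaged over the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

# one-hot target over the five AD classes
y_true = np.array([[0, 0, 1, 0, 0]])
confident = np.array([[0.01, 0.02, 0.94, 0.02, 0.01]])
uncertain = np.array([[0.2, 0.2, 0.2, 0.2, 0.2]])
# a confident correct prediction incurs a lower loss
assert categorical_cross_entropy(y_true, confident) < categorical_cross_entropy(y_true, uncertain)
```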
The segmented image was given as input to the Xception framework for feature extraction. The Xception architecture is a modified version of the Inception architecture that uses depthwise separable convolution modules instead of Inception modules. The feature extractor is formed by 36 convolutional layers, which are organized into 14 modules in the Xception architecture. Each of the modules is surrounded by linear residual connections (excluding the first and last modules).
To extract features, the segmented images are first fed into convolutional kernels of size (3, 3, 64) and (3, 3, 128). The calculations of the convolution layers are as follows:
Here, bl denotes the offset parameter, the output of the lth convolutional channel of the jth convolutional layer is denoted by
Second, depthwise separable convolution is used to extract more features from the feature maps. Depthwise separable convolution is utilized to decrease the computational complexity and the number of parameters. The fully connected layer then receives the produced two-dimensional feature map as input.
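The parameter saving can be verified with a quick count: a standard k × k convolution needs k·k·c_in·c_out weights, whereas a depthwise separable convolution needs only k·k·c_in (depthwise) plus c_in·c_out (pointwise). Using the 3 × 3 kernels with 64 and 128 channels mentioned above (biases ignored):

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel plus 1 x 1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 128)        # 73728 weights
separable = separable_params(3, 64, 128)  # 576 + 8192 = 8768 weights
assert separable < standard
```

The roughly 8× reduction in this example is what makes the 36-layer Xception extractor comparatively cheap.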
Here, the threshold offset term is denoted by bl, and the fully connected lth layer’s weight coefficient is denoted by wl. The gradient descent approach is then used to change the training error reduction’s direction.
Here, E represents the squared difference between the actual output and the desired output, and δl denotes the sensitivity of the squared error to a change in ul. Finally, the feature map of the connected layer yielded a 2048-dimensional feature vector. For classification, this feature vector was fed into Fractalnet.
In this work, the classification of AD is conducted with the Fractalnet architecture. The feature vector obtained from the Xception network is given as the input to the Fractalnet for classification. In this network, five fractal blocks, each followed by a pooling layer, are arranged sequentially.
FBC(N) represents a fractal block, where C is the number of columns in the block. In this work, a Fractalnet with two columns is used. Therefore, the number of convolution layers in each fractal block is 2^C − 1 = 2^2 − 1 = 3, and B × (2^C − 1) is the overall depth of the convolution layers, where B denotes the number of fractal blocks. As the fractal architecture used 5 fractal blocks, the total number of convolution layers in the framework was 3 × 5 = 15. In addition, this shallow network produced results substantially faster.
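The layer counts above follow directly from the column rule and can be checked mechanically:

```python
def layers_per_block(c):
    """Convolution layers in one fractal block with c columns: 2^c - 1."""
    return 2 ** c - 1

C, B = 2, 5  # two columns per block, five fractal blocks
assert layers_per_block(C) == 3
assert B * layers_per_block(C) == 15  # total convolution layers in the network
```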
The base function of this architecture contains a convolution layer, batch normalization, and ReLU activation function. The following equation shows the convolution layer’s mathematical function.
Here, the outcome of the current layer ‘l’ for the ‘s’ filter is denoted by
Here, the shift parameters, learning rate, standard deviation, and mean are denoted
A ReLU activation function is provided in the following equation
The convolution units are joined together using the join function to produce a fractal block with two columns. The
In the above equation, the number of columns is denoted by C, the join operation that computes the mean of two convolution blocks is denoted by ⊕, and composition is denoted by ◦. Several inputs are combined into a single output unit by the join layer. From input to output, there are 2^C − 1 convolution layers; that is, 2^C − 1 equals the fractal block’s depth. To obtain a deeper network, this expansion rule is repeated. For example, a fractal block with four columns has a depth of 2^4 − 1 = 15 convolution units.
The feature vector was fed into the fractal block, which had a joining layer and 3 convolution layers. The fractal block’s output was given to the pooling layer, which decreased the dimension of the feature map and the number of training parameters. The fractal block and pooling layer sequence was repeated five times. Finally, the features were passed to the fully connected layer, followed by a softmax classifier that predicted the AD class. To improve the performance of the classifier and reduce the error rate, the important hyperparameters of Fractalnet, such as the learning rate, dropout rate, and batch size, were optimized by the Emperor penguin optimization (EPO) algorithm.
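The recursive expansion rule with a mean join can be sketched in NumPy as follows; the base unit here is a stand-in for the convolution + batch normalization + ReLU triple (real Fractalnet units carry learned weights):

```python
import numpy as np

def base_unit(x):
    # stand-in for convolution + batch normalization + ReLU
    return np.maximum(x, 0.0)

def fractal(x, c):
    """f_1(x) = unit(x); f_c(x) = join(unit(x), f_{c-1}(f_{c-1}(x))).

    The join layer averages its two column outputs, matching the
    mean-based join operation described above.
    """
    if c == 1:
        return base_unit(x)
    deep = fractal(fractal(x, c - 1), c - 1)
    return 0.5 * (base_unit(x) + deep)  # join = mean of the two columns

x = np.array([-1.0, 2.0])
out = fractal(x, 2)  # a two-column block applies 2^2 - 1 = 3 base units in total
assert out.shape == x.shape
```

Unrolling `fractal(x, 2)` applies the base unit three times (one shallow column, two in the deep column), matching the 2^C − 1 depth count.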
In this section, the EPO technique is discussed to tune the hyperparameters of the Fractalnet model, which improves the classification performance of the proposed model. The purpose of parameter optimization is to alter the classifier’s hyperparameters to the point where the classification performance is maximized.
The EPO algorithm is based on the huddling behavior of emperor penguins (EPs) in Antarctica. Foraging is usually done in colonies by EPs, and their huddling habit when foraging is an interesting characteristic. The primary goal is to determine an effective mover in the huddle in a mathematically sound manner. Following the temperature profile
Here, C denotes the current iteration as determined by Itermax, and Rn specifies a random number between 0 and 1. Because EPs tend to huddle together to maintain temperature, extra caution must be taken to safeguard them from nearby collisions. As a result, a set of two vectors
Here, the best result is denoted by
The EP population is initialized in EPO using arbitrarily manufactured unique EPs.
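The hyperparameter tuning loop can be illustrated with a deliberately simplified population-based sketch. It keeps only EPO's population-and-leader structure and omits the temperature-profile and collision-avoidance terms; the bounds, step sizes, and the toy fitness function are all invented for the demonstration and do not come from the paper:

```python
import random

def epo_tune(fitness, bounds, pop_size=6, iters=10, seed=0):
    """Very simplified EPO-style search over hyperparameter bounds.

    Each "penguin" takes a random step toward the current best solution,
    plus a small perturbation; the best solution found is returned.
    """
    rng = random.Random(seed)
    sample = lambda: [rng.uniform(lo, hi) for lo, hi in bounds]
    pop = [sample() for _ in range(pop_size)]
    best = min(pop, key=fitness)[:]
    for _ in range(iters):
        for p in pop:
            for d, (lo, hi) in enumerate(bounds):
                step = rng.uniform(0.0, 1.0) * (best[d] - p[d])
                noise = rng.gauss(0.0, 0.01 * (hi - lo))
                p[d] = min(hi, max(lo, p[d] + step + noise))
        best = min(pop + [best], key=fitness)[:]
    return best

# toy fitness: pretend validation error is minimised at lr=0.01, dropout=0.3
toy = lambda p: (p[0] - 0.01) ** 2 + (p[1] - 0.3) ** 2
best = epo_tune(toy, bounds=[(1e-4, 0.1), (0.0, 0.5)])
assert toy(best) < toy([0.1, 0.5])  # better than a corner of the search space
```

In the actual framework, the fitness function would train and validate the Fractalnet classifier for each candidate (learning rate, dropout rate, batch size) triple.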
In this section, the performance and efficiency of the proposed framework are assessed by several experiments on the ADNI dataset using typical performance metrics. The Windows 10 operating system with 16 GB of RAM and the Anaconda Navigator environment is used to train and test the proposed method. Keras is used to run all of the simulations, with Tensorflow as the backend.
The data was gathered from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), which has been utilized in several studies to classify Alzheimer’s disease. Dr. Michael W. Weiner founded ADNI in 2004 with the aid of a public-private collaboration. The basic goal of ADNI was to investigate more reliable and sensitive methodologies across various diagnostic tools, such as structural MRI, PET, and clinical assessment, to track the early stages of AD and the course of MCI. 1296 MRI images were used in this study, divided into five categories: AD, LMCI, EMCI, MCI, and CN.
The entire dataset was split into a training set (95%) and a testing set (5%). To train the segmentation and classification networks, the SGD optimizer was employed; with momentum, SGD can reach global minima and provide high training accuracy. The number of columns in each fractal block of the Fractalnet was varied from 1 to 4. The training time of the network increased as the number of columns increased. The model, however, provided improved accuracy for fractal blocks with two columns while requiring significantly less training time. As a result, the proposed model employed fractal blocks with two columns, each of which was repeated five times.
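The SGD-with-momentum update can be written as v ← μv − η∇L, w ← w + v; a minimal scalar sketch follows, where the learning rate and momentum values are illustrative and not the paper's actual training settings:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- mu*v - lr*grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# minimise f(w) = w^2 (gradient 2w) starting from w = 1.0
w, v = 1.0, 0.0
for _ in range(300):
    w, v = sgd_momentum_step(w, 2 * w, v)
assert abs(w) < 1e-3  # momentum-damped iterates settle near the minimum
```

The velocity term is what lets SGD roll through small local irregularities of the loss surface instead of stalling in them.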
The accuracy of the model evaluated on the training data is referred to as training accuracy, while testing accuracy is the accuracy obtained when the testing data is applied to the model. As demonstrated in
Both training and testing losses must be kept to a minimum. If the testing loss is greater than the training loss, the network is overfitting. Overfitting can be reduced by using the optimization technique to increase each fractal block’s dropout.
In this section, the segmentation performance of the proposed approach is evaluated and the results are presented. Three variables were derived to assess segmentation performance: positive predicted value (PPV), sensitivity (SEN_S), and dice similarity coefficient (DSC). The Dice coefficient, which measures the similarity of two samples, was used to evaluate the accuracy of the segmentation algorithm. The PPV represents the proportion of correctly segmented tissue pixels among all pixels predicted as tissue, and SEN_S represents the proportion of the real tissue region that is correctly segmented. They are defined as follows,
Here, the number of true positive pixels is denoted by TPV, the number of false positive pixels by FPV, and the number of false negative pixels by FNV. Sensitivity (SEN_S) is the ratio of true positive pixels to the sum of true positive and false negative pixels.
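These three segmentation metrics can be computed directly from binary masks; a minimal NumPy sketch with a made-up 2 × 3 example mask:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """DSC, PPV and SEN_S from binary masks (1 = tissue, 0 = background)."""
    tpv = np.sum((pred == 1) & (truth == 1))  # true positive pixels
    fpv = np.sum((pred == 1) & (truth == 0))  # false positive pixels
    fnv = np.sum((pred == 0) & (truth == 1))  # false negative pixels
    dsc = 2 * tpv / (2 * tpv + fpv + fnv)
    ppv = tpv / (tpv + fpv)
    sen = tpv / (tpv + fnv)
    return dsc, ppv, sen

truth = np.array([[1, 1, 0], [0, 1, 0]])
pred  = np.array([[1, 1, 0], [0, 0, 1]])
dsc, ppv, sen = segmentation_metrics(pred, truth)
# TPV = 2, FPV = 1, FNV = 1 -> DSC = 4/6, PPV = 2/3, SEN_S = 2/3
assert abs(dsc - 4 / 6) < 1e-9
```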
The segmented images of the brain tissues and the comparison among the ground truth images are given in
The calculated results of the three assessment indexes demonstrate superior segmentation effects. The quantitative results of the segmentation technique are shown in
Performance metrics | CN | MCI | EMCI | LMCI | AD |
---|---|---|---|---|---|
DSC | 97.91 | 98.12 | 98.78 | 98.89 | 98.56 |
PPV | 98.78 | 97.94 | 97.98 | 98.43 | 97.82 |
SEN-S | 99.11 | 98.65 | 99.34 | 99.76 | 98.89 |
Detailed assessments show that the proposed method demonstrated its superiority in detecting significant object boundaries from brain MRI data. As a result, the phases of Alzheimer’s disease can be detected more precisely.
The suggested segmentation approach was compared to existing approaches in
Performance metrics | CNN [ | Gaussian mixture model [ | U-NET [ | k-means clustering [ | Proposed |
---|---|---|---|---|---|
DSC | 87.0 | 96.0 | 92.3 | 94.92 | 98.45 |
PPV | 84.6 | - | 90.4 | - | 98.19 |
SEN-S | 89.7 | - | 96.5 | 94.94 | 99.15 |
This section presents the multi-class classification results of the proposed approach and its comparison with existing techniques to determine the effectiveness of the method. The purpose of model evaluation is to determine how well a given model generalizes to new data so that we can distinguish between different models. To this end, metric calculations are needed to determine the performance of the various models. The performance metrics used for classification are shown in the following equations.
A basic metric is classification accuracy, which calculates how much an instance class is correctly predicted by the model in the validation set.
The Recall indicates the classifier’s ability to locate all positive samples.
The positive prediction’s proportion is called precision as stated in the following equation,
The curve formed by comparing the True Positive Rate (TPR)
In estimating classification performance, the AUC is a key metric.
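The accuracy, precision, and recall equations above can be computed per class in a one-vs-rest fashion; a minimal NumPy sketch with an invented three-class example (AUC, which needs ranked scores, is omitted):

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """One-vs-rest accuracy, precision and recall for each class."""
    out = {}
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        out[c] = {
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return out

y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
m = per_class_metrics(y_true, y_pred, n_classes=3)
assert m[2]["precision"] == 1.0      # no false positives for class 2
assert abs(m[2]["recall"] - 2 / 3) < 1e-9
```

Averaging the per-class values yields the overall scores reported for the five AD classes.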
Metrics | CN | MCI | EMCI | LMCI | AD |
---|---|---|---|---|---|
Accuracy | 99.16 | 99.21 | 99.69 | 98.45 | 98.81 |
Precision | 99.47 | 100 | 100 | 100 | 99.13 |
Recall | 97.35 | 98.23 | 98.89 | 98.65 | 98.04 |
AUC | 98.9 | 97.8 | 97.9 | 99.2 | 99.8 |
These subcategories were utilized to identify the phases of AD. Each of the classes has features that show the existence of AD, although in varying degrees depending on the subclasses’ dimensions.
The ROC curve of each model is used to examine the classifier and is recognized as a valuable tool. The AUC values for CN, MCI, EMCI, LMCI, and AD are 98.9, 97.8, 97.9, 99.2, and 99.8, respectively. The AUC score for AD is the highest, although the other classes also perform well.
From
Performance metrics | Accuracy | Precision | Recall | AUC |
---|---|---|---|---|
CNN [ | 95.2 | - | 94.6 | 97.2 |
Resnet-101 [ | 96.3 | - | 96.7 | - |
DSCNN [ | 75.32 | - | 80.13 | 81.41 |
Alexnet [ | 98.76 | 99.6 | 97.69 | - |
DEMNET [ | 95.23 | 96 | 95 | 97 |
3DCNN [ | 98.06 | - | 92.96 | - |
FCN [ | 96.8 | - | 95.7 | - |
Proposed | 99.06 | 99.72 | 98.30 | 98.72 |
In terms of recall, Alexnet attained good performance, and DEMNET and FCN provided similar results. The accuracy, precision, and recall of Fractalnet were better than those of the other techniques. Although Alexnet’s accuracy of 98.76% was close to that of Fractalnet, Fractalnet was superior in terms of all metrics. The graphical representation of the accuracy and recall comparison is given in
To examine the statistical significance of observed performance results, we used a one-way ANOVA test for which the results are displayed in
Source | df | Sum square | Mean square | f-value | p-value |
---|---|---|---|---|---|
Approaches | 3 | 5.8665 | 1.9555 | 5.3209 | 0.009797 |
Residuals | 16 | 5.8802 | 0.3675 | | |
Total | 19 | 11.7467 | 0.6182 | | |
Significance levels are determined by
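The F-statistic in the table is the ratio of the between-group and within-group mean squares; a small NumPy sketch of that computation follows, using invented group scores rather than the paper's data:

```python
import numpy as np

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand = all_vals.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_between = len(groups) - 1           # e.g. 3 for four approaches
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# three toy groups of five scores each, with well-separated means
g = [[96.1, 95.8, 96.4, 96.0, 95.9],
     [97.2, 97.5, 97.1, 97.4, 97.3],
     [99.0, 99.1, 98.9, 99.2, 99.0]]
f = one_way_anova_f(g)
assert f > 10.0  # well-separated group means give a large F
```

In practice a library routine such as scipy.stats.f_oneway would also return the p-value for the computed F.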
The suggested method performs well in the segmentation and classification of AD; however, there are still a few issues that should be further considered in future studies. First, several redundant or irrelevant features are extracted during the feature extraction process, which may reduce the performance of the classifier. Therefore, an effective feature selection technique is required in this framework to select significant features from the feature set. Second, the proposed approach used only MRI image modalities; other image modalities contain different types of features. Therefore, the proposed framework needs to be enhanced to handle various types of image modalities. Third, in this work, only five classes of AD are detected. The analysis can therefore be extended to various other phases of AD, such as progressive MCI and stable MCI, which could aid in the initial diagnosis of AD. Moreover, the time complexity of the proposed approach is a little high owing to the nested connections and the interconnections between the nodes in Unet++ and Fractalnet, which take a little more time to produce the results. In the future, specific modifications need to be made to this framework to reduce the time complexity.
AD is a serious neurological syndrome that affects a large portion of the global population. Early detection of AD is essential to improve people’s quality of life and to develop better treatments with specialized medicines. The proposed framework was established to show the effectiveness of deep learning algorithms in performing multi-class classification of AD and its various stages, namely LMCI, EMCI, MCI, CN, and AD. Experiments on the ADNI dataset are used to examine the proposed method in depth. Furthermore, the findings of the proposed approach are compared with existing methods, and the experimental findings show that the model outperforms the competition with an accuracy rate of 99.06%.
Acronyms | Abbreviations |
---|---|
AD | Alzheimer’s Disease |
ADNI | Alzheimer’s Disease Neuroimaging Initiative |
MRI | Magnetic Resonance Imaging |
CADS | Computer-Aided Diagnosis Systems |
CN | Cognitively Normal |
LMCI | Late Mild Cognitive Impairment |
EMCI | Early Mild Cognitive Impairment |
MCI | Mild Cognitive Impairment |
CNN | Convolutional Neural Network |
KNN | K-Nearest Neighbor |
SVM | Support Vector Machine |
RF | Random Forest |
QMFT | Quantum Matched-Filter Technique |
SeLUs | Scaled Exponential Linear Units |
EPO | Emperor Penguin Optimization |
EP | Emperor Penguin |
SGD | Stochastic gradient descent |
PPV | Positive Predicted Value |
SEN_S | Sensitivity |
DSC | Dice Similarity Coefficient |