Modeling of Explainable Artificial Intelligence for Biomedical Mental Disorder Diagnosis

Abstract: The abundance of structured and unstructured data and the rapid advancement of statistical models have underscored the importance of Explainable Artificial Intelligence (XAI), an approach that explains how AI models arrive at their predictions. Autism Spectrum Disorder (ASD), a biomedical mental disorder, needs to be identified and classified at an early stage in order to reduce its health burden. Against this background, the current paper presents an XAI-based ASD diagnosis (XAI-ASD) model to detect and classify ASD precisely. The proposed XAI-ASD technique involves the design of a Bacterial Foraging Optimization (BFO)-based Feature Selection (FS) technique. In addition, a Whale Optimization Algorithm (WOA) with Deep Belief Network (DBN) model is applied for the ASD classification process, in which the hyperparameters of the DBN model are optimally tuned with the help of WOA. In order to assess the ASD diagnostic outcome, a series of simulations was conducted on an ASD dataset.


Introduction
Artificial Intelligence (AI) methods are now pervasive and widely adopted owing to their remarkable capabilities. However, they bring problems, opportunities, and risks that should be addressed in order to achieve uncompromised and efficient development [1]. Explainable Artificial Intelligence (XAI) is a response to this challenge, since it brings machines closer to human beings. From a research point of view, the discussion on XAI started several years ago, while the idea gained renewed vigor only at the end of 2019. Google declared its "AI-first" approach in 2017 and later published a novel XAI toolset for developers [2]. Currently, many Deep Learning (DL) and Machine Learning (ML) applications exist for which one cannot understand the underlying logic or how the decisions are made. The term 'black box' is therefore used to describe such ML-based methods. This opacity is regarded as the main challenge in the application of AI techniques, because the decision-making process is not apparent and frequently remains unintelligible even to developers and experts [3].
An XAI system is capable of explaining the logic behind decisions, describing the strengths and weaknesses of the decision making, and offering insights about future behaviour. AI applications are found in autonomous driving systems and are utilized in the financial, healthcare, military, and legal sectors. In these settings, the reasoning of the artificial partner must be available in order to trust its decisions and the data obtained. The most common AI framework is now provided by DL approaches, in which a Neural Network (NN) with tens or hundreds of layers of 'neurons', or 'fundamental processing units', is employed. The complexity of DL frameworks makes them act like 'black boxes', so that it becomes nearly impossible to determine how a system arrives at a given answer [4].
The application of AI in healthcare, especially in diagnostic imaging, is rapidly increasing [5]. However, the use of DL frameworks turns attention towards the "accountability" of the process. As DL solutions become more extensively used, this problem will gradually become a central concern. In the healthcare industry, the responsibility or accountability borne by professionals plays a major role since it is directly associated with patients' health. Each clinical decision should be taken based on available scientific evidence backed by professional experience [6]. The results produced by AI processing units are expected to contribute to medical decision making, yet the 'black box' architecture is barely compatible with the healthcare sector. Moreover, such software applications must be authorized and their working must be understood, which is difficult when the approach is unexplained. Physicians are receptive to approaches that use neural networks in challenging or complex diagnoses. However, they should also be able to comprehend how these approaches reach their conclusions and to validate the reports.
Autism Spectrum Disorder (ASD) is a developmental mental disorder that limits the normal development of specific social behaviours and communication [7]. The causes of ASD have been associated with neurological and genetic factors. In spite of its genetic association, ASD is mostly detected using behavioural indicators such as imaginative ability, repetitive behaviours, social interaction, and communication with others [8]. Children with ASD face more severe early developmental problems than other children. These behavioural difficulties vary and include challenges in responding to sensory information (tasting, hearing, smelling, and so on) and difficulties in communicating, and they affect the early learning process. Such children also find it hard to communicate with others and lag behind in language acquisition. One research study reports that 33% of children with difficulties other than ASD show some ASD symptoms without meeting the complete classification criteria.
Several clinical and non-clinical diagnostic approaches are available for ASD. Two established clinical diagnostic methods are Autism Diagnostic Interview (ADI) and Autism Diagnostic Observation Schedule-Revised (ADOS-R). In addition, some diagnostic approaches are parent-based non-clinical or self-administered techniques, namely Social Communication Questionnaire (SCQ) and Autism Quotient Trait (AQ). It is important to note that the present mainstream ASD diagnostic tools take significant time to produce a comprehensive diagnosis. In order to improve the diagnostic procedure of ASD, scientists have recently begun to adopt ML approaches [9]. The key objectives of research on ML approaches to ASD diagnosis are to reduce the diagnosis time so as to provide quick access to healthcare services, to reduce the dimensionality of the input dataset, and to improve diagnostic accuracy by identifying the highest-ranked features of ASD. Machine Learning is a field of research that combines search methods, AI, and mathematics to derive accurate prediction models from datasets.
The current research article presents an XAI-based ASD diagnosis (XAI-ASD) model to detect and classify ASD in a precise manner. The proposed XAI-ASD technique involves the design of a Bacterial Foraging Optimization (BFO)-based Feature Selection (FS) approach. In addition, a Whale Optimization Algorithm (WOA) with Deep Belief Network (DBN) model is applied for the ASD classification process, in which the hyperparameters of the DBN model are optimally tuned with the help of WOA. In order to validate the diagnostic outcomes of the proposed method, a series of experimental analyses was conducted. The experimental results indicate that the proposed XAI-ASD technique outperforms the compared state-of-the-art techniques under different measures. In short, the contributions of the research paper are listed as follows.
• A new XAI-ASD model is presented for detection and classification of ASD.
• A new BFO-based FS technique is designed to choose an effective subset of features for ASD detection.
• A new WOA-DBN technique is derived for detection and classification of ASD.
• The performance of the proposed XAI-ASD technique was validated on a benchmark dataset and the outcomes were examined under different evaluation parameters.
The remainder of the paper is organized as follows. Section 2 provides a comprehensive review of existing ASD diagnosis techniques. Section 3 elaborates the proposed XAI-ASD technique and Section 4 validates the results attained by the XAI-ASD technique. Finally, Section 5 draws the conclusion.

Literature Survey
In Pawar et al. [10], XAI is discussed as a method that can be employed in both the diagnosis and the analysis of healthcare data through AI systems. The aim of the presented method was to achieve result tracing, responsibility, model improvement, and transparency in the healthcare domain. Lauritsen et al. [11] presented a new xAI-EWS method for the early recognition of acute critical illnesses. The proposed xAI-EWS method supports clinical translation by providing a prediction from EHR data together with its explanation. Eslami et al. [12] summarized the current developments in ML models for the diagnosis of ASD and ADHD. The researchers described and outlined the ML approaches, particularly DL methods, that are relevant to research in these domains, the drawbacks of the existing methods, and future directions for the domain. Further, the researchers discussed how ADHD, ASD, and other mental disorders might be diagnosed, quantified, and managed in future through imaging methods such as MRI combined with ML techniques.
Eslami et al. [13] presented a DL algorithm, ASD DiagNet, which exhibited relatively high accuracy in classifying ASD scans from neurotypical scans. The researchers also incorporated classical ML and DL methods, which permit the isolation of ASD biomarkers from MRI datasets. This approach, named Auto-ASD-Network, exploits the integration of DL and SVM to classify ASD scans from neurotypical scans. These interpretable methods assist in explaining the decisions made by the DL technique, which in turn results in knowledge acquisition for neuroscientists and clearer analyses for physicians. Abbas et al. [14] proposed a multi-modular, ML-based evaluation of autism comprising two corresponding models that are combined into a unified, diagnostic-grade result.
In Liang et al. [15], the main contribution is based on using the temporal coherence among nearby frames as free supervision and on establishing a global discriminative margin. The efficacy of the extracted features was established through a wide-ranging assessment. First, the extracted features are clustered by the K-means approach to show how self-stimulatory behaviours can be sorted in a completely unsupervised manner. Next, a conditional entropy approach is employed to evaluate the efficacy of these features. Finally, improved results are attained by integrating the unsupervised TCDN approach with optimized supervised learning models (such as k-NN, Discriminant, and SVM).
Lombardi et al. [16] presented an explainable DL architecture to predict the age of a healthy cohort of subjects from the ABIDE I database, based on morphological features generated from MRI scans. The researchers also embedded two local XAI techniques, SHAP and LIME, to explain the results of the DL algorithm, to establish the contribution of each brain morphological descriptor to the final predicted age of each subject, and to explore the consistency of the two techniques. Zhang et al. [17] proposed an approach in which a DNN is used to analyze the subject's MRI and assessed its effectiveness for the initial screening of ASD. Analyses based on fMRI were also compared with those based on sMRI. The experimental results show that fMRI is more sensitive than sMRI. The researchers also described the classification performance of fMRI.

The Proposed Model
In this study, a new XAI-ASD technique is designed and validated to detect and classify different stages of ASD. The presented XAI-ASD technique encompasses different processes, namely pre-processing, BFO-based FS, DBN-based classification, and WOA-based parameter tuning. Fig. 1 illustrates the overall working process of the XAI-ASD model. The functional details of these modules are given in the following sections.

Pre-Processing
Initially, the patient data is pre-processed in three stages: format conversion, missing value replacement, and class labeling. First, the input data in .arff format is converted into the compatible .csv format. Next, the missing values in the dataset are replaced using the median. Finally, class labeling is applied to map the class labels of the data to ASD.
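The three pre-processing stages can be illustrated with a minimal sketch. The following Python example uses only the standard library; the helper names (`arff_to_rows`, `impute_median`, `label_classes`, `rows_to_csv`), the `?` missing-value marker, the column names, and the sample records are illustrative assumptions and are not taken from the original implementation.

```python
import csv
import io
import statistics


def arff_to_rows(arff_text):
    """Parse a minimal ARFF document into (attribute names, list of row dicts)."""
    names, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            values = next(csv.reader([line]))
            rows.append(dict(zip(names, (v.strip() for v in values))))
    return names, rows


def impute_median(rows, column):
    """Replace '?' in a numeric column with the median of the observed values."""
    observed = [float(r[column]) for r in rows if r[column] != "?"]
    median = statistics.median(observed)
    for r in rows:
        r[column] = median if r[column] == "?" else float(r[column])


def label_classes(rows, column, positive="YES"):
    """Map the textual class label to 1 (ASD) or 0 (non-ASD)."""
    for r in rows:
        r[column] = 1 if r[column].upper() == positive else 0


def rows_to_csv(names, rows):
    """Serialise the cleaned rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical three-record sample, not real patient data.
SAMPLE_ARFF = """@relation asd_demo
@attribute age numeric
@attribute jaundice {yes,no}
@attribute class {YES,NO}
@data
4,yes,YES
?,no,NO
6,no,NO
"""

names, rows = arff_to_rows(SAMPLE_ARFF)
impute_median(rows, "age")       # stage 2: missing value replacement
label_classes(rows, "class")     # stage 3: class labeling
csv_text = rows_to_csv(names, rows)  # stage 1 output: .csv text
```

The parser handles only unquoted attribute names; a real ARFF file with quoted names or sparse rows would need a fuller reader.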

Algorithmic Design of BFO-FS Technique
After the patient data has been collected and pre-processed to remove undesirable data, the FS procedure is executed using the BFO technique. In the standard BFO approach, the fitness of the $i$-th bacterium at its position in the search space is denoted $J^{i}(j, k, l)$ and is determined as a function of the bacterium's position as follows.
In (1), lower values of the function correspond to higher fitness. The index $i$ refers to the $i$-th bacterium, while $j$, $k$, and $l$ index the central procedures of the BFO technique: chemotaxis, reproduction, and elimination-dispersal, respectively.

Chemotaxis
Chemotaxis consists of a large number of swimming and tumbling actions [18]. In the $j$-th chemotaxis step, the motion of the $i$-th bacterium is given as follows.
Here, the swimming length of the $i$-th bacterium is divided into single swimming steps of size $C(i)$, $n$ is the number of swimming steps, and $\Delta(i)$ is the direction vector of the $i$-th bacterium in the $p$-dimensional search space. Every element of $\Delta(i)$ is a numerical value in the range $[-1, 1]$ and is initialized randomly within this range. When the $i$-th bacterium finds a position with better fitness, that is, a more favourable environment, in the $j$-th chemotaxis step, it continues to move in the same direction. Otherwise, a new random direction $\Delta(i)$ is chosen.
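A single chemotaxis step can be sketched as follows, assuming the standard BFO update in which the bacterium first moves by the step size $C(i)$ along a normalized random direction and then keeps swimming in that direction while the cost decreases. The function names, the argument names, and the cost function passed in are hypothetical and only serve the illustration.

```python
import math
import random


def tumble_direction(dim, rng):
    """Draw a random direction vector with each component in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def chemotaxis_step(position, step_size, cost, max_swims, rng):
    """One tumble followed by up to max_swims swims for a single bacterium.

    Lower cost means higher fitness. Returns the final position and its cost.
    """
    delta = tumble_direction(len(position), rng)
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    unit = [d / norm for d in delta]  # normalized direction

    def move(p):
        return [x + step_size * u for x, u in zip(p, unit)]

    last_cost = cost(position)
    position = move(position)  # the tumble move is always taken
    new_cost = cost(position)
    swims = 0
    # Keep swimming in the same direction while the cost keeps improving.
    while swims < max_swims and new_cost < last_cost:
        last_cost = new_cost
        position = move(position)
        new_cost = cost(position)
        swims += 1
    return position, new_cost
```

With `max_swims=0` the call reduces to a pure tumble, so the bacterium is displaced by exactly `step_size`.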

Swarming
The bacteria are subject to both attraction and repulsion. The numerical relation is given herewith.
where $d_{att}$ represents the depth of the attractant released by the $i$-th bacterium and $\omega_{att}$ signifies the width of the same attractant. Since two bacteria cannot co-exist in the same place, repulsion is modelled through $h_{rep}$ and $\omega_{rep}$. In the swarming method, the fitness of the $i$-th bacterium is then given as follows.
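The attraction and repulsion effect can be sketched in code, assuming the usual BFO cell-to-cell signalling term in which every bacterium contributes an attraction term of depth $d_{att}$ and width $\omega_{att}$ and a repulsion term of height $h_{rep}$ and width $\omega_{rep}$. The function name and argument layout are hypothetical, and adding this term to the cost $J$ is the conventional choice rather than a detail confirmed by the text.

```python
import math


def swarming_cost(theta, population, d_att, w_att, h_rep, w_rep):
    """Cell-to-cell signalling cost at position theta.

    Negative contributions attract the bacterium towards the swarm,
    positive contributions push it away from occupied positions.
    """
    total = 0.0
    for other in population:
        sq_dist = sum((a - b) ** 2 for a, b in zip(theta, other))
        total += -d_att * math.exp(-w_att * sq_dist)  # attraction
        total += h_rep * math.exp(-w_rep * sq_dist)   # repulsion
    return total
```

Under this assumption, the swarming-adjusted fitness of a bacterium would be its own cost plus `swarming_cost` evaluated at its position.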

Reproduction
A bacterium reproduces when it is exposed to a better environment; otherwise, it tends to die. Based on the chemotaxis and swarming methods, the fitness of all bacteria is computed. The fitness of the $i$-th bacterium is defined as follows.
The half of the bacteria in the better state, $S_r = S/2$, is selected for survival and the other half dies. Each surviving bacterium then splits into two bacteria located at the same position, which preserves the total number of bacteria $S$.
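The reproduction step can be sketched as follows, assuming that the health of each bacterium is its cost accumulated over the chemotaxis steps, so that a lower value is better. The function name and the list-based representation are hypothetical.

```python
def reproduce(positions, health):
    """Keep the healthier half (S_r = S / 2) and duplicate each survivor in place.

    positions: list of S position vectors; health: accumulated cost per bacterium.
    """
    if len(positions) % 2:
        raise ValueError("population size S must be even")
    # Sort indices by accumulated cost, lowest (healthiest) first.
    order = sorted(range(len(positions)), key=lambda i: health[i])
    survivors = [positions[i] for i in order[: len(positions) // 2]]
    # Each survivor splits into two bacteria at the same location.
    return [list(p) for p in survivors] + [list(p) for p in survivors]
```

The population size is unchanged because every discarded bacterium is replaced by a copy of a survivor.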

Elimination and Dispersal
Then, each bacterium is subjected to elimination and dispersal with probability $P_{ed}$, while the total number of bacteria remains the same. When a bacterium is eliminated, it is randomly redistributed to a new place.
As demonstrated in (6), elimination occurs when $r_i < P_{ed}$. The original position $P_i$ of the $i$-th bacterium is replaced by a new position $P_i = (m_1, m_2, \ldots, m_D)$, where each variable $m$ is updated to a random value within the search space.
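A minimal sketch of the elimination and dispersal step is given next. It assumes that $r_i$ is drawn uniformly from $[0, 1)$ for each bacterium and that the search space is a box with the same lower and upper bound in every dimension; the function name and the `bounds` argument are hypothetical.

```python
import random


def eliminate_and_disperse(positions, p_ed, bounds, rng):
    """Relocate each bacterium to a random position with probability p_ed."""
    low, high = bounds
    new_positions = []
    for pos in positions:
        if rng.random() < p_ed:  # r_i < P_ed: eliminate and disperse
            new_positions.append([rng.uniform(low, high) for _ in pos])
        else:                    # otherwise the bacterium stays in place
            new_positions.append(list(pos))
    return new_positions
```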

Process Involved in BBFO-FS Algorithm
In the FS problem, all solutions are restricted to the binary values {0, 1}. In order to apply the BFO technique to FS, a binary version, BBFO, is developed. In the BBFO technique, a solution is defined as a 1D vector whose length equals the number of features in the original dataset. Every cell in the vector takes the value zero or one: the value 1 indicates that the corresponding feature has been selected, and the value 0 indicates that it has not. Eq. (7) is utilized to map the continuous values to binary ones.
where $Z_{mn}$ stands for the discrete form of the solution vector $X$ and $X_{mn}$ refers to the continuous position of search agent $m$ at dimension $n$.
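The continuous-to-binary mapping can be illustrated with a short sketch. The thresholding rule used here (a cell becomes 1 when the continuous value exceeds 0.5) is an assumption chosen because it is a common choice in binary swarm optimizers; the exact form of Eq. (7) may differ. The function names are hypothetical.

```python
def binarize(position, threshold=0.5):
    """Map a continuous position X_m to a binary vector Z_m (assumed threshold rule)."""
    return [1 if x > threshold else 0 for x in position]


def selected_features(mask):
    """Return the 1-based indices of the features whose cell equals 1."""
    return [idx + 1 for idx, bit in enumerate(mask) if bit == 1]
```

For example, `binarize([0.9, 0.1, 0.5, 0.7])` yields a mask that selects features 1 and 4.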
FS is modelled as a multi-objective optimization problem in which two conflicting objectives are pursued: a smaller number of selected features and a higher classifier accuracy. A better solution is thus one with fewer selected features and higher classifier accuracy. The classifier accuracy of KNN is employed in the Fitness Function (FF) to assess the performance of all search agents. In order to balance the number of selected features in each solution (lower) against the classifier accuracy (higher), the FF in Eq. (8) is utilized to assess the search agents.
where $Err(D)$ represents the classifier error rate on the given feature subset, $\rho$ and $\phi$ are constants that weight the classifier accuracy and the feature reduction respectively, $|F|$ is the size of the selected feature subset, and $|T|$ is the total number of features.
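The fitness evaluation can be sketched as follows, assuming the commonly used weighted form $\rho \cdot Err(D) + \phi \cdot |F| / |T|$, which is consistent with the symbols defined above but is not confirmed as the exact Eq. (8). The leave-one-out k-NN error estimate, the function names, and the default values $\rho = 0.99$ and $\phi = 0.01$ are hypothetical choices for illustration only.

```python
def knn_error(samples, labels, mask, k=3):
    """Leave-one-out k-NN error rate using only features where mask == 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    errors = 0
    for i, x in enumerate(samples):
        # Squared Euclidean distance to every other sample on the selected columns.
        dists = sorted(
            (sum((x[c] - y[c]) ** 2 for c in cols), labels[j])
            for j, y in enumerate(samples)
            if j != i
        )
        votes = [lab for _, lab in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        errors += predicted != labels[i]
    return errors / len(samples)


def fitness(samples, labels, mask, rho=0.99, phi=0.01):
    """Assumed fitness: rho * Err(D) + phi * |F| / |T| (lower is better)."""
    err = knn_error(samples, labels, mask)
    return rho * err + phi * sum(mask) / len(mask)
```

With this form, a subset that keeps an informative feature and drops a noisy one receives a lower (better) fitness than the reverse choice.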

Design of WOA-DBN Technique
Finally, the selected features are fed to the DBN method for the classification of ASD. The framework of the DBN method is shown in Fig. 2. In general, the main difference between DBN and MultiLayer Perceptron (MLP) layers is that a DBN layer is capable of learning through layer-wise unsupervised training, whereas the MLP approach does not perform such feature learning and is widely used for classification. The resemblance between DBN and MLP is that both are fully connected (FC) networks. Further, the backpropagation (BP) method is employed for supervised learning and is executed in the classifier layers of the DBN. A Restricted Boltzmann Machine (RBM) is a probabilistic graphical network, realized by a stochastic NN, in which each neuron has two output states. It is derived from the Boltzmann Machine (BM) approach and serves as a feature extraction (FE) method based on an energy model, optimized through unsupervised training. An RBM includes two layers: the visible layer $v = (v_1, v_2, \ldots, v_n)$, which holds the observed data, and the hidden layer $h = (h_1, h_2, \ldots, h_m)$, which acts as the FE layer. The two topmost layers of the DBN, namely the last hidden layer ($h_l$) and the output layer, form a joint distribution and are also called the associative memory.
The learning process of the DBN model has two phases: unsupervised learning (pre-training) and supervised learning (fine-tuning) [19]. First, unsupervised learning is realized by contrastive divergence, which trains the stacked RBMs layer by layer. Then, supervised learning is carried out by the BP method to fine-tune the initial weights and biases of the whole network. The primary goal of unsupervised training in the DBN is to optimize the RBMs so that they extract features from the data. For a given pair $(v, h)$, the energy function $E(v, h \mid \theta)$ is determined as follows.
where $\theta = \{W_{ij}, a_i, b_j\}$ represents the parameters of the RBM, $W_{ij}$ represents the weight of the connection between visible unit $i$ and hidden unit $j$, and $a_i$ and $b_j$ denote the biases of the visible and hidden neurons, respectively. The joint probability distribution is determined as follows.
During the Gibbs sampling procedure, the conditional probability distributions of the hidden and visible neurons are represented herewith.
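For reference, the standard RBM forms of the energy function, the joint distribution, and the two conditional distributions are given here. They are assumed to correspond to Eqs. (9) to (12), taking $a_i$ as the bias of visible unit $i$ and $b_j$ as the bias of hidden unit $j$ in line with the index convention of $\theta$, with $\sigma(x) = 1/(1 + e^{-x})$ and $Z(\theta)$ the normalizing constant.

```latex
\begin{align}
E(v, h \mid \theta) &= -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j
                       - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i W_{ij} h_j \tag{9} \\
P(v, h \mid \theta) &= \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \qquad
Z(\theta) = \sum_{v, h} e^{-E(v, h \mid \theta)} \tag{10} \\
P(h_j = 1 \mid v, \theta) &= \sigma\Big(b_j + \sum_{i=1}^{n} v_i W_{ij}\Big) \tag{11} \\
P(v_i = 1 \mid h, \theta) &= \sigma\Big(a_i + \sum_{j=1}^{m} W_{ij} h_j\Big) \tag{12}
\end{align}
```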
In Eq. (11), the probability is that of hidden neuron $h_j$ being in the active state. Since the RBM is symmetric, for a given hidden layer state $h$ the activation probability of each neuron in the visible layer is given by Eq. (12). This process is employed to obtain the corresponding weights $w$ of the RBM. The unsupervised learning of the DBN trains the RBMs layer by layer to obtain the initial weights $W = \{w_1, w_2, \ldots, w_l\}$. Supervised learning then fine-tunes the connection weights obtained from unsupervised learning. The BP method computes the gradient using the labelled training dataset and adjusts the network variables between the layers to reduce the prediction error. Finally, the deep network is obtained with the least prediction error.
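The unsupervised training of one RBM layer can be sketched with a single contrastive divergence (CD-1) update. This is a minimal illustration for binary units using the standard logistic conditionals, with `a` as the visible biases and `b` as the hidden biases; the function names, the learning rate argument, and the in-place list updates are hypothetical and do not reproduce the original implementation.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw binary states from Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 step: positive phase, one Gibbs step, then parameter update."""
    ph0 = hidden_probs(v0, W, b)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, a), rng)  # reconstruction
    ph1 = hidden_probs(v1, W, b)
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
```

In a stacked setting, the hidden probabilities of one trained RBM would serve as the visible input of the next, after which BP fine-tunes the whole network.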
In order to tune the hyperparameters of the DBN method optimally, WOA is implemented. In the initial phase, an initialization procedure takes place. In the encircling-prey model, a humpback whale locates the prey and starts encircling it. Since the position of the optimum in the search space is not known in advance, the current best solution is regarded as the prey. Once the best search agent is determined, the other search agents update their positions towards it.
where $S = 2 \cdot r$ and $Y = 2 \cdot I \cdot r - I$. The optimum fitness attained so far is associated with these parameters. It is not required to define an initial group of parameters and step sizes for the ideal solution.
Based on the fitness values of the previous iteration, the coefficient vector $Y$ is obtained from an adaptive probability function as follows.
where $f_{min}$ and $f_{max}$ denote the minimal and maximal values of the Fitness Function (FF), and $C_1$ and $C_3$ lie in the range [0, 1]. The movement towards the ideal solution is thus dynamically adjusted by the fitness. Then, to model the bubble-net behaviour of humpback whales, two methods are employed. The bubble-net process covers the exploitation as well as the exploration phases.
A spiral procedure is employed between the positions of the whale and the prey to mimic the helix-shaped movement of the humpback whale, as given herewith [20]. It is observed that the humpback whale swims around the prey within a shrinking circle and simultaneously along a spiral-shaped path.
In order to model this simultaneous behaviour, a probability of 50% is assumed for choosing between the shrinking encircling mechanism and the spiral scheme when updating the positions of the whales during optimization. The mathematical expression is as follows, where $y$ is a random value in [−1, 1]. In the exploration phase, a search agent is selected at random instead of the best search agent.
Random values of $Y$ greater than 1 or less than −1 are employed to force a search agent to move away from the reference whale. In contrast to the exploitation phase, in the exploration phase the position of a search agent is updated according to a randomly chosen search agent rather than the best search agent found so far. The condition $|Y| > 1$ emphasizes exploration and lets the WOA approach progress towards a global optimum, whereas $|Y| < 1$ drives the search agents towards the best solution. This process is iterated until the maximum number of iterations is reached. A new group of solutions is evaluated with the updated model at each iteration.
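The search loop described above can be sketched with the standard WOA update rules, using the symbols $S$, $Y$, $I$, and $y$ from the text. The sketch assumes that $I$ decreases linearly from 2 to 0 over the iterations and that $r$ is uniform in $[0, 1)$, and it omits the fitness-adaptive coefficient based on $f_{min}$ and $f_{max}$. The function names, the spiral constant `b`, and the box-shaped search space are hypothetical; in the DBN setting, a position would encode hyperparameters and the cost would be a validation error.

```python
import math
import random


def woa_step(whales, best, t, max_iter, rng, b=1.0):
    """Update every whale once using encircling, random search, or the spiral."""
    i_coef = 2.0 * (1.0 - t / max_iter)  # I: decreases linearly from 2 to 0
    updated = []
    for x in whales:
        Y = 2.0 * i_coef * rng.random() - i_coef  # Y = 2.I.r - I
        S = 2.0 * rng.random()                    # S = 2.r
        if rng.random() < 0.5:
            # |Y| < 1: exploit around the best agent; otherwise explore
            # around a randomly chosen agent.
            ref = best if abs(Y) < 1.0 else rng.choice(whales)
            new = [g - Y * abs(S * g - xi) for g, xi in zip(ref, x)]
        else:
            y = rng.uniform(-1.0, 1.0)
            spiral = math.exp(b * y) * math.cos(2.0 * math.pi * y)
            new = [abs(g - xi) * spiral + g for g, xi in zip(best, x)]
        updated.append(new)
    return updated


def woa_minimize(cost, dim, bounds, n_whales=10, max_iter=50, seed=0):
    """Minimize cost over a box-shaped search space with a basic WOA loop."""
    rng = random.Random(seed)
    low, high = bounds
    whales = [[rng.uniform(low, high) for _ in range(dim)]
              for _ in range(n_whales)]
    best = min(whales, key=cost)
    for t in range(max_iter):
        moved = woa_step(whales, best, t, max_iter, rng)
        # Clamp every coordinate back into the search bounds.
        whales = [[min(max(v, low), high) for v in w] for w in moved]
        candidate = min(whales, key=cost)
        if cost(candidate) < cost(best):
            best = candidate  # the best solution never gets worse
    return best, cost(best)
```

Because the best agent is only replaced by a strictly better candidate, the returned cost is never higher than that of the best initial whale.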

Performance Validation
The current section discusses the experimental validation of the proposed XAI-ASD technique on the benchmark dataset from the UCI repository. The details related to the dataset are listed in Tab. 1. The proposed XAI-ASD technique selected the following features from the dataset: 1, 3, 4, 6, 7, 8, 9, 12, 14, 16, and 19. Tab. 2 lists the features that exist in the ASD dataset.
Tab. 2: Features in the ASD dataset
Feature ID    Description
4             Born with jaundice
5             Family member with Pervasive Development Disorders (PDD)
6             Who is completing the test
7             Country of residence
8             Used the screening app before or not
9             Screening test type
10-19         Answers to the 10 questions of the screening method
20            Screening score
21            Target class [Yes/No]
The ASD classification results of the proposed XAI-ASD technique on the applied dataset are given in Tab. 3 and Fig. 4. The experimental results demonstrate that the XAI-ASD technique achieved strong ASD classification outcomes. For instance, on the ASD-Children dataset, the XAI-ASD technique achieved a sensitivity of 94.76%, specificity of 95.57%, accuracy of 95.43%, F-score of 94.82%, and kappa of 90.87%. In addition, on the ASD-Adolescent dataset, the proposed XAI-ASD approach obtained a sensitivity of 92.59%, specificity of 98.66%, accuracy of 96.43%, F-score of 92.80%, and kappa of 90.60%. Moreover, on the ASD-Adult dataset, the proposed XAI-ASD method achieved a sensitivity of 97.09%, specificity of 93.12%, accuracy of 95.44%, F-score of 96.85%, and kappa of 92.67%. In order to showcase the effective performance of the XAI-ASD technique, a detailed comparative analysis was made against existing ASD diagnosis techniques and the results are shown in Tab. 4 [20]. The results are inspected in terms of sensitivity, specificity, and accuracy.
Fig. 5 shows the results of the sensitivity analysis of the proposed XAI-ASD technique against existing models. The figure shows that the DT, NN, and LR models produced low sensitivity values of 53.3%, 53.3%, and 55.5% respectively. At the same time, the IWOA-FRBC technique produced a moderate outcome. In contrast, the XAI-ASD (Children), XAI-ASD (Adolescent), and XAI-ASD (Adult) techniques accomplished the highest sensitivity values of 94.76%, 92.59%, and 97.09% respectively. Fig. 6 shows the results of the specificity analysis of the proposed XAI-ASD approach against other recent algorithms. The figure shows that the DT, LR, and NN methods achieved low specificity values of 54.90%, 62.60%, and 71.20% respectively. The IWOA-FRBC technique again produced a moderate outcome. However, the XAI-ASD (Children), XAI-ASD (Adolescent), and XAI-ASD (Adult) methodologies accomplished the highest specificity values of 95.57%, 98.66%, and 93.12% respectively.

Conclusion
In the current study, a new XAI-ASD technique is designed and validated to detect and classify the different stages of ASD. The proposed XAI-ASD technique encompasses different processes, namely pre-processing, BFO-based FS, DBN-based classification, and WOA-based parameter tuning. The use of the BFO-FS technique helps in the selection of optimal feature subsets. Besides, the use of WOA helps in the optimal selection of the hyperparameters of the DBN model, which in turn helps in accomplishing an improved ASD diagnostic performance. In order to validate the ASD diagnostic outcomes of the proposed model, a series of simulations was conducted on the benchmark dataset. The experimental results indicate that the XAI-ASD technique outperforms other recent state-of-the-art techniques under different measures. As a part of future work, the proposed XAI-ASD technique can be tested using real-time datasets collected from hospitals and IoT devices.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.