|Intelligent Automation & Soft Computing |
An Efficient Allocation for Lung Transplantation Using Ant Colony Optimization
Biomedical Systems and Informatics Engineering Department, Hijjawi for Engineering Technology, Yarmouk University, Irbid, 21163, Jordan
*Corresponding Author: Lina M. K. Al-Ebbini. Email: Lebbini@yu.edu.jo
Received: 18 March 2022; Accepted: 20 April 2022
Abstract: A relationship between lung transplant success and many features of recipients’/donors has long been studied. However, modeling a robust model of a potential impact on organ transplant success has proved challenging. In this study, a hybrid feature selection model was developed based on ant colony optimization (ACO) and k-nearest neighbor (kNN) classifier to investigate the relationship between the most defining features of recipients/donors and lung transplant success using data from the United Network of Organ Sharing (UNOS). The proposed ACO-kNN approach explores the features space to identify the representative attributes and classify patients’ functional status (i.e., quality of life) after lung transplantation. The efficacy of the proposed model was verified using 3,684 records and 118 input features from the UNOS. The developed approach examined the reliability and validity of the lung allocation process. The results are promising regarding accuracy prediction to be 91.3% and low computational time, along with better decision capabilities, emphasizing the potential for automatic classification of the lung and other organs allocation processes. In addition, the proposed model recommends a new perspective on how medical experts and clinicians respond to uncertain and challenging lung allocation strategies. Having such ACO-kNN model, a medical professional can summarize information through the proposed method and make decisions for the upcoming transplants to allocate the donor organ.
Keywords: Ant colony optimization (ACO); lung transplantation; feature subset selection; quality of life (QoL)
Organ transplantation is the best handling method for patients with untreatable organ diseases. Many pre-transplant factors could increase post-transplant mortality, and organ transplantation can decrease immunity levels and initiate threatening infections leading to early graft failure and mortality [1–3]. However, organ transplantation should increase survivability and improve quality of life.
Transplant success represents a critical aspect of organ transplantation. The standard clinics for organ transplantation include misinterpretation, mortality, and graft failure [4,5]. Therefore, an appropriate method is necessary to aid clinicians in calculating survival and making decisions. For instance, using the most important predictive factors and substitutable transplants may promote accurate prognostic systems. Modeling an accurate prediction system coincides mainly two steps: first, selecting the most essential and representative donor/recipient demographic and characteristic data that guarantee transplant success to a high degree. Secondly, designing an optimal approach to establish a satisfying allocation strategy and validate the transplantation outcome . The essential features to be clinically predicted are graft survival time and the functional status after transplantation (i.e., QoL). Meanwhile, their predictions represent a challenging problem.
It is worth mentioning that it is difficult for the patients to predict whether their durable quality of life is tolerable. Therefore, assessing the functional status after lung transplantation is a meaningful and helpful decision for patients outside the hospital after the transplantation process. Essentially, there is a particular emphasis on improving the expectation procedure of survival and QoL features .
In this setting, it has been reported by Rosso et al. that the lung allocation score (LAS) did not impact the long-lasting survival after transplantation . Instead, LAS was characterized by a high prediction of graft impairment only in the first three days after transplantation. However, it has been stated by Bernhardt et al. that the current policies for organs allocation should be changed entirely to a broad discussion before implementation .
The development of soft computing techniques motivated researchers to apply different methods to enhance the automation process of organs allocation systems. These methods include Genetic Algorithms , Simulated Annealing , Harmony Search , and Ant Colony Optimization  in intelligent decision support systems (DSS). There are many attempts at developing an allocation system for lung transplantation. The authors in  conducted a scale named illness intrusiveness rating scale (IIRS) by assembling responses from different studies about QoL in lung, liver, heart, and renal transplants. The authors used exploratory and confirmatory factor analyses to identify the factor structure and then set it against patient groups of the various used transplants. Reference  presented data mining methods including neural networks, logistic regression, and decision trees to predict the survival of heart-lung transplants. The accuracy of the models ranged from 71% to 86%, considering 10-fold cross-validation.
Similarly, another study  proposed an integrated machine learning method in developing Cox survival models to analyze thoracic transplant procedures effectively. The UNOS dataset was used for survival time estimation to identify the optimum number of risk groups of thoracic recipients. Then, a Kaplan-Meier survival analysis was conducted to validate the identified risk groups. Other researchers  devised a structural equation modeling and decision tree construction procedure for lung transplant performance evaluation. The method was validated through the UNOS dataset, resulting in an R2 of 0.68. Two more studies ( and ) have analyzed QoL after lung transplantation. Reference  examined a fuzzy lung allocation system based on a real dataset from the UNOS to determine potential recipients for transplantation. This research has revealed that interpretation results have an R2 value of 83.2% and an overall accuracy of 82.1%.
On the other hand, another study  utilized a hybrid genetic algorithm-based feature selection model to predict QoL for patients undergoing a lung transplant. Using three classifiers, kNN, support vector machine (SVM), and artificial neural network (ANN), they found that SVM performance dominates the other two models in terms of accuracy. After optimizing parameters, the proposed models were applied to the UNOS dataset. The burden in that approach resides in that GA can become impractical due to the significant increase in computational time. However, many recent studies in medicine are focusing on lightweight decision-making systems [20,21].
Most lung allocation approaches do not have a noticeable mark between the essential and rest features. Considerably, identifying the unknown quality of life categories wants its intrinsic features to be predicted accurately. Therefore, introducing a robust feature decision-making approach for organs allocation is still a contest . In lung transplantation, locating the faster and the most truthful result is still a confrontation as each technique has its restrictions.
This paper introduces an approach based on the ant colony optimization (ACO) technique to obtain an optimal threshold for classification. Many studies pointed out that GA performance in terms of speed is much slower than ACO . However, it is the first work in lung allocation using ACO, which provides faster computational time compared to the GA  as provided in the literature. This investigation aims to develop automatic and fast recognition of the QoL for the lung allocation process. An expert algorithm has been developed to select discriminant features from the UNOS. The dataset has been preprocessed to remove undesirable features. Then, the optimal distinguishable features were explored using the ACO algorithm. As a final point, the kNN was used as a simple classifier for implementation and faster computational time. Other classifiers (e.g., ANN and SVM) were used to compare the accuracy and computational time, but kNN outperforms them. kNN used the resulted features from ACO to classify patients concerning the defined quality of life categories. The following sections give more detailed explanations about classifying QoL.
2 Materials and Methods
2.1 Data Preparation
The scheme for an automatic allocation for lung transplantation has been constructed depending on machine learning. At first, a dataset from the UNOS since 1987 was involved in the proposed algorithm. The records include data related to heart, lung, and thoracic transplants. The patients were assigned unique identification numbers to track their information, considering confidentiality and security issues. The UNOS data is considered a complete source of information for research purposes in organ transplantation in the U.S. .
The raw dataset consists of 60,888 observations and 442 features. For assembling the features space, a few preprocessing steps were applied. This study focused only on lung transplantation, so any observation associated with the heart or simultaneous lung/heart transplants was omitted accordingly. However, most of the observations are related to heart and heart/lung transplants, and only 16,771 records are related to lung transplantation. In addition, any observation with an undetermined output value (i.e., missing class label of the output) was also excluded. Moreover, different cleaning stages were applied, such as removing unnecessary features (e.g., dates, addresses, identification numbers, and zip codes) and dropping highly correlated features.
Next, four steps of preprocessing were applied. Firstly, assertion values for features containing missing values. The average value was applied to continuous features, and the mode was applied to categorical features. Secondly, coding based on a medical dictionary was applied . After that, some features were normalized because their large values might significantly hide the effect of other features with comparatively less significant values . At this point, the Minmax law was applied as defined in Eq. (1).
Xn = the resulted normalized value, Xi = the old value for a particular feature, Xmin = the minimum value in that feature, and Xmax = the maximum value.
Finally, random undersampling (RUS) was applied since the dataset was imbalanced. In this setting, all records from the smaller class (i.e., class 2) were used; meanwhile, the bulk class records (i.e., class 1 and class 3) were randomly reduced until we reached the number of class 2 records . However, the same dataset used in the literature is based on RUS. Therefore, RUS was conducted in this study for comparison and evaluation purposes with related works.
After the preparation process, the final data set produced 3,684 records and 118 inputs. The measured feature is the functional status after transplantation (FUNC-STAT-TRF) of the recipient, also known as recipient QoL. This feature is categorized into three classes: class 1 means independent (i.e., no limitation in mobility), class 2 means partially dependent (i.e., needing some assistance for daily activities), and class 3 means disabled (i.e., needing full assistance for daily activities).
The robust and relevant features are essential due to their significant effect on the classification process compared to irrelevant features. Subsequently, classification performance can be promoted by ignoring redundant and irrelevant features. For optimal feature decision-making, three prominent aspects should be considered: search mechanism, evaluation function, and search stopping criteria.
2.2 The Proposed Search Procedure
Several practical algorithms have been improved and applied as machine learning techniques in various fields. At this point, ACO has been used as an aspirant algorithm, representing a unique contribution in optimization problems such as routing optimization  and system fault detection . In this paper, the idea of ACO resides in finding the optimal set of features in allocating a lung transplant.
As the fundamental part of any feature selection algorithm, the evaluation function evaluates the involved features, considering the robustness of subsets based on their capabilities for perception purposes. This study applies the wrapper technique  in its evaluation function. The wrapper methods denote the hypothesis domain that combines the evaluation function and learning algorithm. Various features are examined through a learning process. After several iterations, the optimal features are defined. Then, finding the accuracy of the obtained classification model is based on the test data set. A heuristic search algorithm should be applied to find optimal features in this setting because of the exponentially increased search space. It is worth mentioning that the heuristic search algorithm is conspicuous due to its tractability in the random examination.
Another principle in the ACO algorithm is the closure time in the search procedure. The algorithm wants to resolve the point where it should terminate searching within the feature space. There are many standard techniques: the first method sets the preferred number of selected features, and when this number is reached, the program closes. The second method is when the algorithm settles with a fixed accuracy, although the features change. The third possible principle is accomplishing a defined number of iterations after defining a determined subset of features. In this study, the third method was applied.
2.3 The ACO Algorithm
As a result of examining ants seeking and cooperative behavior, Darigo et al. presented the ACO method as a nature-inspired metaheuristic technique . The structure is based on the thought described by ethologists about the environment used by ants since the ants employ pheromone paths to transfer information concerning the direct paths to food. Ants indirectly communicate with other ants through an odorous material called a pheromone to realize the direct food path from the destination to the source. Finding the food source allows the ant to drop an amount of pheromone on the route to make a guide while returning to the destination. Thus, a path is made by laying some pheromone on the ground. The memory structure is dynamic since it includes information regarding the efficacy of preceding selections based on the findings. This memory guides the assembly process of each ant. Therefore, the act of each ant is driven by the act of physical ants.
On the other hand, when another ant moves at random for exploration, it encounters the path assigned by other ants and can highly agree to follow it. Accordingly, that ant enforces the path by accumulating its pheromone. Thus, an autocatalytic practice occurred by which as more ants monitor a path, as more the path pays attention to be surveyed. Hence, choosing a route increases with the increasing number of ants who selected the same path. In selecting outstanding features, some adaptation is mandatory in the algorithm to make it appropriately work for QoL classification by representing optimal features.
2.3.1 ACO Structure
Feature selection requires a functional search space in the ACO algorithm, including nodes (i.e., ants) and links (i.e., paths) symbolizing a feature search space. In the proposed approach, the number of used features in allocating a lung transplant is represented by the number of nodes, and the linking between the nodes affirms their reliance. A simulated pheromone is placed on the associated paths, and its amount changes with time.
2.3.2 Probability Expression
Each ant selects its first node randomly penetrating the search space. The ant has the policy to determine the following link coupled to its link . Accordingly, each ant forms a subset of features incrementally. Building subsets of features requires each ant to go through the features via a decision strategy to accomplish a solution. The motion is based on a probability expression containing pheromone value and heuristic information of features. Hence, the probability expression, as shown in Eq. (2), denotes the likelihood of selecting node j by ant k while going from node i as a present location at iteration t:
where, equals the value of pheromone on a link connecting nodes i and j through iteration t, represents heuristic attraction that provides information about the desirability of motion. denotes a group of possible features associated with node i that might be navigated using ant k, and N represents the total number of features. The parameters α and β pointed to the virtual significance of pheromone trail and heuristic values, respectively. The probability expression P is an N × N matrix where each column j represents the probability of linked feature j to the remaining features (j = 1, 2,…, N). The structure of the P matrix is illustrated in the following two points:
(i) Pheromone initialization is the most critical step in the ACO program. The pheromone information of the search space is stored in the matrix ( ), an N × N matrix. Remarkably, the initialized quantity of pheromone is equivalent for all nodes; thus, there is no significance in selecting a specific feature when the algorithm starts.
(ii) Heuristic information: The efficiency of features is evaluated through heuristic information to achieve better discrimination. The heuristic desirability information, such as a correlation coefficient matrix, can highlight the independence among features. The correlation coefficient is a matrix calculated by the covariance of features divided by the corresponding standard deviations (σ) as shown in Eq. (3):
where Xi, Yi represents the features, and μXi, and μYi are their averages, respectively. The result of this equation is a value between −1 and + 1, which demonstrates the strength of the association between features. Indeed, as the resulting value is closed to 1, dependency increases, the sign implies the direction of the relationship, and the value 0 means no relationship. Eq. (4) explains the heuristic matrix (η):
Accordingly, the probability of dependency among features is represented by the absolute correlation matrix subtracted from 1. However, the absolute of Corr(X, Y) is used since only the amplitudes of features were added to the computation, and the orientation of features does not influence the calculation.
2.3.3 Evaluation Function
The quality of findings can be assessed using evaluation functions. In this study, the strength of discriminant features is assessed individually to find a specific subset capable of distinguishing the classes of QoL after lung transplantation. At this setting, a kNN classifier was applied to classify QoL concerning the produced subset of features by ACO, and then the misclassification should be computed. Therefore, an evaluation function such as γ(St) is used to reflect the error rate through two measures expanded by the classifier and the size of the present subset of features, as Eq. (5) shows:
where is the features subset at iteration t, denotes the size of the associated features, and is in the range [0,1], which is used to modify the importance of chosen features in terms of their size. is the error rate computed based on the misclassified records over the whole number of records in the used dataset. The evaluation function minimizes the cost of the kNN error. In other words, the global best subset should satisfy the aspect of error and thus the best subset of features.
2.3.4 Pheromone Updating
Whereas the termination criterion has not yet been acquired, the (τ) matrix is updated over several iterations. Indeed, the pheromone value was updated for each feature subset made by each ant. The increase in the pheromone amount forcefully depends on the penalty of each subset. As comes to be smaller, more pheromone will be informed. Eq. (6) states how the pheromone amount is updated:
where, indicates the dropped quantity of pheromone on the features subset as shown in Eq. (7), which describes how ants update the pheromones.
where Q is a constant dictating how much all ants should put down pheromones . The pheromone value, , is the update of the local paramount subset through each iteration. The evaporation rate of pheromone ρ is used to circumvent too early convergence. Thus, the threat of local optima will reduce, and the ants trail more examination within the search space to discover numerous solutions.
2.3.5 Design of the ACO Algorithm
Implementing the ACO algorithm requires initializing the following parameters: number of iterations, the total number of used ants, the values of α and β parameters, initial pheromone, and evaporation rate. After several iterations, a better convergence has been achieved, obtaining the candidate values of ACO parameters. It has been observed that the evaporation rate significantly influences the fitness value. Tab. 1 shows the best values after several trial and error operations: the number of ants N = 118, the evaporation rate (ρ) and the initial pheromone (τo) are 0.5 and 0.2, respectively, the number of iterations is appropriate at T = 50, and α and β values are 1 and 3, respectively. It is worth mentioning that the ACO algorithm has been recurring for five consecutive times and 50 iterations each time. Thus, the algorithm was carried out for 250 iterations to avoid any bias and ensure the reliability of the features.
Fig. 1 shows the development process of the ACO that is applied to the space of the UNOS feature set to choose a subset of features. After initializing the ACO parameters, essential computations of the probability function and heuristic information are considered. Next, the first iteration executes the defined features subset because the pheromone path is similar for all features. After that, examination of the generated subsets using the evaluation function γ(St). The evaluation function finds the superiority of the chosen subset. In each iteration, the subsets are evaluated, allowing the pheromone value to be updated for the best features subset as a good reaction. This method of reaction is called a wrapper . The interactivity of all features is evaluated using a classifier incorporated with the search algorithm to dictate the local best features in each round. Herein, the local paramount subset is kept for the next iteration. The updating function enhances the best subset by adding and evaporating the pheromone value for the produced subsets on other links. This process recurs through iterations yielding the best global subset. Fig. 2 summarizes the ACO steps of the algorithm. As explained in the following subsection, the classifier trains the dataset according to the global best features.
2.4 Prediction Using kNN Classifier
The instant-based learning classifier (kNN) is a supervised and decision-based prediction learning method that can classify large-scale data into numerous classes. kNN is a simple and efficient data mining technique used for classifying observations based on their distance to other data points in a training dataset. kNN is an instance-based learner (also known as a lazy learner) where the training data is loaded into the model. Then, when a new observation needs to be classified, it looks for the defined k value of nearest neighbors, where the instance that should be classified is based on the majority vote. For example, if k is 5, then the classes of 5 nearest neighbors are determined. Distance functions measure similarity between records, and kNN calculates the distance between new and known data points . The commonly used distance between two observations a = (a1,…, an) and b = (b1,…, bn) is the Euclidian distance as defined in Eq. (8).
The classification procedure for the allocation of lung transplantation is shown in Fig. 3. The number of neighbors (k) was set to 5 after several iterations. Fig. 4 shows the relationship between different odd values of k and the testing accuracy to evaluate the predictions. The kNN model has been trained using the resulting optimal features subset from the ACO using 10-fold cross-validation and repeats the process five times with different random samples to avoid any possible bias in verifying the robustness of the results. However, the kNN model was chosen due to its more straightforward implementation, computational efficiency, and analytical tractability than other potential model types such as ANN or SVM.
3 Results and Discussion
This section shows the performance evaluation of lung allocation based on the proposed ACO-kNN model. The UNOS dataset was used as a real-world medium-sized dataset. Extensive iterations have been tried to optimize the parameters of the ACO to get the best features subsets.
3.1 Feature Decision-Making Results
The ACO-kNN model for optimal features classifies all the processed UNOS dataset records. The ACO finds the most candidate features, and the kNN finds how the data is fitted for the trained data concerning the most important features. The stopping criterion in ACO was set to be twenty features after 50 iterations; thus, the most essential twenty features are derived by the proposed ACO algorithm. The accuracy (i.e., classification rate) is measured according to the average of the correctly classified records in the testing data through 10-fold cross-validation and five random repetitions. The accuracy of the features subset is the main factor in determining the quality of chosen features and the efficacy of the proposed model. The evaluation function γ(St) depends on the ongoing process of repetitions and iterations. As represented in Fig. 5, Y-axis denotes the fitness value γ(St) versus the number of iterations represented in the X-axis. The value of γ(St) fluctuates at the beginning of each repetition since the number of chosen features and the error rate change during the iterations. In addition, the value of γ(St) sharply decreases to converge to a steady level. Likewise, the convergence of all repetitions is approximately at the same level emphasizing the correctness of the globally gained subset and the ability of the algorithm to discover the optimal solution. Thus, the developed approach found the candidate features and disregarded the out-of-work features.
Tab. 2 demonstrates the results of the optimal subset of the UNOS dataset based on high classification accuracy. We considered five rounds validated ten times using the 10-fold cross-validation technique. The selected features in the third round were considered to achieve the highest accuracy, 91.3%. The features selection is made by considering the features with the best-fitted value as it improves the classification accuracy of the selected subset. Tab. 3 lists the names of the most candidate features selected by the third round.
Essentially, the findings listed in Tab. 3 were consistent with medical studies. For instance, the ACO model found that the recipient weight after transplantation (WGT_KG_TRR) is an important feature that affects the functional status after transplantation. This finding is reconcilable with related studies [36–38] that support the findings in the proposed ACO-kNN model. These studies illustrated that the risk of death was increased in underweight recipients (i.e., their body mass index (BMI) is less than 18.5). Moreover, these studies showed that Kaplan-Meier modeling pointed out a significant effect of BMI on survival and QoL. Mortality is higher in obese, overweight, and underweight lung transplant patients than in normal-weight patients. In addition, this study found that the anti-viral therapy (ANTI_VIRAL) feature is essential to be considered in the lung allocation process, as also indicated in other studies [39–42]. By controlling anti-viral drugs, transplant patients can move to a situation where several medical complications can be reduced significantly and prevent viral diseases since the immunity level after lung transplantation is low. Other candidate features (CEREB_VASC_AFTER_LIST, DIAB, MALIG, PEPTIC_ULCER, ISCHTIME, VENT_SUPPORT_AFTER_LIST, and SIX_WALK_FEET) are listed in Tab. 3 are also mentioned by other researchers  as significant features affecting survival and QoL after lung transplantation. Other researchers highlighted the importance of considering (DIAL_TY_TCR) [44,45], (HGT_CM_TCR), (CRSMATCH_DONE) , (BMAT) , (DDR1) , and (PREV_TX) , which are important features in the proposed model as well as by the medical experts in the field of lung transplantation.
3.2 Comparison of Classification Results
The proposed model (ACO-kNN) identified the best contributing features and obtained good accuracy. Overall, an analysis of the selected features was conducted thoroughly and compared with the literature. Tab. 4 compares the proposed ACO-kNN model with the previous work. As mentioned earlier, the UNOS dataset is a standard dataset used by researchers to study the heart, lung, and thoracic allocation process to see whether their approaches work better and compare various other methods in the literature applying the same data. In this study, the comparison is limited to those using the same dataset to simplify the comparison competency. It has been demonstrated that the proposed ACO-kNN approach achieved one of the highest accuracies, to be 91.3%, and the least computational time (48 s on average) for classifying the functional status after lung transplantation. My last search in the allocation of lung transplantation  suffers from a high computational time of around 7 min on average and an accuracy of 95.3%, which is slightly higher than that in the proposed ACO-kNN. Overall, the resulting performance infers that the ACO algorithm is outstanding in selecting efficient features compared to the other methodologies considering a trade-off between accuracy and computational time. In other words, having a potential donor lung could make a fast decision for the recipient patient. Thus, ACO paves the way to consider the proposed model in the allocation process of other organs.
The lung allocation process is critical in lung transplantation. This paper introduced an innovative approach to using ACO for the lung allocation process. In the proposed ACO algorithm, essential features have been introduced. Then, an automated classifier based on kNN was used to classify QoL for the allocation of lung transplantation based on an optimal subset of features as an expert algorithm. From the perspective of the QoL classification application, the ACO-kNN algorithm has been developed to examine the quality of the selected features. The efficiency of the proposed method was evaluated against other similar research. The findings significantly improve QoL classification regarding the accuracy and computational time. The average accuracy was 91.3% by selecting the most essential 20 features, and the computational time did not exceed 48 s on average. Thus, the proposed method outperforms the achieved results in the literature in terms of QoL performance. In future work, this approach could be conducted to develop a reliable model for classifying different organs allocation, such as liver, kidney, and heart. In addition, ACO can deal with large data sizes returning optimal solutions.
Acknowledgement: The author would like to thank the Deanship of Scientific Research and Graduate Studies at Yarmouk University (Irbid, Jordan) for their support.
Funding Statement: The author received no specific funding for this study.
Conflicts of Interest: The author declares that she has no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|