|Intelligent Automation & Soft Computing |
Multi-Objective Adapted Binary Bat for Test Suite Reduction
1Department of Mathematics, Faculty of Science, Suez Canal University, Egypt
2Faculty of Informatics and Computer Science, British University in Egypt, Egypt
3Department of Electrical and Mechanical Engineering, Nagoya Institute of Technology, Japan
*Corresponding Author: Abeer Hamdy. Email: firstname.lastname@example.org
Received: 21 April 2021; Accepted: 16 June 2021
Abstract: Regression testing is an essential quality test technique during the maintenance phase of the software. It is executed to ensure the validity of the software after any modification. As software evolves, the test suite expands and may become too large to be executed entirely within a limited testing budget and/or time. So, to reduce the cost of regression testing, it is mandatory to reduce the size of the test suite by discarding the redundant test cases and selecting the most representative ones that do not compromise the effectiveness of the test suite in terms of some predefined criteria such as its fault-detection capability. This problem is known as test suite reduction (TSR); and it is known to be as nondeterministic polynomial-time complete (NP-complete) problem. This paper formulated the TSR problem as a multi-objective optimization problem; and adapted the heuristic binary bat algorithm (BBA) to resolve it. The BBA algorithm was adapted in order to enhance its exploration capabilities during the search for Pareto-optimal solutions. The effectiveness of the proposed multi-objective adapted binary bat algorithm (MO-ABBA) was evaluated using 8 test suites of different sizes, in addition to twelve benchmark functions. Experimental results showed that, for the same fault discovery rate, the MO-ABBA is capable of reducing the test suite size more than each of the multi-objective original binary bat (MO-BBA) and the multi-objective binary particle swarm optimization (MO-BPSO) algorithms. Moreover, MO-ABBA converges to the best solutions faster than each of the MO-BBA and the MO-BPSO.
Keywords: Binary bat algorithm; mutation testing; regression testing; software testing; test suite reduction; heuristic algorithms
Software testing is one of the crucial activities in the software development lifecycle. It is used to detect software defects and ensure that the software is delivered with high quality. Any modifications in one of the software components may affect one or more other components, which necessitates the re-execution of the earlier test cases in addition to the newly generated ones . As a result, the test suite size expands over time and may include redundant test cases. Moreover, it may not execute in it is entirely within the testing budget and/or time. Testing the behavior of the whole system under test (SUT) after each modification is called regression testing . Regression testing is expensive and it accounts for almost one half of the cost of testing . Rothermel et al.  reported a case in the industry where executing a complete regression test suite (of a 20K LOC product) required seven weeks. In order to reduce the cost of regression testing, it is necessary to reduce the number of test cases in the test suite without compromising its effectiveness, in terms of some predefined criteria [4,5]. This problem is known as the test suite reduction (TSR) problem. One way to assess the capabilities of the reduced test suite, in discovering bugs, is through the utilization of a fault-based testing technique called mutation testing. Mutation testing  is a white-box testing technique, which calculates a score for the test suite that indicates its capability on discovering bugs in the SUT. The TSR problem is known to be a combinational multi-objective optimization problem, that can be described as a set covering problem; which is known to be NP-complete. In practice, there is no exact solution for NP-complete problems, however, suboptimal solutions could be found using search-based optimization algorithms .
Researchers have proposed various test suite reduction approaches . The majority of these approaches are in the form of a white-box . They aim at reducing the test suite without compromising test requirements such as statement coverage, fault detection capability rate, etc. Most of these approaches utilized greedy algorithms to solve a single objective TSR (SO-TSR) problem . Recent approaches used heuristic algorithms to solve the multi-objective TSR (MO-TSR) problem [8,10–15] in addition to the SO-TSR [16,17]. Some studies demonstrate a comparison between the performance of heuristic and greedy algorithms in solving the TSR problem; Sun et al.  showed experimentally that the Binary Particle Swarm Optimization (BPSO) algorithm outperforms greedy algorithms in solving the SO-TSR problem. Most of heuristic based approaches for solving the MO-TSR problem utilized Non-dominated Sorting Genetic Algorithm II (NSGAII) or its variants [10,14,15]. Yoo et al.  showed experimentally that the NSGAII is superior to greedy approaches in solving the MO-TSR problem. Wang et al.  used three types of weighted-sum genetic algorithm (GA) and showed experimentally the superiority of the random-weighted GA over the NSGAII and some other popular multi-objective optimization algorithms. Moreover, Mohanty et al.  used Ant Colony (ACO) in their proposed approach for solving the MO-TSR problem.
Aims and Contributions
The main aim of this paper is to propose a heuristic based approach to solve the multi-objective TSR problem. The proposed approach is based on the Bat algorithm (BA) [18,19]; which is a swarm intelligence search algorithm inspired from the echolocation behavior of bats. Echolocation works as a type of sonar; a bat emits a loud sound, and an echo returns when that sound hits an object. The combination of echolocation with swarm intelligence enhances the properties of swarm-based algorithms; which makes BA more effective than other swarm-based algorithms in some contexts [18,20]. Yang  showed empirically that BA outperforms each of the GAs  and Particle swarm Optimization (PSO) [22,23] over some benchmark functions. Chawla et al.  surveyed some applications in the areas of computer science, medical and electrical engineering where the BA surpasses GA, PSO and ACO algorithms. However, the BBA occasionally fails to discover the global best solution for some multi-modal functions.
Our contributions to accomplish the above-mentioned aim can be summarized as follows:
Proposing modifications to the original Binary Bat algorithm (BBA) to enhance its exploration and exploitation capabilities; to reduce its occasional failure to converge to global optimum solutions.
Evaluating the performance of the adapted (modified) binary bat algorithm (ABBA) against each of the BBA and the binary PSO (BPSO)  algorithms, over a set of different unimodal and multi-modal benchmark functions of different ranges (the unimodal function has only one minima location which is the global best solution; while the multi-modal function has more than one local minima locations but only one of them is the global best solution).
Formulating the TSR problem as an optimization problem and defining a fitness function based on two objectives which are: (i) the execution cost of the reduced test suite and (ii) the effectiveness of the reduced test suite in terms of its mutation score. The variable weighted sum method  was utilized to form the multi-objective fitness functions, which guides the BBA to search for the non-dominated solutions that provides an optimum balance between the cost and effectiveness of the reduced test suites.
Evaluating the performance of the multi-objective ABBA (MO-ABBA) against each of the multi-objective BBA (MO-BBA) and the multi-objective BPSO (MO-BPSO) in solving the multi-objective TSR problem over eight test suites of different sizes.
The rest of the paper is organized as follows: Section 2 introduces some important preliminaries for this work. Section 3 discusses the previous studies that tackled the TSR problem. Section 4 presents the multi-objective adapted binary bat algorithm for solving the TSR problem. Section 5 discusses the experiments and results. Finally, Section 6 concludes the paper and introduces possible extensions to this work.
2.1 Test Suite Reduction Problem
Given: A test suite which includes test cases, and a set of mutants , that should be killed to provide an adequate testing of the SUT. Each test case can kill one or more mutants .
Problem: Find an adequate subset that can kill as many as possible number of mutants and includes as few as possible number of test cases. These two objectives are contradictory; this is the reason we formulated the TSR problem as a multi-objective optimization problem.
2.2 Pareto Optimal Concepts
In the multi-objective optimization problems, there is no single solution but a set of multiple trade-offs solutions . The vector of decision variables that optimizes the considered objective functions and satisfy the problem constraints is called a Pareto front. Thus, the Pareto front is a set of Pareto solutions which are not dominated by any other solution. A solution is said to dominate a solution , if and only if is not better than for any objective , and there exists at least one objective in which is better than its corresponding objective in . On the contrary, two solutions are said to be non-dominated when none of them dominates the other. Fig. 1 depicts the difference between dominated and non-dominated solutions and represents the Pareto front. In the figure, the objective functions and are to be minimized. It is obvious that solution A dominates solution D because and Moreover the solutions are non-dominated solutions because none of them is better than the others in both objectives; as is the best for objective , whereas is the best for objective, and is better than for objective and better than for the objective . The set of non-dominated solutions of the multi-objective optimization problem is called the Pareto optimal set, and its representation in the objective space is the Pareto front. This set satisfies two properties: (i) any solution found is dominated by at least one solution in the Pareto set, and (ii) every two solutions in the set are non-dominated to each other.
2.3 Bat Algorithm
Bat algorithm (BA) is one of the recent metaheuristic swarm intelligence optimization algorithms which is proposed by Yang . BA was inspired by the behavior of the micro-bats. A bat flies randomly with velocity at position with a frequency , varying wavelength and loudness to search for a food/prey in a dimensional search space. The BA starts with randomly generating the initial population of bats. The values of the parameters of each bat are updated over the iterations according to Eqs. (1)–(3).
where, is the th bat frequency value, and are the minimum and maximum frequency values respectively, is a random number of a uniform distribution. The bats perform a random walk procedure which is defined by Eq. (4) for exploring the space.
where, is a random number in the range , is the average loudness of all the bats at time . It could be stated that the BA is a balanced combination of the PSO and the intensive local search algorithms. The balance between these two techniques is controlled by both loudness and the pulse emission rate which are updated according to Eqs. (5) and (6).
where, and are constants; is analogous to the cooling factor in the simulated annealing (SA).
Mirjalili et al.  proposed the BBA to solve optimization problems in the binary search space. In the BBA, the bat’s position is changed from one to zero or vice versa based on the probability of the bat’s velocity according to Eqs. (7) and (8).
where and is the position and velocity of -th particle at iteration in -th dimension, and is the complement of .
3 Literature Review
Researchers proposed a significant number of approaches to minimize the size of the test suite. The majority of these approaches are based on greedy algorithms , while very few of them are based on clustering algorithms  or utilize hybrid algorithms (e.g., neuro fuzzy techniques) . Greedy based approaches utilize one of the greedy algorithms to determine the reduced test suite based on the current best strategy. Over each iteration, the greedy algorithm adds to the reduced test suite the test case that has the highest greedy property, e.g., the highest statement coverage, which is a local optimal solution. It stops when the desired percentage of coverage is reached. Greedy approaches were proved empirically to be effective in solving the SO-TSR problem. On the other hand, clustering based approaches utilize one or more of the clustering algorithms to group similar test cases together according to a predefined similarity measure. Then a sampling mechanism is applied to select one or more test cases from each cluster to be included in the reduced test suite; while the rest of the test cases are discarded. Recently, heuristic algorithms where utilized to solve the single and multi-objective TSR [8,10–15,17]. According to a survey study conducted by Khan et al.  the majority of the heuristic search based TSR approaches, 79% of them, are single-objective optimization. Some researchers showed empirically that some heuristic algorithms are superior to greedy algorithms in solving each of the single and multi-objective TSR problem [12,17]. The work of Yoo et al.  is considered as the first work that applies multi-objective optimization for test suite minimization. The authors used the NSGAII algorithm and showed experimentally its superiority over the greedy approaches. Geng et al.  and Gupta et al.  also utilized the NSGA-II algorithm but with different objective drivers. The objectives of Geng et al.  were the code coverage and test suite cost; while the objectives of Gupta et al.  were the code coverage and mutation score. Wang et al.  proposed utilizing three types of weighted-based genetic algorithms for minimizing the test suite of the product lines software. Where, the authors weighted summed the different objective drivers to form a single objective fitness function that guides the GA during the search for a Pareto front. Their experimental results showed that the Random-Weight GA algorithm outperforms seven other popular multi-objective search algorithms including: NSGA-II, strength Pareto evolutionary algorithms (SPEA) and speed-constrained multi-objective Particle Swarm Optimization (SMPSO). Wei et al.  compared among six evolutionary multi-objective optimization algorithms including NSGAII and several variants of the multi-objective decomposition-based evolutionary algorithm (MOEA/D). Their experimental results showed the superiority of the NSGAII over small programs, but over large programs (space) the MOEA/D was superior. They tried different combinations of objective drivers including mutation score, code coverage and test suite cost. The experiments showed that for the same statement coverage, using the “mutation score” as an objective driver guided the evolutionary algorithms to the smallest test suite.
Difference from the previous work
The approach proposed in this paper is a heuristic search-based approach. We adapted one of the recent heuristic algorithms BBA which proved its superiority over other evolutionary algorithms in different contexts . However, the BBA occasionally fails to discover the global best solution for some multi modal functions; in addition, the BBA is used for solving single objective optimization problems. So, we proposed modifications to the BBA and utilized it to minimize the test suite size without loss in the fault detection quality. We used mutation testing to measure the quality of the reduced test suite, because mutation testing has been studied by numerous researchers as a method to assess the quality of a test suite [24,30–32]. Previous research proved empirically that mutation testing is more effective than code coverage in evaluating and comparing test suites .
4 Proposed Multi-Objective ABBA (MO-ABBA) for TSR
4.1 TSR Solution Encoding
Consider that each bat has a position vector that represents a solution to the TSR problem, i.e., each represents a reduced test suite. is encoded as a binary vector = ( , , …., ), where, d is the size of the original test suite (the total number of test cases), each bit corresponds to a test case , the bit value is equal to “1” or “0”. This means that is included/excluded in the test suite, respectively.
4.2 Adapted Binary Bat Algorithm (ABBA)
Generally, the performance of any heuristic algorithm, including the BA, is affected by two crucial competencies which are: 1) exploration and 2) exploitation. Exploration is the ability of an algorithm to find promising solutions by searching various unknown regions, while exploitation leads the algorithm to the best solution among the discovered ones. Exploration capability can get the algorithm away from a local optimum it gets stuck in, while exploitation capability increases the convergence speed of an algorithm. It is important to keep the balance between the global and local search, such that the global search is amplified at the early iterations. While the local search is amplified at the late iterations so the algorithm converges to the global optimum.
The update formula of the bat velocities, Eq. (1), includes two components. The first component is the previous velocity of the bat, , which is responsible for the global search (exploration). As, directs the bat to keep its velocity and direction, thus it overflows the search space. While the second component, ( is responsible for the local search (exploitation). As it directs all the bats to a region near to the best-found global solution (Gbest). So, the following modifications were proposed to the formula: Firstly, multiplying the term by an inertia weight factor “ ”, which is given by Eq. (9). The value of will decrease linearly over iterations. The inertia weight was recommended by a number of previous studies that aimed at enhancing each of the BAand the PSO [24,33].
where and are pre-determined constants is the current iteration number, is the maximum number of iterations.
The other suggested modification is to assume that each bat emits two frequencies instead of one before the bat decides on its moving direction. The first frequency is directed towards the location of the , while the second frequency is directed towards a randomly selected best solution discovered over the previous iterations . Any of these previously discovered best solutions could be a candidate for a global optimum solution. This way each bat benefits from the experiences of the other bats. Consequently, Eq. (1) is amended as follows:
where is a randomly selected best solution other than the , increases non-linearly from to 1 which increases the impact of the location of the over the iterations, so the bats converge to the .
4.3 TSR Multi-Objective Fitness Function Formulation
In this paper the fitness function is composed of two objectives. The first objective aims at minimizing the cost of the test suite; which is expressed in terms of the execution time as recommended by Yoo et al. . While the second objective aims at selecting a reduced test suite that is capable of detecting the largest number of faults. The fault detection capability of the reduced test suite is expressed in terms of the mutation score. Wei et al.  found out that the mutation score is the most effective objective for solving the TSR problem. The two objectives are defined using Eqs. (13)–(14).
where | | is the size of the original test suite, | | is the size of the reduced test suite, is the execution time of a test case “i”, is the total number of mutants of the software under test and is the number of the killed mutants by the reduced test suite. The detection capability (number of killed mutants) of the reduced test suite is the cumulative sum of the detection capability of each in the reduced test suite; where each is represented using a binary vector of size n, n is the total number of mutants of a SUT. The value of a bit number in vector is set to equal “1”/”0” if kill/do not kill the mutant number .
To formulate the fitness function, we used the weighted sum method which is simple and traditional method for multi-objective optimization. It produces a Pareto-optimal set of solutions by changing the weights among the objective functions. Yang  showed experimentally that the weighted sum method for combining the multi-objectives into a single-objective is very efficient even with highly nonlinear problems, complex constraints and diverse Pareto optimal sets. Moreover, Wang et al.  showed that the random weighted-based GA (multi-objective GA based on weighted sum method) is superior to some popular multi-objective algorithms, e.g., NSGAII and SPEA.
The fitness function used in this work is defined by Eqs. (15)–(17), the best solution is the one that maximizes the .
where, are the weights used to find the Pareto-optimal set of solutions, denotes the initial value of , is the current iteration number, and n is a modulation index. With the increase in the iteration the value of increases from to 1, whereas decrease from (1 − to 0. The values of determine the importance of each objective to the fitness function. The different values of produce different non-dominated solutions with sufficient diversity; so the Pareto front can be approximated correctly.
Algorithm 1 shows the basic steps of the multi-objective ABBA.
5 Experiments and Results
5.1 Research Questions
The experiments were designed to answer the following research questions:
RQ1: Does the performance of the single-objective (SO-ABBA) surpasses the performance of the SO-BBA and the SO-BPSO?
RQ2: Does the performance of the MO-ABBA surpasses the performance of the MO-BBA and the MO-BPSO in solving the MO-TSR problem?
RQ3: How does the increase in the test suite size of the SUT and the number of mutants impact the performance of the MO-ABBA?
5.2 Data Sets
To answer RQ1, we used twelve unimodal and multimodal benchmark functions. Tab. 1 lists these functions along with their search boundaries (range). are unimodal and are multimodal benchmark functions. The global minimum values of all benchmark functions used are 0. In the experiments, 15 bits were used to represent each continuous variable in binary. Thus, the dimension of generating a bit vector for a benchmark function was calculated by Eq. (18) as follows:
where, nb is the dimension of the bats/particles.
To answer RQ2 and RQ3, we used eight programs, out of which, two are C open-source programs and their characteristics retrieved from a popular repository, Software-artifact Infrastructure Repository (SIR) , which are: flex v1 and make v1. While the other six java programs are from an available dataset(1)(1)
, provided by Polo et al. . The execution time of the test cases of these six programs is not available, so we assumed that all the test cases have equal execution time equal to 1 unit time. The characteristics of the eight programs are listed in Tab. 2, which are the line of code (LOC), the test suite size |tc|, number of mutants |mu|, and the execution time of the original test suite (Texec).
5.3 Parameter Setting
We experimented with the most recommended values for the parameters in the literature . Tab. 3 lists the parameter values that achieved the best performance.
5.4 Performance Metrics
Two metrics were used to assess the performance of the heuristic algorithms in general which are the fitness and standard deviation (SD), defined by Eqs. (19) and (20).
where, N is the number of runs, is the fitness of the best solution discovered during the run number .
As the evolutionary algorithms are stochastic, for each experiment, 10 independent runs were performed; then the and are calculated. Larger values indicate better solutions. While the smaller the value of the SD, the more robust is the algorithm; as small SD values indicate that the algorithm can find acceptable solutions in the different runs, with small discrepancy.
Extra three specific metrics were used which are: (i) Test suite size reduction rate (TSRR), (ii) Execution time reduction rate ( ) and (iii) Fault detection capability rate ( ); They are calculated using Eqs. (21)–(23)
The higher the values of the TSRR, ETR and FDR, the better the performance of the search algorithm.
Answer to RQ1:
Tab. 4 lists the mean and SD of the optimal solutions discovered for each function over the ten runs, also lists the mean and SD of the number of iterations executed to reach the corresponding optimal solutions; the best results are pointed out in bold style. The maximum number of iterations was set to equal 1000 across all the experiments; however, the best solutions were achieved earlier than the predetermined maximum number of iterations. It should be noted that, for any of the previously mentioned benchmark functions, the best solution is the one that has the smallest mean value (minimization problem). Also, the smaller the mean value of the executed number of iterations, the faster the convergence speed of the heuristic algorithm.
As could be observed from Tab. 4 that the performance of the SO-ABBA algorithm is superior to the SO-BBA over all the unimodal and multimodal benchmark functions; as SO-ABBA could discover better solutions than the ones discovered by the SO-BBA, in terms of the mean values of the best solutions. In addition to, the SD values of the best solutions in case of using the SO-ABBA are smaller than when using SO-BBA over all the functions, which indicates that the SO-ABBA is more robust than the SO-BBA. Small SD values of the optimal solutions discovered by the SO-ABBA prove that the SO-ABBA is efficient in finding the best solutions without large variance among the different runs. When comparing the performance of the SO-ABBA to the SO-BPSO, it was found that the SO-ABBA was capable of discovering better solutions than the SO-BPSO for , , , , , functions. In addition, the SD values of the best solutions in the case of using the SO-ABBA are smaller than when using the SO-BPSO. On the other hand, both the SO-ABBA and the SO-BPSO could discover the same best solutions for , , , , , . As could be observed from the mean values of the number of iterations executed by each of the three algorithms to discover the optimum solutions that, the SO-ABBA could converge to the best solutions much faster than each of the SO-BBA and the SO-BPSO.
Answer to RQ2:
To answer RQ2 we conducted a set of experiments over the previously mentioned 8 programs. The results listed in the paper are the mean and SD of 10 independent runs over each program. The maximum number of iterations was set to equal 100 in all the experiments. Tab. 5 lists the generated values of weight1 and weight2 which were used to calculate 10 non-dominated solutions (NDS) on the Pareto surface. While Tab. 6 lists the mean and SD values of each of the fitness (F) and convergence speed of the discovered NDS. The convergence speed is measured in terms of the number of iterations executed by each algorithm to discover the best solution (#i). To simplify the visualization of the results, only three NDS (nu. 1, 5, 10) were listed in the table. Tab. 7 lists the FDR, TSRR and ETR of the three selected NDS.
As could be observed from Tab. 6, that the MO-ABBA algorithm surpasses both of the MO-BPSO and MO-BBA in terms of the mean fitness values of the discovered NDS, across the 8 programs. Moreover, the fitness standard deviation values of the NDS discovered by the MO-ABBA are very small; most of them are equal to zero or approaches zero, which indicates the stability of the MO-ABBA. E.g., the fitness mean and SD values of the NDS#5 discovered by the MO-ABBA, MO-BBA and MO-BPSO are as follows: Flex_V1 (1.53 ± 0.02, 1.25 ± 0.00, 1.23 ± 0.01), Make_V1 (2.12 ± 0.0, 1.76 ± 0.0, 1.66 ± 0.02), MBisectOk (4.42 ± 0, 4.31 ± 0.13, 4.41 ± 0.01), MBubCorrect (3.70 ± 0.61, 1.61 ± 0.06, 1.61 ± 0.07), MFindOk (20.13 ± 0.35, 2.13 ± 0.24, 2.45 ± 0.33), MFourBall (14.14 ± 0.02, 3.84 ± 3.64, 3.25 ± 0.64), MMidOK (3.70 ± 0.61, 2.34 ± 0.25, 2.48 ± 0.23), MTringle (3.82 ± 0.72, 1.58 ± 0.07, 1.58 ± 0.09).
In terms of the convergence speed, the MO-ABBA was able to converge to the best solutions faster than each of the MO-BBA and the MO-BPSO across all the programs except Flex_V1, NDS 1(#i = 95.5 ± 4.95 ABBA, 94.5 ± 1.77 BBA, 97.5 ± 1.1 BPSO), Make_V1(NDS 10) (#i = 100 ± 0 ABBA, 99 ± 0.71 BBA, 99.5 ± 0.35 BPSO), MBisectOk (NDS 10) (#i = 42.22 ± 30.97 ABBA, 21.90 ± 18.78 BBA, 71.90 ± 16.63 BPSO) and MbubCorrect (NDS 10) (#i = 92.5 ± 6.08 ABBA, 89.40 ± 8.37 BBA, 99.4 ± 10.84 BPSO); where, the MO-BBA converged faster than the MO-ABBA. Nevertheless, the MO-ABBA converged to better solutions than the ones discovered by the MO-BBA. So It could be stated that the MO-BBA was prematurely converged in these cases. Furthermore, the MO-BPSO was occasionally able to discover solutions close to the ones discovered by the MO-ABBA, e.g., MBisectOk NDS 1 (F = 10.57 ± 0 ABBA, 10.57 ± 0.01 BPSO) and MBisectOk NDS 10 (F = 1.05 ± 0 ABBA, 1.05 ± 0.01 BPSO). But the MO-BPSO needed a greater number of iterations than the MO-ABBA to converge to these solutions (MBisectOk NDS 1 #i = 15.33 ± 8.09 ABBA, 67.60 ± 22.08 BPSO ; MBisectOk NDS 10 #i = 42.22 ± 30.97ABBA, 71.90 ± 16.63 BPSO).
As could be observed from Tab. 7 that some of the solutions discovered by the three algorithms have the same FDR values, but the solutions discovered by the MO-ABBA have the highest TSRR and ETR values (e.g., Flex_V1 NDS#1 TSRR = (82.19 ± 2.12 ABBA, 72.40 ± 4.60 BBA, 70.46 ± 7.42 BPSO) ; ETR = (83.50 ± 0.65 ABBA, 73.78 ± 1.53 BBA, 71.71 ± 1.88 BPSO), Make NDS#1 TSRR = (68.41 ± 1.77 ABBA, 60.55 ± 0.35 BBA, 59.16 ± 0.71BPSO); ETR = (80.77 ± 69.0 ABBA, 81.89 ± 34.27 BBA, 81.98 ± 46.38 BPSO) and MFindOk NDS#1 TSRR = (99.3 ± 0 ABBA, 88.7 ± 3.80 BBA, 89.6 ± 1.78 BPSO); ETR = (99.3 ± 0 ABBA, 88.7 ± 3.80 BBA, 89.6 ± 1.78 BPSO)). Which means that for the same fault detection capability rate the MO-ABBA could select from a test suite a subset of test cases with smaller size and faster execution time (less expensive) than the subsets selected by each of the MO-BBA and MO-BPSO. While, some of the solutions discovered by the MO-ABBA are more cost effective (in terms of the TSRR and the ETR values) but with less fault detection capabilities (e.g., MFourBall NDS #1 FDR = (37.4 ± 4.02 ABBA, 83.4 ± 20.99 BBA, 61.7 ± 15.33BPSO); TSRR = (99.0 ± 0 ABBA, 91.8 ± 2.42 BBA, 96.4 ± 1.49 BPSO); ETR = (99.0 ± 0 ABBA, 91.8 ± 2.42 BBA,96.4 ± 1.49 BPSO) ; MFourBall NDS #2 FDR = (37.1 ± 4.06 ABBA, 97.6 ± 40.20 BBA, 85.7 ± 25.01 BPSO); TSRR = (99.0 ± 0 ABBA, 92.5 ± 3.05 BBA, 93.8 ± 1.94 BPSO); ETR = (99.0 ± 0 ABBA, 92.5 ± 3.05 BBA,93.8 ± 1.94 BPSO) ; MMidOk NDS #1; MTringle NDS #1 and MTringle NDS #2. So, the selection of the best algorithm depends on the testers’ targets and testing budgets. Fig. 2 shows sample convergence curves of the three algorithms over the programs. As could be observed that the MO-ABBA converge faster and to better solutions than the MO-BBA and the MO-BPSO, although the parameters settings of both of the MO-ABBA and the MO-BBA are the same.
Answer to RQ3:
As could be observed from the set of experiments over the benchmark functions that the performance of the SO-ABBA was not affected by the functions’ ranges or types. For example, the SO-ABBA was superior to SO-BBA and SO-BPSO over unimodal functions and , but has a small range, while has a wide range. SO-ABBA has the same superior performance over the multi-model functions and Moreover, the performance of the MO-ABBA was superior over the 8 programs with different test suites sizes (ranges from 25 to 1034), different execution times (ranges from 170.38 to 12070.23) and different numbers of mutants (ranges from 17 to 239). From these observations we could conclude that the ABBA is scalable.
7 Conclusion and Future Work
This paper proposed solving the multi-objective test suite reduction problem using the Binary Bat algorithm, which is reported in the literature as one of the effective swarm intelligence based algorithms. The BBA algorithm was adapted for better exploration capabilities and consequently better performance. The TSR problem was formulated as a multi-objective optimization problem. The adapted binary bat algorithm was utilized to search for the non-dominated solutions that keep the balance between the cost of the reduced test suites and their fault detection capabilities. The effectiveness of the adapted binary bat algorithm was assessed over eight programs of different test suites sizes, in addition to a set of unimodal and multi modal benchmark functions. The experimental results showed that the performance of the proposed MO-ABBA is superior to each of the MO-BBA and MO-BPSO in terms of the previously defined five metrics. Moreover, The MO-ABBA converged to the best solutions faster than each of the MO-BBA and the MO-BPSO. As a further extension for this work, different weighting mechanisms could be tried for the weighted sum multi-objective optimization. In addition, the fitness function could be redefined to include more objectives such as branch coverage. Furthermore, different inertia formulas  could be experimented with.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|