Multi-Objective Adapted Binary Bat for Test Suite Reduction

Regression testing is an essential quality assurance technique during the maintenance phase of software. It is executed to ensure the validity of the software after any modification. As software evolves, the test suite expands and may become too large to be executed entirely within a limited testing budget and/or time. To reduce the cost of regression testing, it is therefore necessary to reduce the size of the test suite by discarding the redundant test cases and selecting the most representative ones, without compromising the effectiveness of the test suite in terms of some predefined criteria such as its fault-detection capability. This problem is known as test suite reduction (TSR), and it is known to be a nondeterministic polynomial-time complete (NP-complete) problem. This paper formulates the TSR problem as a multi-objective optimization problem and adapts the heuristic binary bat algorithm (BBA) to solve it. The BBA was adapted in order to enhance its exploration capabilities during the search for Pareto-optimal solutions. The effectiveness of the proposed multi-objective adapted binary bat algorithm (MO-ABBA) was evaluated using eight test suites of different sizes, in addition to twelve benchmark functions. Experimental results showed that, for the same fault discovery rate, the MO-ABBA is capable of reducing the test suite size more than each of the multi-objective original binary bat (MO-BBA) and the multi-objective binary particle swarm optimization (MO-BPSO) algorithms. Moreover, the MO-ABBA converges to the best solutions faster than each of the MO-BBA and the MO-BPSO.


Introduction
Software testing is one of the crucial activities in the software development lifecycle. It is used to detect software defects and ensure that the software is delivered with high quality. Any modification in one of the software components may affect one or more other components, which necessitates the re-execution of the earlier test cases in addition to the newly generated ones [1]. As a result, the test suite size expands over time and may include redundant test cases. Moreover, it may not be executable in its entirety within the testing budget [...] the global best solution; while the multi-modal function has more than one local minimum, but only one of them is the global best solution).
Formulating the TSR problem as an optimization problem and defining a fitness function based on two objectives: (i) the execution cost of the reduced test suite and (ii) the effectiveness of the reduced test suite in terms of its mutation score. The variable weighted sum method [19] was utilized to form the multi-objective fitness function, which guides the BBA in its search for the non-dominated solutions that provide an optimum balance between the cost and the effectiveness of the reduced test suites.
Evaluating the performance of the multi-objective ABBA (MO-ABBA) against each of the multi-objective BBA (MO-BBA) and the multi-objective BPSO (MO-BPSO) in solving the multi-objective TSR problem over eight test suites of different sizes.
The rest of the paper is organized as follows: Section 2 introduces some important preliminaries for this work. Section 3 discusses the previous studies that tackled the TSR problem. Section 4 presents the multiobjective adapted binary bat algorithm for solving the TSR problem. Section 5 discusses the experiments and results. Finally, Section 6 concludes the paper and introduces possible extensions to this work.

Test Suite Reduction Problem
Given: A test suite $TS$ which includes $d$ test cases, and a set of mutants $\{mu_1, \ldots, mu_n\}$ that should be killed to provide adequate testing of the software under test (SUT). Each test case $tc_j$ can kill one or more mutants $mu_i$.
Problem: Find an adequate subset $TS' \subseteq TS$ that kills as many mutants as possible and includes as few test cases as possible. These two objectives are conflicting, which is why we formulated the TSR problem as a multi-objective optimization problem.

Pareto Optimal Concepts
In multi-objective optimization problems, there is no single solution but a set of multiple trade-off solutions [25]. A vector of decision variables that optimizes the considered objective functions and satisfies the problem constraints is called a Pareto-optimal solution. Thus, the Pareto front is a set of Pareto solutions which are not dominated by any other solution. A solution $x = [x_1, x_2, \ldots, x_n]$ is said to dominate a solution $y = [y_1, y_2, \ldots, y_n]$ if and only if $y$ is not better than $x$ for any objective $i = 1, 2, \ldots, n$, and there exists at least one objective $x_i$ in $x$ which is better than its corresponding objective $y_i$ in $y$. Conversely, two solutions are said to be non-dominated when neither of them dominates the other. Fig. 1 depicts the difference between dominated and non-dominated solutions and represents the Pareto front. In the figure, the objective functions $f_1$ and $f_2$ are to be minimized. It is obvious that solution A dominates [...]. Moreover, the solutions A, B and C are non-dominated solutions because none of them is better than the others in both objectives: A is the best for objective $f_1$, whereas C is the best for objective $f_2$, and B is better than A for objective $f_2$ and better than C for objective $f_1$. The set of non-dominated solutions of the multi-objective optimization problem is called the Pareto optimal set, and its representation in the objective space is the Pareto front. This set satisfies two properties: (i) any solution outside the set is dominated by at least one solution in the set, and (ii) every two solutions in the set are non-dominated with respect to each other.
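The dominance relation and the extraction of the non-dominated set can be illustrated with a minimal Python sketch. It assumes that all objectives are minimized, as in Fig. 1; the function names and the four objective vectors (including the dominated point D) are hypothetical and are not taken from the figure.

```python
from typing import List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x dominates y when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical objective vectors (f1, f2); the values are illustrative only.
A, B, C, D = (1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 5.0)
front = pareto_front([A, B, C, D])  # D is dominated by A and by B
```

With these values, A, B and C are mutually non-dominated, whereas D is removed because at least one other point is no worse in both objectives and strictly better in one.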

Bat Algorithm
The bat algorithm (BA) is one of the recent metaheuristic swarm intelligence optimization algorithms and was proposed by Yang [18]. The BA was inspired by the behavior of micro-bats. A bat $b_i$ flies randomly with velocity $V_i$ at position $X_i$ with a frequency $F_i$, varying wavelength $\lambda_i = V_i / F_i$ and loudness $A_i$ to search for food/prey in a $d$-dimensional search space. The BA starts by randomly generating the initial population of bats. The values of the parameters of each bat $b_i$ are updated over the iterations according to Eqs. (1)-(3).
where $Gbest$ is the current global best location (solution), $F_i$ is the frequency value of the $i$-th bat, $F_{min}$ and $F_{max}$ are the minimum and maximum frequency values respectively, and $\beta$ is a random number drawn from a uniform distribution. The bats perform a random walk procedure, defined by Eq. (4), for exploring the space.
where $\varepsilon$ is a random number in the range $[-1, 1]$ and $A^t$ is the average loudness of all the bats at time $t$. It could be stated that the BA is a balanced combination of particle swarm optimization (PSO) and intensive local search algorithms. The balance between these two techniques is controlled by both the loudness ($A$) and the pulse emission rate ($r$), which are updated according to Eqs. (5) and (6).
where $\alpha$ and $\gamma$ are constants; $\alpha$ is analogous to the cooling factor in simulated annealing (SA).
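The update rules described above can be sketched in Python as follows. This is a minimal sketch that assumes the standard continuous BA formulation: frequency $F_i = F_{min} + (F_{max} - F_{min})\beta$, velocity $V_i(t+1) = V_i(t) + (X_i(t) - Gbest)F_i$, position $X_i(t+1) = X_i(t) + V_i(t+1)$, random walk $X_{new} = X_{old} + \varepsilon A^t$, loudness $A_i(t+1) = \alpha A_i(t)$ and pulse rate $r_i(t+1) = r_i(0)[1 - \exp(-\gamma t)]$. The function names, the default values of `alpha` and `gamma`, and the choice of drawing $\beta$ from $[0, 1)$ are illustrative assumptions.

```python
import math
import random


def bat_step(x, v, gbest, f_min, f_max, rng):
    """One frequency/velocity/position update for a single bat."""
    beta = rng.random()                      # assumed: beta ~ U[0, 1)
    freq = f_min + (f_max - f_min) * beta    # frequency of this bat
    v_new = [vk + (xk - gk) * freq for vk, xk, gk in zip(v, x, gbest)]
    x_new = [xk + vk for xk, vk in zip(x, v_new)]
    return x_new, v_new


def local_walk(x, mean_loudness, rng):
    """Random walk around a solution: x + eps * A^t with eps in [-1, 1]."""
    return [xk + rng.uniform(-1.0, 1.0) * mean_loudness for xk in x]


def update_loudness_and_rate(loudness, r0, t, alpha=0.9, gamma=0.9):
    """Loudness decays geometrically; the pulse rate rises towards r0."""
    return alpha * loudness, r0 * (1.0 - math.exp(-gamma * t))


rng = random.Random(0)
x_new, v_new = bat_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], 0.0, 2.0, rng)
```

In the usage line the bat already sits on Gbest with zero velocity, so neither its velocity nor its position changes, which illustrates why the second velocity term acts as an attraction towards the best-found region.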
Mirjalili et al. [26] proposed the BBA to solve optimization problems in the binary search space. In the BBA, the bat's position is changed from one to zero or vice versa based on the probability of the bat's velocity according to Eqs. (7) and (8).
where $x_i^k(t)$ and $V_i^k(t)$ are the position and velocity of the $i$-th bat at iteration $t$ in the $k$-th dimension, and $(x_i^k(t))^{-1}$ is the complement of $x_i^k(t)$.

Related Work
Researchers have proposed a significant number of approaches to minimize the size of the test suite. The majority of these approaches are based on greedy algorithms [27], while very few of them are based on clustering algorithms [28] or utilize hybrid algorithms (e.g., neuro-fuzzy techniques) [29]. Greedy-based approaches utilize one of the greedy algorithms to determine the reduced test suite based on the current best strategy. In each iteration, the greedy algorithm adds to the reduced test suite the test case that has the highest greedy property, e.g., the highest statement coverage, which is a locally optimal choice. It stops when the desired percentage of coverage is reached. Greedy approaches were shown empirically to be effective in solving the single-objective TSR (SO-TSR) problem. On the other hand, clustering-based approaches utilize one or more clustering algorithms to group similar test cases together according to a predefined similarity measure. Then a sampling mechanism is applied to select one or more test cases from each cluster to be included in the reduced test suite, while the rest of the test cases are discarded.
Recently, heuristic algorithms were utilized to solve the single and multi-objective TSR problems [8, 10-15, 17]. According to a survey study conducted by Khan et al. [8], the majority of the heuristic search-based TSR approaches (79% of them) are single-objective optimization approaches. Some researchers showed empirically that some heuristic algorithms are superior to greedy algorithms in solving both the single and multi-objective TSR problems [12,17]. The work of Yoo et al. [12] is considered the first work that applies multi-objective optimization to test suite minimization. The authors used the NSGA-II algorithm and showed experimentally its superiority over the greedy approaches.
Geng et al. [10] and Gupta et al. [15] also utilized the NSGA-II algorithm but with different objective drivers. The objectives of Geng et al. [10] were the code coverage and the test suite cost, while the objectives of Gupta et al. [15] were the code coverage and the mutation score. Wang et al. [13] proposed utilizing three types of weight-based genetic algorithms (GAs) for minimizing the test suites of software product lines. The authors combined the different objective drivers through a weighted sum to form a single-objective fitness function that guides the GA during the search for a Pareto front. Their experimental results showed that the Random-Weight GA outperforms seven other popular multi-objective search algorithms, including NSGA-II, the strength Pareto evolutionary algorithm (SPEA) and speed-constrained multi-objective particle swarm optimization (SMPSO). Wei et al. [14] compared six evolutionary multi-objective optimization algorithms, including NSGA-II and several variants of the multi-objective decomposition-based evolutionary algorithm (MOEA/D). Their experimental results showed the superiority of NSGA-II over small programs, whereas over large programs (space) the MOEA/D was superior. They tried different combinations of objective drivers including mutation score, code coverage and test suite cost. The experiments showed that, for the same statement coverage, using the mutation score as an objective driver guided the evolutionary algorithms to the smallest test suite.

Difference from the previous work
The approach proposed in this paper is a heuristic search-based approach. We adapted one of the recent heuristic algorithms, the BBA, which proved its superiority over other evolutionary algorithms in different contexts [20]. However, the BBA occasionally fails to discover the global best solution for some multimodal functions; in addition, the BBA is used for solving single-objective optimization problems. So, we proposed modifications to the BBA and utilized it to minimize the test suite size without loss in the fault detection quality. We used mutation testing to measure the quality of the reduced test suite, because mutation testing has been studied by numerous researchers as a method to assess the quality of a test suite [24, 30-32]. Previous research showed empirically that mutation testing is more effective than code coverage in evaluating and comparing test suites [24].

TSR Solution Encoding
Consider that each bat $b_i$ has a position vector $X_i$ that represents a solution to the TSR problem, i.e., each $X_i$ represents a reduced test suite. $X_i$ is encoded as a binary vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, where $d$ is the size of the original test suite (the total number of test cases) and each bit $x_{ij}$ corresponds to a test case $tc_j$. A bit value of "1" means that $tc_j$ is included in the reduced test suite, and a bit value of "0" means that it is excluded.
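As a small illustration of this encoding, the following Python sketch decodes a position vector into the reduced test suite it represents. The test case names and the bit values are hypothetical.

```python
from typing import List


def decode(position: List[int], test_cases: List[str]) -> List[str]:
    """Return the test cases whose bit is set to 1 in the position vector."""
    return [tc for bit, tc in zip(position, test_cases) if bit == 1]


test_suite = ["tc1", "tc2", "tc3", "tc4", "tc5"]  # hypothetical suite, d = 5
x_i = [1, 0, 0, 1, 1]                             # position of one bat
reduced = decode(x_i, test_suite)                 # tc1, tc4 and tc5 are kept
```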

Adapted Binary Bat Algorithm (ABBA)
Generally, the performance of any heuristic algorithm, including the BA, is affected by two crucial competencies: 1) exploration and 2) exploitation. Exploration is the ability of an algorithm to find promising solutions by searching various unknown regions, while exploitation leads the algorithm to the best solution among the discovered ones. The exploration capability can get the algorithm away from a local optimum in which it is stuck, while the exploitation capability increases the convergence speed of the algorithm. It is important to keep the balance between the global and the local search, such that the global search is amplified in the early iterations, while the local search is amplified in the late iterations so that the algorithm converges to the global optimum.
The update formula of the bat velocities, Eq. (1), includes two components. The first component is the previous velocity of the bat, $V_i(t)$, which is responsible for the global search (exploration): it directs the bat to keep its velocity and direction, so that the bat roams over the search space. The second component, $(X_i(t) - Gbest)F_i$, is responsible for the local search (exploitation), as it directs all the bats to a region near the best-found global solution (Gbest). So, the following modifications to the $V_i(t+1)$ formula were proposed. Firstly, the term $V_i(t)$ is multiplied by an inertia weight factor $w$, which is given by Eq. (9). The value of $w$ decreases linearly over the iterations. The inertia weight was recommended by a number of previous studies that aimed at enhancing each of the BA and the PSO [24,33].
where $w_{max}$ and $w_{min}$ are pre-determined constants, $iter$ is the current iteration number, and $iter_{max}$ is the maximum number of iterations.
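A linearly decreasing inertia weight of this kind can be sketched as follows. The sketch assumes the commonly used form $w = w_{max} - (w_{max} - w_{min}) \cdot iter / iter_{max}$; the function name and the default values of `w_max` and `w_min` are illustrative assumptions, not the settings of this work.

```python
def inertia_weight(iteration, max_iterations, w_max=0.9, w_min=0.4):
    """Linearly decrease the inertia weight from w_max to w_min."""
    return w_max - (w_max - w_min) * iteration / max_iterations


# The weight is large early (favoring exploration) and small late
# (favoring exploitation).
schedule = [inertia_weight(it, 100) for it in (0, 50, 100)]
```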
The other proposed modification is to assume that each bat emits two frequencies instead of one before it decides on its moving direction. The first frequency is directed towards the location of the Gbest, while the second frequency is directed towards a randomly selected best solution discovered over the previous iterations (Rbest). Any of these previously discovered best solutions could be a candidate for a global optimum solution. In this way, each bat benefits from the experiences of the other bats. Consequently, Eq. (1) is amended as follows:
where $Rbest$ is a randomly selected best solution other than the Gbest, and $\delta$ increases non-linearly from $\delta_{min}$ to 1, which increases the impact of the location of the Gbest over the iterations, so that the bats converge to the Gbest.
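The following Python sketch illustrates one way such an amended update could look. It is not the exact formula of this work: it assumes that $\delta$ weights the Gbest term and $(1 - \delta)$ weights the Rbest term, that $\delta$ follows a hypothetical power-law ramp, and that the bits are flipped through the V-shaped transfer function $|\frac{2}{\pi}\arctan(\frac{\pi}{2}V)|$ used in the BBA literature. All function and parameter names are illustrative.

```python
import math
import random


def delta_schedule(iteration, max_iterations, delta_min=0.5, power=2.0):
    """Hypothetical non-linear ramp of delta from delta_min to 1."""
    progress = iteration / max_iterations
    return delta_min + (1.0 - delta_min) * progress ** power


def abba_velocity(v, x, gbest, rbest, w, delta, f1, f2):
    """Assumed form: inertia term plus Gbest and Rbest attraction terms."""
    return [w * vk + delta * (xk - gk) * f1 + (1.0 - delta) * (xk - rk) * f2
            for vk, xk, gk, rk in zip(v, x, gbest, rbest)]


def v_transfer(velocity):
    """V-shaped transfer function mapping a velocity to a flip probability."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * velocity))


def flip_bits(x, v, rng):
    """Complement each bit with probability v_transfer(velocity)."""
    return [1 - xk if rng.random() < v_transfer(vk) else xk
            for xk, vk in zip(x, v)]
```

Under these assumptions, early iterations give Rbest a noticeable influence, which spreads the bats over several promising regions, while late iterations are dominated by the Gbest term.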

TSR Multi-Objective Fitness Function Formulation
In this paper, the fitness function is composed of two objectives. The first objective aims at minimizing the cost of the test suite, which is expressed in terms of the execution time as recommended by Yoo et al. [12]. The second objective aims at selecting a reduced test suite that is capable of detecting the largest number of faults. The fault detection capability of the reduced test suite is expressed in terms of the mutation score. Wei et al. [14] found that the mutation score is the most effective objective for solving the TSR problem. The two objectives are defined using Eqs. (13)-(14).
where $|tc|$ is the size of the original test suite, $|RTS|$ is the size of the reduced test suite, $exec\_time_i$ is the execution time of test case $i$, $|mu|$ is the total number of mutants of the software under test, and $|killed\ mu|$ is the number of mutants killed by the reduced test suite. The detection capability (number of killed mutants) of the reduced test suite is the cumulative sum of the detection capability of each $tc_i$ in the reduced test suite, where each $tc_i$ is represented using a binary vector of size $n$, and $n$ is the total number of mutants of the SUT. Bit number $j$ of the $tc_i$ vector is set to "1" if $tc_i$ kills mutant number $j$, and to "0" otherwise.
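The two objectives can be computed from a position vector as in the following minimal Python sketch. It assumes that the cost is the execution time of the selected test cases relative to that of the full suite, and that a mutant killed by several selected test cases is counted once; both choices, as well as the data and the function names, are illustrative assumptions.

```python
def execution_cost(position, exec_times):
    """Execution time of the reduced suite as a fraction of the full suite."""
    selected = sum(t for bit, t in zip(position, exec_times) if bit)
    return selected / sum(exec_times)


def mutation_score(position, kill_matrix):
    """Fraction of mutants killed by at least one selected test case."""
    n_mutants = len(kill_matrix[0])
    killed = [any(row[j] for bit, row in zip(position, kill_matrix) if bit)
              for j in range(n_mutants)]
    return sum(killed) / n_mutants


# Hypothetical data: 3 test cases, 4 mutants; kill_matrix[i][j] == 1 means
# that test case i kills mutant j.
kill_matrix = [[1, 1, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0]]
exec_times = [2.0, 1.0, 1.0]
cost = execution_cost([1, 0, 1], exec_times)
score = mutation_score([1, 0, 1], kill_matrix)
```

In this example, dropping the second test case lowers the cost to 0.75 without lowering the mutation score, because the mutant it kills is also killed by the first test case.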
To formulate the fitness function, we used the weighted sum method, which is a simple and traditional method for multi-objective optimization. It produces a Pareto-optimal set of solutions by changing the weights among the objective functions. Yang [19] showed experimentally that the weighted sum method for combining multiple objectives into a single objective is very efficient even with highly nonlinear problems, complex constraints and diverse Pareto optimal sets. Moreover, Wang et al. [13] showed that the random weighted-based GA (a multi-objective GA based on the weighted sum method) is superior to some popular multi-objective algorithms, e.g., NSGA-II and SPEA.
The fitness function used in this work is defined by Eqs. (15)-(17); the best solution is the one that maximizes the fitness.
where $weight_1$ and $weight_2$ are the weights used to find the Pareto-optimal set of solutions, $weight_{init}$ denotes the initial value of $weight_2$, $iter$ is the current iteration number, and $n$ is a modulation index. As the iterations proceed, the value of $weight_2$ increases from $weight_{init}$ to 1, whereas $weight_1$ decreases from $(1 - weight_{init})$ to 0. The values of $weight_1$ and $weight_2$ determine the importance of each objective to the fitness function. Different values of $weight_1$ and $weight_2$ produce different non-dominated solutions with sufficient diversity, so the Pareto front can be approximated correctly.
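A variable weighted-sum fitness of this kind can be sketched in Python as follows. The sketch is built on several assumptions that are not confirmed by the text: $weight_1 = 1 - weight_2$, a power-law ramp of $weight_2$ controlled by the modulation index $n$, $weight_1$ attached to the cost objective (expressed as $1 - cost$ so that the fitness is maximized), and $weight_2$ attached to the mutation score. The function names and default values are illustrative.

```python
def weights(iteration, max_iterations, weight_init=0.5, n=2.0):
    """Assumed schedule: weight_2 rises from weight_init to 1."""
    weight_2 = (weight_init
                + (1.0 - weight_init) * (iteration / max_iterations) ** n)
    return 1.0 - weight_2, weight_2


def fitness(cost, mutation_score, weight_1, weight_2):
    """Weighted sum to be maximized: cheap suites with high mutation score."""
    return weight_1 * (1.0 - cost) + weight_2 * mutation_score


w1, w2 = weights(0, 100)
value = fitness(0.25, 1.0, w1, w2)
```

Running the search with different weight pairs steers it towards different trade-offs between cost and mutation score, which is how a weighted-sum approach approximates a set of non-dominated solutions.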
Algorithm 1 shows the basic steps of the multi-objective ABBA.

Research Questions
The experiments were designed to answer the following research questions:
[Residue of Algorithm 1: ... Eqs. (3) and (10); calculate $v(V_i)$ using Eq. (7); update $X_i$ using Eq. ...]

Data Sets
To answer RQ1, we used twelve unimodal and multimodal benchmark functions. Tab. 1 lists these functions along with their search boundaries (range). $fn_1$-$fn_6$ are unimodal and $fn_7$-$fn_{12}$ are multimodal benchmark functions. The global minimum value of every benchmark function used is 0. In the experiments, 15 bits were used to represent each continuous variable in binary. Thus, the dimension of the bit vector generated for a benchmark function $fn$ was calculated by Eq. (18) as follows:
where $nb$ is the dimension of the bats/particles.
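The binary representation of the continuous variables can be illustrated with the following Python sketch. It assumes that the bit-vector length is the number of variables times 15 and that each 15-bit group is mapped linearly onto the search range; the mapping, the function names and the example range are assumptions made for illustration.

```python
BITS_PER_VARIABLE = 15  # stated in the text


def bit_vector_length(n_variables):
    """Assumed form of the dimension: 15 bits for every continuous variable."""
    return n_variables * BITS_PER_VARIABLE


def decode_variable(bits, lower, upper):
    """Map a group of bits linearly onto the interval [lower, upper]."""
    value = int("".join(str(b) for b in bits), 2)
    return lower + (upper - lower) * value / (2 ** len(bits) - 1)


def decode_position(position, lower, upper):
    """Split a bat position into 15-bit groups and decode each group."""
    return [decode_variable(position[k:k + BITS_PER_VARIABLE], lower, upper)
            for k in range(0, len(position), BITS_PER_VARIABLE)]


# Hypothetical two-variable function with the range [-10, 10].
point = decode_position([1] * 15 + [0] * 15, -10.0, 10.0)
```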
To answer RQ2 and RQ3, we used eight programs. Two of them, flex v1 and make v1, are open-source C programs whose characteristics were retrieved from a popular repository, the Software-artifact Infrastructure Repository (SIR) [34]. The other six are Java programs from an available dataset provided by Polo et al. [35]. The execution times of the test cases of these six programs are not available, so we assumed that all their test cases have an equal execution time of 1 time unit. The characteristics of the eight programs are listed in Tab. 2: the lines of code (LOC), the test suite size $|tc|$, the number of mutants $|mu|$, and the execution time of the original test suite ($T_{exec}$).

Parameter Setting
We experimented with the most recommended values for the parameters in the literature [36]. Tab. 3 lists the parameter values that achieved the best performance.

Performance Metrics
Two general metrics were used to assess the performance of the heuristic algorithms: the fitness mean and standard deviation (SD), defined by Eqs. (19) and (20).
where $N$ is the number of runs and $f_i$ is the fitness of the best solution discovered during run number $i$.
As evolutionary algorithms are stochastic, 10 independent runs were performed for each experiment, and then the mean and SD were calculated. Larger mean values indicate better solutions, while the smaller the value of the SD, the more robust the algorithm, as small SD values indicate that the algorithm can find acceptable solutions in the different runs with small discrepancy.
Three additional specific metrics were used: (i) the test suite size reduction rate (TSRR), (ii) the execution time reduction rate (ETR) and (iii) the fault detection capability rate (FDR). They are calculated using Eqs. (21)-(23). The higher the values of the TSRR, ETR and FDR, the better the performance of the search algorithm.
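The three rates can be sketched in Python as follows. The sketch assumes the usual percentage definitions: the relative reduction in the number of test cases, the relative reduction in the execution time, and the share of the mutants killed by the original suite that the reduced suite still kills. These forms and the function names are assumptions.

```python
def tsrr(original_size, reduced_size):
    """Test suite size reduction rate, in percent."""
    return 100.0 * (original_size - reduced_size) / original_size


def etr(original_time, reduced_time):
    """Execution time reduction rate, in percent."""
    return 100.0 * (original_time - reduced_time) / original_time


def fdr(killed_by_reduced, killed_by_original):
    """Fault detection capability rate, in percent."""
    return 100.0 * killed_by_reduced / killed_by_original


# Hypothetical reduction: 100 test cases down to 25, 200 time units down
# to 50, and 45 of the 50 originally killed mutants still killed.
rates = (tsrr(100, 25), etr(200.0, 50.0), fdr(45, 50))
```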

Results
Answer to RQ1: Tab. 4 lists the mean and SD of the optimal solutions discovered for each function over the ten runs, as well as the mean and SD of the number of iterations executed to reach the corresponding optimal solutions; the best results are shown in bold. The maximum number of iterations was set to 1000 across all the experiments; however, the best solutions were reached before the predetermined maximum number of iterations. It should be noted that, for any of the previously mentioned benchmark functions, the best solution is the one that has the smallest mean value (minimization problem). Also, the smaller the mean number of executed iterations, the faster the convergence of the heuristic algorithm.
As can be observed from Tab. 4, the performance of the SO-ABBA algorithm is superior to that of the SO-BBA over all the unimodal and multimodal benchmark functions, as the SO-ABBA discovered better solutions than the SO-BBA in terms of the mean values of the best solutions. In addition, the SD values of the best solutions obtained with the SO-ABBA are smaller than those obtained with the SO-BBA over all the functions, which indicates that the SO-ABBA is more robust than the SO-BBA. The small SD values of the optimal solutions discovered by the SO-ABBA indicate that it finds the best solutions without large variance among the different runs. When comparing the performance of the SO-ABBA to that of the SO-BPSO, it was found that the SO-ABBA was capable of discovering better solutions than the SO-BPSO for the functions $fn_3$, $fn_4$, $fn_6$, $fn_7$, $fn_9$ and $fn_{12}$. In addition, the SD values of the best solutions obtained with the SO-ABBA are smaller than those obtained with the SO-BPSO. On the other hand, both the SO-ABBA and the SO-BPSO discovered the same best solutions for $fn_1$, $fn_2$, $fn_5$, $fn_8$, $fn_{10}$ and $fn_{11}$. The mean numbers of iterations executed by each of the three algorithms to discover the optimum solutions show that the SO-ABBA converged to the best solutions much faster than each of the SO-BBA and the SO-BPSO.
Answer to RQ2: To answer RQ2, we conducted a set of experiments over the previously mentioned 8 programs. The results listed in the paper are the mean and SD of 10 independent runs over each program. The maximum number of iterations was set to 100 in all the experiments. Tab. 5 lists the generated values of $weight_1$ and $weight_2$ which were used to calculate 10 non-dominated solutions (NDS) on the Pareto surface, while Tab. 6 lists the mean and SD values of the fitness (F) and of the convergence speed of the discovered NDS. The convergence speed is measured in terms of the number of iterations executed by each algorithm to discover the best solution (#i). To simplify the visualization of the results, only three NDS (no. 1, 5 and 10) are listed in the table. Tab. 7 lists the FDR, TSRR and ETR of the three selected NDS.
As can be observed from Tab. 6, the MO-ABBA algorithm surpasses both the MO-BPSO and the MO-BBA in terms of the mean fitness values of the discovered NDS across the 8 programs. Moreover, the fitness standard deviation values of the NDS discovered by the MO-ABBA are very small; most of them are equal to or approach zero, which indicates the stability of the MO-ABBA. For example, the fitness mean and SD values of the NDS#5 discovered by the MO-ABBA, MO-BBA and MO-BPSO are as follows: Flex_V1 [...]
Fig. 2 shows sample convergence curves of the three algorithms over the programs. It can be observed that the MO-ABBA converges faster and to better solutions than the MO-BBA and the MO-BPSO, although the parameter settings of the MO-ABBA and the MO-BBA are the same.
It can be observed from the set of experiments over the benchmark functions that the performance of the SO-ABBA was not affected by the ranges or types of the functions. For example, the SO-ABBA was superior to the SO-BBA and the SO-BPSO over the unimodal functions $fn_2$ and $fn_4$, although $fn_2$ has a small range while $fn_4$ has a wide range. The SO-ABBA has the same superior performance over the multimodal functions $fn_9$ and $fn_{11}$. Moreover, the performance of the MO-ABBA was superior over the 8 programs with different test suite sizes (ranging from 25 to 1034), different execution times (ranging from 170.38 to 12070.23) and different numbers of mutants (ranging from 17 to 239). From these observations, we could conclude that the ABBA is scalable.

Conclusion and Future Work
This paper proposed solving the multi-objective test suite reduction problem using the binary bat algorithm, which is reported in the literature as one of the effective swarm intelligence-based algorithms. The BBA was adapted for better exploration capabilities and consequently better performance. The TSR problem was formulated as a multi-objective optimization problem. The adapted binary bat algorithm was utilized to search for the non-dominated solutions that keep the balance between the cost of the reduced test suites and their fault detection capabilities. The effectiveness of the adapted binary bat algorithm was assessed over eight programs with different test suite sizes, in addition to a set of unimodal and multimodal benchmark functions. The experimental results showed that the performance of the proposed MO-ABBA is superior to each of the MO-BBA and the MO-BPSO in terms of the previously defined five metrics. Moreover, the MO-ABBA converged to the best solutions faster than each of the MO-BBA and the MO-BPSO. As a further extension of this work, different weighting mechanisms could be tried for the weighted-sum multi-objective optimization. In addition, the fitness function could be redefined to include more objectives such as branch coverage. Furthermore, different inertia formulas [24] could be experimented with.
Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.