Automatic PSO Based Path Generation Technique for Data Flow Coverage

Path-based testing involves two main steps: 1) finding all paths throughout the code under test; 2) creating a test suite to cover these paths. Unfortunately, covering all paths in the code under test is impossible. Path-based testing could be achieved by targeting a subset of all feasible paths that satisfy a given testing criterion. Then, a test suite is created to execute this paths subset. Generating those paths is a key problem in path testing. In this paper, a new path testing technique is presented. This technique employs Particle Swarm Optimization (PSO) for generating a set of paths to satisfy the all-uses criterion. To construct such paths for programs with loops, the proposed technique applies the ZOT-criterion. This criterion selects paths that traverse loops 0, 1, and 2 times. The proposed technique utilizes the decision-decision graph of the program under test to represent the position vector of the particle. To evaluate the efficiency of the presented technique, an empirical study has been conducted, which included 15 C# programs. In this study, the proposed technique has been compared with a genetic algorithm (GA)-based one. The results showed that the PSO required 199 generations, while the GA required 349 generations, to satisfy all def-use paths of all programs. In addition, the proposed technique required a smaller number of paths than the GA-based one.


Introduction
Path testing requires executing each path in the code under test at least once. However, executing all paths is impossible as a program with loops may include an infinite number of paths. This issue may be addressed by using a subset of all executable paths that satisfy a given path selection criterion. Finding this subset of paths is a significant challenge in path testing.
Scholars presented many path-generation approaches. Bertolino and Marre [1] presented a technique constructing a set of paths that satisfy all arcs of the tested program control-flow graph. Other studies have concentrated on generating the set of basis paths (i.e., the class of all linearly independent paths) This paper introduces a path testing approach that utilizes PSO for finding the paths required to cover the all-uses criterion proposed by Rapps and Weyuker [36]. The presented technique accomplishes this process by creating new paths using the formerly promising ones. The approach uses a criterion called ZOT-criterion to construct paths for programs with loops [37]. This criterion targets paths that traverse loops 0, 1, and 2 times. The approach also uses the edges of the decision-decision graph (dd-graph) of the code under test for encoding the position vector. The approach encodes the position vector of any particle as an integer array. The paper presented an experimental comparison between the proposed approach and a GA-based one. This comparison considered two main factors: 1) the number of generations required to find a path-cover for all def-use criterion, and 2) the size of this path-cover.
The rest of this paper is organized as follows. Section 2 presents a description of data-flow concepts. Section 3 discusses the basic PSO algorithm. Section 4 presents the proposed POS algorithm. Section 5 introduces the proposed PSO-based approach. Section 6 discusses the findings of the experimental study carried out to evaluate the proposed technique. Finally, Section 7 summarizes this work and gives some points for future work.

Data-Flow Concepts
Data-flow concepts consider the relations between the definitions (def) and references (use) of the variables in the program under test. Data-flow methods employ the control-flow graph of the program to find the relations between defs and uses.
The control flow of a program may be represented by a directed graph, named the control-flow graph (CFG). A CFG consists of a group of vertices and a group of arcs (edges). Each vertex represents one statement or a group of consecutive statements. Each arc represents a potential transfer of control flow between vertices. A path in a CFG is a finite sequence of vertices linked by arcs. A complete path is a path that starts at the entry vertex and ends at the exit vertex of the CFG. A path is called a def-clear path for a variable if it does not include a new definition for that variable.
The proposed approach uses a reduced form of the CFG, named a decision-decision graph (dd-graph). In a dd-graph, every arc is a decision-to-decision path (dd-path). Fig. 3 shows the dd-graph that corresponds to the CFG (shown in Fig. 2) of the C# code given in Fig. 1. Tab. 1 presents the dd-paths, which match the arcs of the dd-graph shown in Fig. 3. [32] presented a set of dataflow testing criteria, such as the all-uses criterion. Alluses criterion requires the execution of a def-clear path from each def of a variable to each use of that variable. The required def-clear paths to fulfil the all-uses criterion are named def-use paths. Def-use paths are built using the def-use associations of program variables by the method presented by Girgis [38]. Tab. 2 presents all def-use associations of the C# code given in Fig. 1 and their killing nodes.

Rapps and Weyuker
Killing nodes are those nodes that must not be included in a def-use path of a variable (i.e., nodes that contain other definitions for that variable).

Particle Swarm Optimization
PSO is a search-based optimization algorithm that is inspired by the motion and intelligence of flocks of animals. PSO employs the concept of social interaction to solve the given problem. A comprehensive survey of PSO applications was presented by Sengupta et al. [39]. PSO concepts are based on the motion of a set of particles in the search domain to find the best solution. Every particle is handled as a single node in an N-dimensional domain. Every particle regulates its "flying" based on its personal "flying experience" and the "flying experience" of the others.
Each particle has a fitness value estimated using a fitness function and velocities. PSO has two changeable variables: 1. pbest (personal best): saves the coordinates of the best position of the current particle in the solution domain. This position corresponds to the greatest solution achieved up to now by the particle. 2. gbest (global best): is the greatest value achieved up to now by any particle inside the neighborhood of the current particle.
Every particle attempts to adjust its position based on four factors: 1) the present place; 2) the present velocity; 3) the distance from the present place to pbest; and 4) the distance between the present place and gbest. The PSO steps are demonstrated below. At first, the initial position and the initial velocity of each particle are randomly assigned. Then, the fitness function of each particle is calculated and compared to the fitness value of the pbest location. If the present location has a fitness value greater than the pbest, then the present location becomes the new pbest. The greatest value of all pbests becomes the gbest. The position and velocity of each particle in the swarm are modified based on this gbest. The overall procedure is repeated until a specific number of generations (mGens) is reached.
The outcome of this algorithm is the global best value. The fitness function of the algorithm depends on the type of the application of the PSO. The updates of the position and velocity of each particle are calculated by the following equations [40]: where: 1. v k i : velocity of particle i at iteration k. 2. K: constriction factor that is required to guarantee the convergence of the PSO [41]. 3. c 1 , c 2 : learning factors. 4. r 1 , r 2 : uniformly distributed random number between 0 and 1. 5. p k i : current position of particle i at iteration k. 6. pbest i : pbest of particle i . 7. gbest: gbest of the group.
Eq. (1) is applied to update the velocity of the particles. In addition, Eq. (2) is applied to update the position of the particles. The PSO specifications can be: 1. The learning factors c 1 and c 2 are values in the interval from 0 to 4. Where the best values of c 1 and c 2 are determined empirically. The proposed algorithm was executed 20 times on each subject program and the best values of c 1 and c 2 were determined. 2. The number of particles is based on the PSO application. Commonly, the number of particles is in the range from 20 to 40; 3. The fitness function of the PSO is very related to the representation of the given problem.

The Proposed PSO Algorithm
The proposed PSO-based approach targets the automatic generation of test paths. The technique looks for the paths that fulfill the all-uses criterion. If the program under test contains loops, the proposed technique creates a subset of those paths that satisfies the ZOT-criterion.

Representation
In the proposed PSO algorithm, the position vector of a particle symbolizes the edges of a path through the program dd-graph. The length, L, of any position vector can be calculated using the formula L = n + 2 + p, where n + 2 is the number of edges in the dd-graph plus two more edges representing the entry and exit ones. In addition, p is the number of edges enclosed in the loops, which are included twice. This representation guarantees that the paths created for any program with loops satisfy the ZOT-criterion. For instance, the basic edges of the dd-graph of the C# code given in Fig. 3 are e 1 ; e 2 ; e 3 ; e 4 ; e 5 ; e 6 ; e 7 ; e 8 ; e 9 . Therefore, n = 9, plus 2 (entry edge, e 0 , and exit edge, e 14 Þ. In addition, p = 4, which are the edges enclosed in "while" loop, which are e 6 ; e 7 ; e 8 ; e 9 . Moreover, the duplicate of the edges of the loop is placed after the final edge, e 9 , with labels beginning by 10, i.e., e 10 ; e 11 ; e 12 ; e 13 . Consequently, the length of this position vector is L = 15. The position vector of the particle has the following form: The proposed algorithm employs two different forms for the position vector. The first form is a binary form called bit position vector. The second form is an integer form called actual position vector. Through any bit position vector, the digit 1 means that an edge is existing and the digit 0 means an edge is missing. In the actual position vector, each "1" digit in the bit vector is replaced by the index of the matching edge. Then, the bit vector is filled out with "−1" instead of the digits "0". The digit −1 is used to replace the missing edges to differentiate these edges from the indices of the existing edges, which are positive integers. The velocity vector of a particle consists of L digits in the interval [0, L] (for the example code the interval is [0,15].

Initial Population
The position vectors of the particles represent a group of paths. These vectors are symbolized by two forms of length L: bit position vector and actual position vector. To form the initial population, the proposed technique randomly creates PS bit arrays of length L, where PS is the population size. In each vector, the first, second, and last cells are filled with 1's to represent the entry, first, and exit edges that have to be exist in all paths. The proper value of the size of the population is empirically determined. The algorithm creates for every produced binary position vector the corresponding actual position vector. All paths in the produced population must satisfy the connectivity rule (i.e., composed of a series of connected edges). The algorithm discards any disconnected path and generates another one to replace it. Also, the algorithm randomly creates PS velocity vectors for the initial position population. The values of every velocity vector are integers randomly selected from the interval [0, L-1]. The values of the first, second, and last cells are 0, 0, and L, respectively. This configuration guarantees that the update process of the actual position vector produces a path that contains the entry, first, and exit edges.

Position and Velocity Vectors Updating
PSO algorithms update the position and velocity of a particle using Eqs. (1) and (2). These equations can only be applied in continuous optimization problems. In the proposed approach, updating the velocity and position are accomplished using the actual position and velocity vectors, which are encoded as integer sequences, rather than real-number vectors. So, the current forms of addition and subtraction in Eqs. (1) and (2) are not applicable to the problem of path generation. Therefore, in the proposed algorithm, addition and subtraction operations in the basic velocity and position updating equations are modified to work with the path generation problem.
The presented technique utilizes two operators: the first is subtracting-two-positions operator and the second is adding-position-to-velocity operator. The two operators are applied to the members of the current population to form a new population.
Subtracting-two-positions operator: The equation of updating the velocity vector (Eq. (1)) contains two terms for subtracting positions. We suggested the following definition (Eq. (4)) for this operator.
where p i f g LÀ1 i¼0 and p 0 i È É LÀ1 i¼0 are two position vectors. Adding-position-to-velocity operator: The equation of updating the position vector (Eq. (2)) adds the new velocity to the old position. We suggested the following definition (Eq. (5)) for this operator.
where p i f g LÀ1 i¼0 and v i f g LÀ1 i¼0 are position and velocity vectors, respectively. If the updated position vector is disconnected (i.e., it contains two consecutive edges that are not adjacent edges in the dd-graph), the algorithm tries to make it connected, by adding and/or removing one or more edges to/from it. The number of edges to be added or removed to connect the disconnected part depends on the length of the sub-path between the two disconnected edges and according to their positions in the dd-graph. If no sub-path can be added to fill the gap in the disconnected path, the algorithm tries to remove some edges to connect that path. Otherwise, the algorithm discards the disconnected vector and replaces it with the original position vector. Finding a sub-path to be added or removed to fill the gap in a disconnected path is a trial-and-error process. We used the algorithm given below (Algorithm Connect) to fill the gap between any two edges in the vector.
It should be noted that the resultant position vector p' may include a set of repeated values. If the repeated values are not indices of edges of a loop, these values are replaced by −1. The cells of the vector that represent indices for edges of a loop are allowed to be repeated only once. Therefore, an index for an edge in a loop can exist twice in the resulting position vector: the first existence is the original index and the second is the copy. To illustrate this rule, assume that p' is 0, 1,5,6,6,12,4,12,13,3,3,12,13,8,14. Now, the rule is applied to the resultant position p' as follows: Algorithm Connect(v: is a position vector; e1, e2: are two edges, D: is a dd-graph) If v is disconnected at e1, and e2 then Call ConnectByAdd (v, e1,  If v is empty return v cannot be connected Else if the predecessor of e1 in v is a predecessor of e2 remove e1 and return v Else if the successors of e2 in v is a successors of e1 remove e2 and return v Else remove e1 and e2 ConnectByRemove (v, predecessor of e1, successors of e2) End ConnectByRemove 1. The value 6 is an index of an edge in a loop. This edge is repeated twice. So, the first existence is not changed, while the second existence is replaced by the index of its copy, 10.

Fitness Function
The algorithm assesses every path (p) using the ratio of the number of the def-use paths, which it covers, to the total number of def-use paths in the program. A def-use (du) is covered if there is a path contains a subpath that begins with the def-node (d) and finishes with the use-node (u) and that path doesn't contain any killing node for that def-use. The fitness value fitness(P i ) of a particle P i (i = 1, …, PS) is computed using the following formula [42,43]: fitness P i ð Þ ¼ no: of def À use paths covered by P i total no: of def À use paths (6)

The Proposed Technique
The main modules of the system that implements the presented path generation technique are described below.
The system is written using C# programing language. It contains the following modules: 1. Analysis module. 2. Path generation module.
These modules are discussed in more detail in the following subsections. Fig. 4 presents the pseudo-code of the proposed system. This system applies the proposed PSO algorithm given in Section 4.

Analysis Module
This module accepts as input the source code P to be tested, analyses it, and produces the following outputs: For instance, for program P#8, the GA-based system needed 29 generations to cover 100% of all def-use paths. On the other hand, the PSO-based system needed 20 generations to cover 100% of all def-use paths. For the program P#4, both the PSO-based and GA-based systems reached the mGens of generations. The two techniques fulfilled only 88.89% coverage because this program contains a number of infeasible paths. For program P#13, the PSO-based system covered 100% in 12 generations, while the GA-based system reached the maximum number of generations (mGens = 100) and covered only 91.11% of all def-use paths.
(a) (b) Figure 5: Part of the report of the path generation module Fig. 7 compares the PSO-based system and the GA-based system according to the number of generations needed to cover all def-uses paths for every subject program. According to this comparison, the PSO system is more effective than the GA-based system in reducing the number of generations needed to accomplish 100% def-use path coverage. According to this comparison, the PSO-based system is more effective than the GA-based system in reducing the number of paths for 5 out of the 15 programs. In the other 9 programs, the two systems created an equal number of paths. In P#13 the GA-based system created a number of paths smaller than the PSO-based system but it reached only 91.11% coverage, while the PSO-based system reached 100% coverage.   This paper introduced a path-testing approach, which applies a PSO algorithm to construct test paths to satisfy the all-uses criterion. For programs that contain loops, the presented approach creates the paths according to the ZOT-criterion. An experiment that included 15 C# programs has been conducted to evaluate the efficiency of the presented technique compared to a GA-based path generation technique. The results showed that the PSO-based technique required 199 generations, while the GA-based technique needed 349 generations, to satisfy all def-uses of all programs. The PSO-based technique generated 60 paths and the GA-based technique generated 63 paths to satisfy all def-uses. In the future work, the experiments will be re-conducted to compare the time consumed by the two techniques using the same machine. Also, we will focus on conducting the experiments using another programing language such as Java with large data set. In addition, in the future work, we will focus on developing a system to find a set of test data to cover the generated paths.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.