This paper describes an efficient solution to parallelize software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are: The choice of the initial population and its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, genetic algorithms are applied to the entire search space of the parallelization of the program instructions problem. This problem is NP-complete, so there are no polynomial algorithms that can scan the solution space and solve the problem. The genetic algorithm-based method is general and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained and presented for different working scenarios confirm the theoretical results, and they provide insight on how to improve the exploration of a search space that is too large to be searched exhaustively.

The importance of parallel instruction execution for our digital society is undeniable, and the development of fast processing algorithms raises interesting research questions in a wide range of fields: Aerospace technology, banking systems, CAD/CAM technologies, and IoT. Modern society depends on the parallel processing capabilities of electronic devices, and we would not be able to maintain our current operations without different optimization techniques. However, there is currently no polynomial algorithm capable of parallelizing software instructions.

Optimizing parallel computing algorithms is an open research field. Research methods generally involve finding a series of suboptimal solutions that asymptotically tend to the optimal solution. For some poorly optimized problems, probabilistic algorithms can be used to improve efficiency. They usually do not guarantee the best solution, but there is a chance that, through random selection, they will approach the proposed goal. In addition, analytical models cannot be applied to such problems either, because the problem of optimal distribution in multiprocessor systems is not a typical problem of the extremum. It has been shown that this type of problem is NP-complete (there are no polynomial algorithms capable of providing solutions) [

Running parallel instructions on multiple processors becomes more difficult in the case of additional communication constraints between processors. For example, there are dependencies between the instructions executed in parallel on different processors, dependencies between the instructions on the same processor, and dependencies between instructions on different processors, that are not executed in parallel. Optimization for parallel computation is generally based on broad and poorly structured search areas. For this reason, heuristic techniques are much more appropriate. Artificial intelligence methods provide effective solutions, and genetic algorithms, a type of evolutionary strategies, are perfectly suitable for complex optimization. Moreover, if a task is properly encoded, in some cases, it is possible to obtain better results than analytical approaches.

This paper is structured in five sections as follows. Section 2 provides a brief review of the parallelization of program instructions and genetic algorithms used for exploring large search spaces. Section 3 presents our method for the parallelization of software program instructions and our implementation of the genetic algorithm-based solution. Section 4 contains experimental results, with a focus on the functional performance in terms of processing speed and quality of the obtained results. Finally, Section 5 discusses our findings and sets some goals for future research directions.

In this section, we briefly present the literature on the parallelization of program instructions and existing genetic algorithms-based models.

The parallelization of program instructions or instruction scheduling is performed with a set of program instructions and a number of functional units. As reported by [

It has been proven that there are no polynomial algorithms that can scan the solution space and solve the instruction scheduling problem [

Research has been conducted on the different heuristics that can be applied for different computer architectures. The aim of the artificial intelligence-based strategies, including genetic algorithms, is to move away from investigating regions of the search space that do not produce significant results. In [

An interesting approach was presented in [

In [

In [

Taking previous research into consideration, we investigated instruction scheduling techniques, and we designed and developed a general genetic algorithm applicable to different parallelization methods and functional units. By investigating the way solutions are found, we improved our understanding of instruction scheduling and proposed optimized arrangements of concurrent instructions for different implementations.

The beginnings of genetic algorithms are rooted in the 1950s, when several biologists suggested using computers to simulate biological systems. Later it was found that genetic algorithms were powerful computing tools for optimization problems. Therefore, models for finding optima in complex problems were established using the basic principles of genetics [

There are several forms for a genetic algorithm [

The algorithm above requires some discussion. First, the initial population is generated by randomly creating a fixed number of chromosomes, with each chromosome being a valid solution to the problem at hand. An evaluation function is then applied to each individual in the population [

In this section, we present our method for the parallelization of program instructions. First, we describe the parallelization steps and the construction of the general data dependence graph using an example. Second, we present our genetic algorithm in detail. Third, we illustrate our algorithm with a general diagram and describe our implementation.

A software program consists of a set of instructions, each of them with associated latencies. Parallelization is possible when there are multiple processors that can execute the instructions. The goal is to rearrange the program’s instructions so that the number of the unparalleled instructions and the makespan (task execution time) are minimized. In order to describe the problem, let a program consist of several instructions that we number in the order of their appearance:

Using two processors, to parallelize, we can distribute these instructions as presented in

Iteration steps | Processor 1 | Processor 2 | ||
---|---|---|---|---|

Instruction number | Instruction code | Instruction number | Instruction code | |

1 | (1) | (2) | ||

2 | (3) | (5) | ||

3 | (6) | (7) | ||

4 | (4) | (8) | ||

5 | (9) | – | – |

In this example, we could parallelize instructions (1) and (2) because there are no data dependences between them, as well as (3) and (5), (6) and (7), and (4) and (8) for the same reason. However, we cannot parallelize (3) and (4) because to calculate w, in (4), z is required, but z is modified in (3), so there is a data dependence between instructions (3) and (4). Also, instruction (3) depends on (1), because (3) needs x, which is modified in (1). Moreover, instructions (1) and (3) must appear in this order. Instructions (4) and (5) depend on (2) and (4) depends on (2) and (3), so (4) must appear after (2) and (3).

We can generate a general data dependence graph in which the nodes contain the instructions, and each edge denotes a data dependence constraint between the nodes it connects. The general data dependence graph for the instructions presented above is depicted in

One possibility for representing a chromosome (possible solution) would be: (1), (3), (6), (4), (9), (2), (5), (7), (8), –, i.e., by taking the instructions in order on the first processor and then on the second one (as in

Our proposed genetic algorithm for parallelizing instructions of a software program has six main components: The initialization function, the initial population (chromosomes), the evaluation function, the selection function, the crossover function, and the mutation function.

We first describe how to perform the crossover operation. The crossover operator is applied to individuals from the intermediate population and imitates natural inter-chromosomal crossover. Crossover provides an exchange of information between the two parents, with the offspring having the characteristics of both parents. The crossover operator acts as follows: Two individuals of the intermediate population (also known as the “crossover pool”) are randomly selected, and some parts of the two individuals are interchanged.

For our experiment, the selection of the individuals to be crossed is based on a roulette wheel that has a number of slots equal to the number of individuals in the population, but the slots are not equal in size. The size of each slot is proportional to the value of the evaluation function applied to the individual it represents. In the first step, we calculated the fitness value of each chromosome from the old population and then computed the ratio of each solution’s fitness value to the solution. In other words, even if an individual is far from the optimal solution, there is still a chance that the individual with improper fitness will reproduce, but if an individual is closer to the optimal solution, it is more likely to be selected for the crossover. The number of chromosomes involved in the crossover operation is calculated in software with the expression:

The crossover consists of the exchange of genetic information between two parents and is performed by randomly choosing two positions within each selected chromosome to obtain two new chromosomes according with the pseudocode presented in Subsection 3.3.

As an example, let two chromosomes be:

Two positions from each chromosome are randomly selected, for example, 5 and 8 (the vectors start with the index 0 and end with 17). The new chromosomes are obtained as follows: The elements between positions 5 and 8 remain unchanged and the other positions are replaced by elements from the other chromosome.

Two valid chromosomes have been obtained, and any supplementary checking is unnecessary.

For the

If we generate positions 1 and 13, and we interchange the corresponding values, a new chromosome can be obtained:

There should be very little dependencies between instructions, ideally none. There are three types of dependencies:

Case 1: If we try to parallelize instructions (1) and (3) from Section 3.1, there is a data dependence between them, so we increase the value of such a chromosome by 1 (where a high value means a solution far from optimal). The goal is to minimize this value.

Case 2: If we interchange instructions (1) and (3) on the first processor, we will add a new penalty because

Case 3: If we have instruction (3) on the first processor and (1) on the second, so that the execution of (1) follows the execution of (3), we will get a new penalty for the same reasons as before.

A chromosome that has zero dependencies between instructions is a potential solution, because the instructions were able to be parallelized. But we can imagine a chromosome such as 1 2 3 4 5 6 7 8 9 | 10 11 12 13 14 15 16 17 18, for which the number of dependencies is zero, but it is not an optimal solution, because all the instructions were put on the first processor, and all the instructions are executed in sequential order. Therefore, we need a parameter to indicate how many instructions were not actually put in parallel. We should also note whether the instructions have the appropriate sequence. Therefore, it is necessary to define a more complex evaluation function, capturing the multitude of possible situations.

Based on the three cases that were identified, we construct a chromosome evaluation function with three components, as presented in

where

It is clear that the

For instance, suppose the first individual (chromosome) with the

For an odd number of instructions and when everything can parallelize, the best values for components are as follows:

For an even number of instructions and when everything can be parallelized, the best values for components are as follows:

Consequently, the stop criterion (desired optimum) in the ideal case where all instructions can be parallelized is

This subsection presents the implementation of our proposed genetic algorithm for parallel optimization of a program instructions. Based on the details presented in Subsections 3.1. and 3.2., our algorithm is represented in

The crossover act as follows: Select two chromosomes from the population using roulette wheel, randomly place two positions within each selected chromosome and obtain two new chromosomes according with the details presented in Section 3.2. The pseudocode of our algorithm for computing the crossover is the following:

1. Input: Old population

2. Output: New chromosomes in the new population (the number of new chromosomes is determined by the parameter named crossover probability)

Start

Select two chromosomes from the old population. We use the roulette wheel to generate two numbers (between 0 and population size).

Generate two random positions inside of the selected chromosomes (the positions must be between 0 and chromosome size).

Perform the two-point crossover, according with the details presented in Section 3.2. (Explanation of genetic operators—crossover) and obtain two new chromosomes.

Add the two generated chromosomes in the new population.

Stop

The algorithm will be applied until the application crossover probability is reached. The formula used to calculate the number of new chromosomes resulting from the crossover operation is:

The mutation (changing an element, gene, from a chromosome) is accomplished as follows:

1. Input: Old population

2. Output: New chromosomes in the new population (the number of new chromosomes is determined by the parameter named

Start

Select one chromosome from the old population. We use the roulette wheel to generate the chromosome number (between 0 and population size).

Generate two random positions inside of the selected chromosome (the positions must be between 0 and chromosome size).

Perform the swap mutation, according with the details presented in Section 3.2. (Explanation of genetic operators—mutation) and obtain one new chromosome.

Add the generated chromosome in the new population.

Stop

The algorithm will be applied until the mutation probability established in the application interface is reached. The formula used to calculate the number of new chromosomes resulting from the mutation operation is:

The evaluation function of each chromosome from the population is presented in detail and with different examples in Section 3.2. (explanation of evaluation function). The evaluation function (fitness) is accomplished as follows:

1. Input: Current population of chromosomes.

2. Output: Current population of chromosomes each having attached the value of the evaluation function.

Start

Select one chromosome from the current population.

for each gene (which is actually a program instruction) i from the selected chromosome verify the dependencies between parallel instructions (executed in parallel on different processors)

if there are dependencies

verify the dependencies between two instructions on the same processor

if there are dependencies

verify the dependencies an instruction on a processor and other instructions on different processors

if there are dependencies

end for each gene

verify the number of non-parallelized instructions (comparing the gene values) if there are non-parallelized instructions

verify the running order of the instructions (the gene that cannot be parallelized) if there are non-parallelized instructions

Compute the fitness for the selected chromosome using

Stop

For this problem, the contribution to the evaluation function of one gene from a chromosome depends on the value and the position of other genes from the same chromosome.

The algorithm shown in

This section presents testing and experimental results of the proposed genetic algorithm used for the parallel optimization of software program instructions. In this section, we test our genetic algorithm in two experiments, using different input files that contains program instructions.

Once all input data has been transformed into the required format, the genetic algorithm with the parameters specified in

The data from

Chromosome number | Chromosome genes | Chromosome fitness value |
---|---|---|

0 | 8 6 2 9 7 0 5 1 4 3 | |

1 | 5 9 6 2 3 8 1 0 7 4 | |

2 | 2 5 0 7 3 1 9 4 8 6 | |

3 | 1 0 9 8 6 2 7 5 4 3 | |

4 | 2 0 3 6 5 4 1 9 8 7 | |

5 | 2 4 3 1 9 7 0 8 5 6 | |

147 | 9 3 0 1 4 8 2 6 7 5 | |

148 | 5 6 8 2 1 9 7 3 0 4 | |

149 | 7 2 0 4 8 3 5 6 1 9 |

In

The evolution of fitness (best fitness) through the generations is depicted in

For the instructions presented in

The algorithms were coded in C# programming language, version 2015, and run on a Dell XPS L502X system with a configuration based on an i7-2630QM processor (frequency 2 GHz) and 8 GB RAM, and using the Microsoft Windows 10 PRO 64-bit operating system.

The evolution of fitness (best fitness) through the generations is depicted in

In this paper, we proposed a genetic algorithm to perform the parallelization of the instructions of a software program. The applicability and the efficiency of the proposed genetic algorithm in instruction scheduling is demonstrated by the experimental evaluation. The simulation results demonstrate that the solution based on genetic algorithms is suitable for this kind of problems and this approach can be used by compiler designers.

The steps we took to build our algorithm provide insight on how to further improve exploration of the search space. The results we obtained show that genetic algorithms are definitely worth investigating and represent a promising approach for future research. The logic used here to develop the genetic algorithm may be helpful with a few modifications to address other complicated parallelizing problems, such as parallelizing complex arithmetic instructions, and the treatment of cycles and decisions. In addition, it is possible to create a parallel version of the described algorithm, e.g., in a message passing interface, by using multiple computers or the IoT.