An Energy Aware Algorithm for Edge Task Offloading

To optimize the energy consumption of edge servers during edge task unloading, this paper proposes a task unloading algorithm based on reinforcement learning. The algorithm observes the current environment state, selects the deployment location of each edge task according to that state, and thereby performs energy-aware edge task unloading. To this end, we first construct a network energy consumption model that includes both server energy consumption and link transmission energy consumption, which improves the accuracy of network energy consumption evaluation. Because the edge environment is complex and variable, we design a task unloading algorithm based on Proximal Policy Optimization (PPO), and we use Dijkstra's algorithm to determine the connection path between the edge servers on which adjacent tasks are deployed. Finally, extensive simulation experiments verify the effectiveness of the proposed method. Compared with the random algorithm, the proposed algorithm saves 22.69% energy on average.

Energy consumption optimization during task unloading has attracted extensive attention. However, most existing works focus on the energy consumption management and resource allocation of terminal equipment and lack an energy management method for edge servers. Fig. 1 describes the task unloading process in detail. In Fig. 1, each terminal device has a queue of tasks to be unloaded. Each task in the queue is denoted by $T_{i,j}$, where $i$ is the index of the terminal device and $j$ is the index of the task in the task queue, and $T_{i,j+1}$ depends on the output of $T_{i,j}$. We assume that the terminal only collects data and does not participate in the execution of the task.
We focus on the energy consumption optimization of edge servers during task unloading.
In this paper, we first construct a mathematical model that includes server energy consumption and link energy consumption. Because the edge computing environment is complex, a reliable and scalable learning algorithm is needed, so we design an edge task unloading algorithm based on Proximal Policy Optimization (PPO). The technical contributions of this paper are summarized as follows:
1. To accurately describe the energy consumption of the edge network during task unloading, we construct a mathematical model that includes server energy consumption and link energy consumption. The processing energy consumption of a server is directly proportional to its CPU utilization, and the transmission energy consumption of a link is proportional to its bandwidth utilization.
2. The reward function of the reinforcement learning algorithm is designed according to the network energy consumption model: the greater the energy consumption, the smaller the reward value.
3. The task offload strategy designed in this paper selects the edge server on which each task is deployed, and the path between adjacent servers is determined by Dijkstra's algorithm. Once a task is unloaded, the available resources (environment states) in the edge computing environment are updated.
The rest of this paper is organized as follows. The second part briefly reviews related works. The third part constructs a mathematical model of the energy consumption of the edge network and then formulates the problem as an ILP model after considering various constraints. The fourth part presents the proposed task unloading algorithm. The fifth part simulates the task unloading algorithm and analyzes the simulation results. The last part summarizes the work of this paper and discusses the parts to be improved.

Related Work
In an edge computing environment, terminal devices can upload some tasks to the edge server. Doing so not only reduces the energy consumption of the device, but also lowers the risk of privacy leakage found in traditional cloud computing and improves the real-time performance of task processing. However, the existing literature focuses on reducing terminal energy consumption and improving the response speed of unloaded tasks. Reference [5] considered how to allocate channel resources, reference [6] focused on how to allocate computing resources, and [7][8][9] jointly considered multiple factors to reduce terminal energy consumption. Current research therefore mainly addresses the allocation of channel resources and computing resources in edge networks, and only a few studies involve the energy consumption of edge servers. Reference [10] studied how to maximize the running time of the whole system through mutual task unloading among servers when each edge server has limited energy. However, this method can only prolong the survival time of the edge server nodes as a whole. Once the energy consumption of data transmission between edge servers is taken into account, it does not reduce the energy consumption of the edge servers.
Dynamic Voltage Scaling (DVS) can dynamically adjust voltage and frequency to reduce energy consumption while ensuring the quality of service of real-time tasks. References [11,12] studied how to use DVS to unload tasks from terminal devices to edge servers. Their experimental results show that DVS can not only complete task processing within the specified time, but also reduce terminal energy consumption. References [13,14] further discussed the joint optimization of terminal device (TD) energy consumption and task processing time in Mobile Edge Computing (MEC) systems using DVS technology. However, none of the above studies involved the energy consumption of edge servers.

Problem Formulation
In this section, we first construct a task unloading model with energy consumption as the optimization goal in IoT. After considering the constraints of bandwidth, computing resources and traffic conservation, we get an optimization problem model.

Physical Network Model
In this paper, an undirected graph $G(V, L)$ is used to represent the edge network, where $V$ is the set of physical nodes in the network and $L$ is the set of physical links in the edge network, with $l_{vu} \in L$. The computing power of each physical node is represented by $C_v$, the load capacity of each physical link is represented by $C_l$, and physical link $l_{vu}$ connects node $v$ and node $u$.
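The graph model above, together with the Dijkstra routing step mentioned in the introduction, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the `LINKS` topology, the unit link weights (so that path length equals hop count), and the function name `shortest_path` are all assumptions made for the example.

```python
import heapq

def shortest_path(links, source, target):
    """Dijkstra over an undirected graph given as {(v, u): weight}."""
    adj = {}
    for (v, u), w in links.items():
        adj.setdefault(v, []).append((u, w))
        adj.setdefault(u, []).append((v, w))
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical topology with 5 edge servers and 8 links, unit weights.
LINKS = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (1, 3): 1,
         (2, 3): 1, (2, 4): 1, (3, 4): 1, (0, 4): 1}
path = shortest_path(LINKS, 0, 3)  # a two-hop path from server 0 to server 3
```

With unit weights the result is a minimum-hop path; a different weight (for example, remaining link bandwidth) could be substituted without changing the function.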

Task Queue Model
In this section, the task queue on each terminal is modeled as a directed graph, and the tasks to be unloaded in a queue are not repeated. Each task queue can be described by a four-tuple $(v_s, v_d, R_{ts}, b_{ts})$, where $v_s$ and $v_d$ represent the start node and target node of the task queue respectively, and $R_{ts}$ represents the detailed information of the task queue to be unloaded, including the task types and the data dependencies between tasks. We assume that the transmission bandwidth between tasks in the same queue is the same, and its value is $b_{ts}$. $cr_t$ represents the number of CPUs required to complete task $t$, and $ct_t$ represents the throughput of task $t$.

Energy Consumption Model
The energy consumption model of edge network constructed in this paper includes the energy consumption of edge server and the energy consumption of physical link transmission traffic.
In addition to the energy consumption of computing tasks, the energy consumption of storage devices and communication devices on the edge server is also considerable. Therefore, the energy consumption of the server is modeled as the power consumed by starting up the server plus the power consumed by processing the unloaded tasks. The former is the energy required by the edge server to maintain its normal operation, which depends on whether any task is deployed on the edge server, regardless of the number of deployed tasks. The latter is positively correlated with CPU utilization. We use $N_v^t$ to denote the number of instances of task $t$ deployed on edge server $v$.
Since the energy consumption of the edge server is positively related to CPU utilization, the processing energy consumption of an edge server is computed from its CPU utilization, where $p_b^s$ is the startup energy consumption and $p_h^s$ is the energy consumption when the edge server runs at full load. In the resulting expression for the energy consumption $p_v$ of the edge server, the $\min\{1, \cdot\}$ term ensures that the startup energy is counted once whenever at least one task is deployed on the server, regardless of how many tasks are deployed.
Similarly, the energy consumption of physical links in the edge network includes the power consumption of the switches on the links and the transmission energy consumed when a link carries traffic between servers. The former depends on the power-on state of the switch on the link, and the latter depends on the bandwidth utilization of the physical link. $r_l$ denotes the bandwidth utilization of link $l$. In its calculation, $w_{ts}^{vfug}$ indicates that task $f$ of task queue $ts$ is deployed on node $v$ and task $g$ is deployed on node $u$, and $y_{ts}^{uvl}$ indicates whether the traffic of task queue $ts$ carried on link $l$ passes through node $v$ and node $u$ in turn.
Based on the above analysis, the total energy consumption of link $l$ combines two parts: $p_b^l$ represents the power consumption of the switch on the link, weighted by a binary indicator in $\{0, 1\}$ so that the switch power is not counted repeatedly, and $p_h^l$ represents the energy consumption of the link at full load. Once a task is unloaded on the server, the switch must be kept on. The total energy consumption of the edge network, $p_{total}$, is then the sum of the energy consumption of all edge servers and all physical links.
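The energy model can be illustrated with a short sketch. The linear form used here (a fixed startup term that applies once a resource is in use, plus a part that grows with utilization up to the full-load value) is an assumption consistent with the description above, not the paper's exact equations. The function names are hypothetical, and the default power values are the ones given later in the simulation settings (170 W and 800 W for servers, 100 W and 600 W for links).

```python
def resource_power(used, capacity, p_base, p_full):
    """Power of one server or link under the assumed linear model."""
    if used <= 0:
        return 0.0  # nothing deployed: the resource stays powered off
    utilization = min(1.0, used / capacity)
    # Startup power is counted once, then power scales with utilization.
    return p_base + (p_full - p_base) * utilization

def server_power(cpu_used, cpu_capacity, p_base=170.0, p_full=800.0):
    return resource_power(cpu_used, cpu_capacity, p_base, p_full)

def link_power(bw_used, bw_capacity, p_base=100.0, p_full=600.0):
    return resource_power(bw_used, bw_capacity, p_base, p_full)

def total_power(servers, links):
    """servers: [(cpu_used, cpu_capacity)], links: [(bw_used, bw_capacity)]."""
    return (sum(server_power(u, c) for u, c in servers)
            + sum(link_power(u, c) for u, c in links))

# One server at 50% CPU, one idle server, one link carrying 50 of 1000 Mbps.
example = total_power(servers=[(5, 10), (0, 10)], links=[(50, 1000)])
```

Under these assumptions the idle server contributes nothing, which is the behavior that rewards concentrating tasks on servers that are already powered on.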

ILP Model
In this section, after considering the constraints of processing sequence of tasks in the queue, computing capacity constraints, and network bandwidth constraints, the unloading problem of terminal tasks is established as an ILP model with energy consumption as the optimization objective.
First, to meet the traffic constraints during task unloading, let $ts$ be a task sequence to be unloaded and let $t_i$ and $t_j$ be two tasks in $ts$ such that $t_j$ must be executed after $t_i$. We assume that each task in the task sequence can be unloaded onto only one edge server (C2). The tasks in a sequence must also be unloaded according to the dependency relationships between tasks, which is described by:
$$w_{ts}^{vfug} = x_{ts}^{fg} \quad (9)$$
In addition, this section considers the computing capacity constraints of edge servers and the bandwidth constraints of physical links (C4).
In conclusion, the Energy Efficient Task Offload (EETO) problem is expressed as an ILP model that minimizes the total energy consumption of the edge network subject to the constraints above.

Proposed Algorithm

Markov Decision Process for Task Offloading
A Markov chain is a probability model in which the future state depends only on the current state. A Markov decision process (MDP) is a decision process based on a Markov chain. An MDP can be represented by a five-tuple $(S, A, P, R, \gamma)$. $S$ is the state space observed by the agent; $A$ is the action space; $P$ is the set of transition probabilities, that is, the finite set of probabilities that the agent enters a specific state after executing action $a_i \in A$ in a certain state; $R$ is the set of immediate rewards obtained after performing an action; and $\gamma$ is the discount coefficient.
An edge server in the edge network can act as an agent that obtains the available resources and task unloading information of the edge network topology through sensing components installed in the physical network. The environment state after the agent performs an action depends only on the current state and is independent of historical states, that is, the process has no aftereffect. Therefore, the edge task unloading problem can be expressed as an MDP model, defined as follows:
State space $S$: for $s_l \in S$, $s_l = (U_{cpu}(l), U_{bw}(l))$ gives the CPU utilization of each edge server and the bandwidth utilization of each physical link in the edge network after $l$ edge tasks have been unloaded.
Action space $A$: $a_l \in A$ means that the agent selects an edge server in the edge network according to its policy and the current state $s_l$, and then deploys task $l + 1$ of the task queue on it.
Action execution function: $step(s_l, a) = (r_l, s_{l'}, l')$. This function returns the immediate reward $r_l$, the subsequent state $s_{l'}$, and the number $l'$ of edge tasks that have been successfully unloaded. If an edge task satisfies constraints (7)-(10) during unloading, the task can be successfully unloaded to the edge network.
Reward function: $Reward(s_l, a_l)$ represents the immediate reward obtained by the unloading action $a_l$ in state $s_l$. Because the goal of this paper is to reduce the energy consumption of the edge network, the reward function is:
$$Reward(s_l, a_l) = N - p_{total} \quad (12)$$
In the above formula, action $a_l$ is produced by policy $\pi$, which is a mapping from the state space to the action space: $a_l = \pi(s_l)$. The optimization goal of the MDP model is to find a policy that maximizes the reinforcement learning objective, namely the expectation of the cumulative discounted reward $\mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} r_{t}\right]$, where $\gamma^{t}$ is the discount factor, whose value decreases with time.
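The MDP elements above can be sketched as a small environment class. This is an illustrative toy under several assumptions that are not in the text: the class name `OffloadEnv`, the value of the constant `big_n` standing in for $N$, the linear power model inside `total_power`, and the choice to return a reward of 0 with an unchanged state when a capacity constraint is violated.

```python
class OffloadEnv:
    """Toy edge network: per-server CPU load and per-link bandwidth load."""

    def __init__(self, cpu_capacity, n_links, bw_capacity=1000.0, big_n=5000.0):
        self.cpu_capacity = list(cpu_capacity)
        self.cpu_used = [0.0] * len(self.cpu_capacity)
        self.bw_capacity = bw_capacity
        self.bw_used = [0.0] * n_links
        self.big_n = big_n   # hypothetical constant N in Reward = N - p_total
        self.unloaded = 0    # l: number of tasks unloaded so far

    def state(self):
        """s_l = (U_cpu, U_bw): utilization of every server and every link."""
        u_cpu = [u / c for u, c in zip(self.cpu_used, self.cpu_capacity)]
        u_bw = [b / self.bw_capacity for b in self.bw_used]
        return u_cpu, u_bw

    def total_power(self):
        # Assumed linear model: startup power once in use, plus a part
        # proportional to utilization up to the full-load power.
        u_cpu, u_bw = self.state()
        servers = sum(170.0 + (800.0 - 170.0) * u for u in u_cpu if u > 0)
        links = sum(100.0 + (600.0 - 100.0) * u for u in u_bw if u > 0)
        return servers + links

    def step(self, server, cpu_demand, path_links=(), bandwidth=0.0):
        """Deploy one task on `server`; returns (reward, next_state, unloaded)."""
        cpu_ok = self.cpu_used[server] + cpu_demand <= self.cpu_capacity[server]
        bw_ok = all(self.bw_used[l] + bandwidth <= self.bw_capacity
                    for l in path_links)
        if not (cpu_ok and bw_ok):
            # Assumption: an infeasible action leaves the state unchanged.
            return 0.0, self.state(), self.unloaded
        self.cpu_used[server] += cpu_demand
        for l in path_links:
            self.bw_used[l] += bandwidth
        self.unloaded += 1
        return self.big_n - self.total_power(), self.state(), self.unloaded

env = OffloadEnv(cpu_capacity=[10, 10], n_links=1)
reward, next_state, count = env.step(server=0, cpu_demand=5)
```

The `path_links` argument would normally be the link indices along the Dijkstra path between the servers hosting two adjacent tasks; it is left empty for the first task of a queue.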

PPO Algorithm Framework
Because the edge computing network environment is complex and changeable, learning in it requires a reliable and scalable intelligent algorithm [15]. Because the PPO algorithm guarantees stability by restricting each parameter update to a trust region, this paper uses it to unload edge tasks [16].
PPO is a deep reinforcement learning algorithm based on the actor-critic framework. Its architecture contains two actor networks, Actor 1 and Actor 2. Actor 1 represents the latest policy $\pi$, which guides the agent to interact with the environment. The critic evaluates the current policy according to the reward and then updates the parameters of the critic network through back propagation of the loss function. Actor 2 represents the old policy $\pi_{old}$. After the agent has trained for a certain number of steps, the parameters of Actor 1 are used to update Actor 2. This process is repeated until the PPO algorithm converges, which yields a trained edge task unloading model based on the actor-critic framework.
The policy gradient update is $\theta_{new} = \theta_{old} + \alpha \nabla_{\theta} J$, where $\theta_{old}$ and $\theta_{new}$ respectively represent the policy parameters before and after the update, $\alpha$ represents the update step size, and $\nabla_{\theta} J$ is the gradient of the objective function. The key to the policy gradient algorithm is the update step size: if it is not selected properly, the algorithm may collapse. PPO decomposes the return function of the new policy into the return function of the old policy plus additional terms. As long as these additional terms are greater than or equal to 0, the return function is guaranteed to be monotonically non-decreasing.
$A_{\pi}(s_t, a_t)$ is the advantage function, which is calculated by (17).
The penalty-based form of the PPO policy update has the problem that its hyperparameter $\beta$ is difficult to determine, so PPO considers another method to limit the update step size of the policy.
When the policy does not change, $r_t(\theta) = 1$. PPO uses $\mathrm{clip}()$ to limit the difference between the old and new policies, which gives the improved, clipped policy update.
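For reference, the two policy update objectives discussed above are usually written as follows in the PPO literature. These are the standard forms rather than a transcription of this paper's equations; the clipping range $\epsilon$ and the estimator notation $\hat{A}_t$ are symbols assumed here.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}

L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\, \hat{A}_t
    - \beta\, \mathrm{KL}\left[\pi_{\theta_{old}}(\cdot \mid s_t),\, \pi_\theta(\cdot \mid s_t)\right] \right]

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\, \hat{A}_t,\;
    \mathrm{clip}\left(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon\right) \hat{A}_t \right) \right]
```

The first objective carries the penalty coefficient $\beta$ that is hard to tune, while the second removes it by clipping the probability ratio $r_t(\theta)$ to the interval $[1 - \epsilon, 1 + \epsilon]$.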

PPO Algorithm Implementation
To optimize the energy consumption during edge task unloading, the algorithm designed in this paper includes the following three modules: 1) construction of the edge network environment and parameter setting; 2) training of the edge task unloading model; 3) output of the energy-aware unloading scheme.
As described in the PPO algorithm framework above, the actor network of the PPO algorithm designed in this paper is composed of two neural networks, Actor 1 and Actor 2. Actor 1 guides the agent to interact with the environment, obtains transition samples, and caches them. The policy parameters in Actor 2 represent the old policy. After a period of iteration, the parameters of Actor 1 are used to update the parameters of Actor 2. The critic network consists of a single neural network. The training steps of the unloading model are as follows:
Step 1: Input the current state into Actor 1, and the agent selects an action according to $a_l = \pi(s_l)$. Repeating this process, the agent interacts with the edge network for $T$ time steps, collecting and caching the historical interaction information.
Step 2: Use (17) to calculate the advantage function of each time step.
Step 3: Calculate the loss function of the critic network, and update the critic network parameters $\phi$ by back propagation of the loss.
Step 5: Repeat step 4. After a certain number of steps, use the network parameters of Actor 1 to update the parameters of Actor 2.
Based on the above analysis, the edge task unloading algorithm based on PPO is described in Table 1.
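The numerical core of these training steps can be sketched in plain Python. Several choices here are assumptions rather than the paper's definitions: the advantage is taken as the discounted return minus the critic's value estimate, the critic loss is the mean squared advantage, and the clipping range `eps=0.2` is a common default. The discount of 0.9 matches the value reported in the simulation results.

```python
import math

def discounted_returns(rewards, gamma=0.9):
    """Discounted return for every time step of one cached trajectory."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def advantages(rewards, values, gamma=0.9):
    """Assumed advantage: discounted return minus critic value estimate."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

def critic_loss(rewards, values, gamma=0.9):
    """Mean squared advantage, minimized with respect to the critic."""
    adv = advantages(rewards, values, gamma)
    return sum(a * a for a in adv) / len(adv)

def clipped_actor_loss(new_logp, old_logp, adv, eps=0.2):
    """Negative clipped surrogate; new_logp from Actor 1, old_logp from Actor 2."""
    total = 0.0
    for lp_new, lp_old, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(lp_new - lp_old)            # r_t(theta)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * a, clipped * a)
    return -total / len(adv)

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

In a full implementation these losses would be expressed in an automatic differentiation framework so that gradients flow into the actor and critic parameters; the sketch only shows the arithmetic.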

Simulation Setting
In this section, two network topologies are used to verify the algorithm proposed in the previous section. The first topology is composed of 5 edge servers and 8 physical links. The second topology consists of 8 edge servers and 12 physical links. To verify the energy optimization performance of the proposed task offload algorithm, the energy consumption for 10, 15, 20, …, 60 task queues on the terminal devices is simulated and measured. We assume that three types of terminal tasks need to be unloaded to the edge network, and the detailed parameter settings of each task are shown in Tab. 1. We also assume that the bandwidth of each physical link is 1000 Mbps and that the available computing resources of each edge server are 9 or 10 CPUs. The task types in each task queue are randomly selected from the above three types, and each queue consists of at most three tasks. The number of tasks in each task queue is uniformly distributed between 2 and 3. In addition, the required bandwidth of each task queue is assumed to be uniformly distributed in the range [40, 50] Mbps. The startup and full-load energy consumption are set to 170 W and 800 W for a server, and to 100 W and 600 W for a physical link. The task parameters used for simulation are shown in Tab. 2.
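The workload described above can be generated with a short sketch. The task type names and the function name `make_task_queues` are hypothetical, and the per-type CPU and throughput parameters are omitted because they are not reproduced in the text.

```python
import random

# Hypothetical labels for the three task types; their parameters are not shown.
TASK_TYPES = ["type_a", "type_b", "type_c"]

def make_task_queues(n_queues, seed=0):
    """Generate task queues following the stated simulation settings."""
    rng = random.Random(seed)
    queues = []
    for queue_id in range(n_queues):
        n_tasks = rng.randint(2, 3)               # 2 or 3 tasks per queue
        tasks = rng.sample(TASK_TYPES, n_tasks)   # no repeated task in a queue
        queues.append({
            "id": queue_id,
            "tasks": tasks,
            "bandwidth_mbps": rng.uniform(40.0, 50.0),
        })
    return queues

queues = make_task_queues(10)
```

Fixing the seed makes a run repeatable, which helps when comparing algorithms on the same set of task queues.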
This paper uses the following four indicators to evaluate the proposed algorithm:
1. Total energy consumption of the network: the energy consumed when processing terminal tasks plus the energy consumed in the communication process.
2. Number of CPUs: the total number of CPUs consumed when the offloaded tasks are executed in the edge network.
3. Physical network bandwidth: the total link bandwidth consumed when unloading tasks.
4. Offloading success rate: the success rate of task unloading.
Table 1: Edge task offloading algorithm
Algorithm: Edge task unloading algorithm based on PPO
Input: Initial state of the edge network and task queues to be unloaded
Output: Edge task offloading scheme
1. // 1) Build edge network environment and set parameters
2. Initialize the edge network environment and the queues to be unloaded, and set the hyperparameters of the PPO algorithm
3. // 2) Train the PPO algorithm based on the AC framework
4. Initialize the interaction cache, which is used to collect historical information
To verify the effectiveness of the proposed algorithm, the algorithm proposed in the previous section (PPO_EM) is compared with a random algorithm and with a PPO-based task unloading algorithm that does not consider startup energy consumption (PPO_NEM).

Simulation Results
PPO_EM is implemented on a computer equipped with an Intel(R) Core i5-9300H and 16 GB of memory. The program running environment is Python 3.7.4 and TensorFlow 1.15.0. Fig. 2 shows the convergence of PPO_EM during training. During training, the number of task queues to be deployed is set to 80, the learning rates of the actor network and the critic network are set to 0.0001, and the reward discount coefficient is set to 0.9. The parameters of Actor 2 are updated with the network parameters of Actor 1 every 15 steps. As can be seen from Fig. 2, at the beginning of training the results of PPO_EM fluctuate because the task deployment scheme is selected randomly, but as training proceeds the reward gradually converges to the optimal value at about 150 steps.
Energy Consumption: Figs. 3a and 3b show the network energy consumption when the task unloading strategy is executed in topology 1, with 5 nodes and 8 physical links, and in topology 2, with 8 nodes and 12 physical links, respectively. Compared with the random algorithm, the algorithm designed in this paper saves 22.69% energy on average when executing task unloading. The reasons are as follows. The random algorithm only considers the network topology and does not consider network energy consumption. When building the MDP-based optimization model, the algorithm designed in this paper defines the reward of each step from the perspective of energy consumption, which jointly optimizes server energy consumption and physical link energy consumption and minimizes the total energy consumption during task unloading. Compared with PPO_NEM, which does not consider startup power consumption, the algorithm designed in this paper powers on an edge server only when a task is deployed on it. It therefore outperforms both comparison algorithms in energy consumption.
Offloading Success Rate: Figs. 4a and 4b show the success rate of task unloading in topology 1, with 5 nodes and 8 physical links, and in topology 2, with 8 nodes and 12 physical links. The figures show that as the number of task queues to be deployed increases, the unloading success rate of all three algorithms decreases overall. Because the number of tasks in each queue is randomly distributed between 2 and 3, the number of deployed task queues can increase while the overall required resources decrease. This explains why the unloading success rate occasionally increases slightly as the number of task queues grows. The figures also show that the PPO-based task unloading algorithm designed in this paper has a better unloading success rate than the random algorithm. The reason may be that the reuse rate of tasks of the same type is low in the random algorithm: edge network resources are limited, and the repeated deployment of tasks consumes more computing and bandwidth resources, which lowers the task unloading success rate.
Consumed Bandwidth: Fig. 5 shows the total link bandwidth cost when unloading tasks in the network topology with 5 physical nodes and 8 physical links. As the number of task queues to be unloaded increases, the total bandwidth consumed in the network also increases. Moreover, the algorithm designed in this paper consumes the least network bandwidth, and the random algorithm consumes the most network bandwidth on the edge network. The reasons are as follows. Although the random algorithm uses the shortest path when connecting the edge servers on which adjacent tasks are deployed, the edge servers themselves are selected randomly during unloading. Therefore, compared with the random algorithm, the PPO_EM algorithm, which considers link energy consumption, uses fewer hops and consumes less network bandwidth when routing between adjacent tasks.
Consumed CPUs: Fig. 6 shows the total number of CPUs consumed by the edge network when tasks are unloaded in the physical topology with 5 physical nodes and 8 physical links. Fig. 6 shows that for all three offload strategies, the number of CPUs consumed in the network increases with the number of task queues to be deployed. In terms of the overall trend, the CPU consumption of the task offload strategy designed in this paper is slightly better than that of the algorithm that does not consider startup power consumption, and far better than that of the random algorithm. The reasons are as follows. Because the random algorithm selects the edge servers randomly, tasks of the same type in different queues need to be deployed repeatedly. In contrast, PPO_EM improves the utilization of tasks of the same type by aggregating task queue requests. Therefore, the random offload strategy consumes more CPUs.

Conclusion
In this paper, we focus on optimizing the energy consumption of edge servers during task unloading. To improve the accuracy of server energy consumption evaluation, we first construct a server energy consumption model that includes both startup energy consumption and processing energy consumption, and then formulate the model as an optimization problem with energy consumption as the objective. A task unloading strategy based on PPO is then proposed to find an approximately optimal task unloading scheme. Simulation results show that, compared with the random algorithm, the proposed algorithm saves 22.69% energy on average.
Funding Statement: This work was supported by State Grid Corporation of China science and technology project "Key technology and application of new multi-mode intelligent network for State Grid" (5700-202024176A-0-0-00).

Conflicts of Interest:
We declare that we have no conflicts of interest to report regarding the present study.