With the development of the Industrial Internet of Things (IIoT), end devices (EDs) are equipped with more functions to capture information. Therefore, a large amount of data is generated at the edge of the network and needs to be processed. However, whether these computing tasks are offloaded to traditional central clusters or to mobile edge computing (MEC) devices, the data lacks security guarantees and may be altered during transmission. In view of this challenge, this paper proposes a trusted task offloading optimization scheme that offers low-latency, high-bandwidth services for IIoT while keeping data secure. Blockchain technology is adopted to ensure data consistency. Meanwhile, to reduce the impact of the low throughput of blockchain on task offloading performance, we formulate the consensus and offloading processes as a Markov decision process (MDP) by defining states, actions, and rewards. A deep reinforcement learning (DRL) algorithm is introduced to select offloading actions dynamically. To accelerate the optimization, we design a novel reward function for the DRL algorithm according to the scale and computational complexity of the task. Experiments demonstrate that, compared with methods without optimization, our mechanism performs better in terms of the number of offloaded tasks and the throughput of the blockchain.
With the rapid development of fifth-generation (5G) communication technology and the Industrial Internet of Things (IIoT), a growing number of end devices (EDs) with sensing and monitoring capabilities are being connected at an unprecedented rate [
Mobile edge computing (MEC) is a widely considered solution that can meet the above requirements by using offloading algorithms [
To resolve the security issues, researchers introduce blockchain to build a secure environment for data offloading in the IIoT systems with MEC [
Many applications are highly sensitive to time and data security in IIoT scenarios [ We design a comprehensive scheme that integrates MEC servers, controllers, and blockchain into IIoT systems for efficiency and data security. Because the offloading action is tied to the blockchain, we also consider chain optimization. By fully considering the important features of IIoT, MEC servers, and blockchain, as well as the adjustable factors of the integrated system, we define the optimal decision-making problem as a Markov decision process (MDP). To better handle computing tasks, we propose a novel reward function that takes task difficulty and scale into account. Due to the complexity of the problem, we adopt a deep Q-network (DQN)-based approach to make offloading decisions dynamically, including validation node selection, block size adjustment, task-offloading server choice, and block interval adjustment. The aim is to optimize the overall performance of task offloading systems that combine MEC and blockchain. The simulation demonstrates that the proposed scheme handles more computing tasks than other schemes while maintaining security.
The remainder of this article is organized as follows. Section 2 reviews related work on the integration of IIoT systems, MEC, and blockchain, as well as methods that solve optimization problems with deep reinforcement learning (DRL) algorithms. Section 3 presents the system framework, the formulation of the offloading model, and the DRL-based solution. Section 4 presents and analyzes the simulation results. Finally, Section 5 concludes this article with an overview of proposed future work.
IIoT uses smart EDs, fast protocols, and efficient cybersecurity mechanisms to improve industrial processes and applications [
Data is vulnerable during transmission, and the information stored in MEC servers may be subject to leakage, theft, and breaches. Miller [
DRL is one of the most suitable methods for optimizing decision-making policy and maximizing long-term rewards. It has therefore attracted many researchers who apply it to task offloading problems for IIoT systems with MEC and blockchain. Qiu et al. [
In this section, we first introduce the system framework. Then, we formulate the offloading optimization problem. Finally, we present the DRL-based solution to the problem.
Generally, a MEC server only performs data computing in the task offloading framework. However, to verify the data consistency during the offloading process, we randomly select MEC servers to form the blockchain and perform data verification.
In the IIoT layer, we assume that IIoT EDs are divided into
The MEC layer consists of
There is a cloud computing server in the cloud layer. The state of the cloud computing server at timeslot
The overall time cost of offloaded tasks is an important metric in evaluating task offloading decisions. For tasks
We consider that the channel gain between controllers and servers is diverse and let
Since these servers use a first-in-first-out (FIFO) queue, the queuing time of the task on each server is the time to execute all earlier tasks in the buffer. Consequently, the queuing time and task execution time of tasks
For simplicity, we only take block generation time
The validation time is largely influenced by the consensus algorithm. By comprehensively considering the efficiency and security of the system, we adopt the practical Byzantine fault tolerance (PBFT) algorithm in our architecture. The process of PBFT consists of the request, pre-prepare, prepare, commit and reply stages, as presented in
The PBFT consensus algorithm requires each validation node to exchange a large number of messages with the others, thus guaranteeing the high security of the blockchain system at the cost of communication efficiency. The nodes comprise one primary node and many replica nodes. In the request phase, the primary node verifies every transaction and packs it into a block. In the pre-prepare phase, the primary node sends the block to all replica nodes, and the replica nodes confirm the receipt. In the prepare phase, all replica nodes deliver the results they received in the previous stage to the other nodes. Meanwhile, every replica node validates the message authentication code (MAC) and generates an identical MAC for delivery. In the commit phase, all nodes send confirmation messages to the other nodes. If more than 2/3 of the messages agree, the node sends the results to the client in the reply phase.
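The message pattern described above can be sketched in a few lines. This is a minimal illustration rather than the consensus implementation used in the experiments: the function and class names are hypothetical, the per-phase message counts assume every message is sent point-to-point with no batching, and the fault bound f = ⌊(n − 1)/3⌋ is the standard PBFT result rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PbftMessageCount:
    """Point-to-point messages per phase (hypothetical helper type)."""
    pre_prepare: int
    prepare: int
    commit: int

    @property
    def total(self) -> int:
        return self.pre_prepare + self.prepare + self.commit


def max_faulty_nodes(n: int) -> int:
    """Standard PBFT bound: n validators tolerate f = (n - 1) // 3 faults."""
    if n < 1:
        raise ValueError("need at least one validation node")
    return (n - 1) // 3


def count_messages(n: int) -> PbftMessageCount:
    """Count messages for one block among n validation nodes.

    Assumption: one primary and n - 1 replicas, no message batching.
    """
    if n < 1:
        raise ValueError("need at least one validation node")
    replicas = n - 1
    return PbftMessageCount(
        pre_prepare=replicas,          # primary -> every replica
        prepare=replicas * (n - 1),    # every replica -> all other nodes
        commit=n * (n - 1),            # every node -> all other nodes
    )


def commit_reached(agreeing: int, n: int) -> bool:
    """True when strictly more than 2/3 of the n nodes agree."""
    return 3 * agreeing > 2 * n
```

With four validation nodes, for example, `count_messages(4).total` is 24, which shows how quickly the quadratic prepare and commit phases come to dominate communication cost as validators are added.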
Then, we carry out a quantitative analysis of the latency in the consensus process. We assume that block size is
According to the five stages of the consensus process, the latency of block message delivery is:
To ensure security, we include blockchain in the architecture. Therefore, the entire assignments of the MEC server and the results of the computing tasks must be recorded for trusted traceability [
The key factors that impact the throughput of blockchain include block size
Additionally, it is important to note that the goal of this architecture is to solve as many complex computing tasks as possible and ensure all assignments/results can be validated, so once the tasks are completed, they must be recorded on the chain. In other words, the effective throughput of the entire system is only measured by the number of completed and recorded tasks. Therefore, the throughput of the system is limited by two aspects: the number of tasks executed by MEC servers and the record performance of the blockchain. Regarding the performance of the blockchain, each block in the blockchain contains two parts. The first is the task allocation record and the corresponding calculation results. The second is the redundant structure, such as a pointer and a block header. So, the calculation of effective throughput involves two aspects. First, we count the number of tasks completed, denoted by
Since the designed framework integrates blockchain, controllers, and MEC servers in addition to traditional IIoT systems, the states and actions are complicated and dynamic. Specifically, in every time step a large number of computing tasks with different scales, costs, and tolerance times must each be assigned to a computing server. Additionally, the server significantly affects queuing time and task execution time according to
The state of the system changes at every time step. Any action may affect the state in the next time step and the rewards in the future. Therefore, based on the characteristics of the system, we describe the optimal decision-making problem as a discrete MDP by defining the state space, action space, and reward function.
In each time step, the agent learns action value or policy from experience by observing the state. In some scenarios, before making a decision, the agent observes the current state, defined as a set by timeslot
Facing a complex environment, the system is required to make decisions and adjust the scheme for maximum reward in the future at every time step. We must thoroughly consider not only task-offloading behaviors but also factors that promote the throughput of blockchain and then create the action space. The action space at decision epoch
In this system, our primary goal is to improve the efficiency of task offloading under security conditions, which involves two aspects. On the one hand, tasks must be assigned to servers rich in channel resources and computational power so that they can be completed in a short time and yield profit. On the other hand, every task allocation and computing result must reach consensus on the blockchain. When the data size exceeds the throughput of the blockchain, the data must be put into a data pool to wait for another chance to reach consensus, and that batch of tasks cannot be counted in the effective throughput of the whole system. In sum, the decisions made by our model must take both task assignments and blockchain throughput into account. If the scale of data to be recorded on-chain is larger than the throughput, part of the data cannot be recorded in time. Conversely, if the scale of the data is smaller than the recording capacity of the blockchain, block space is used inefficiently.
In conclusion, our final reward consists of two parts: the information on the computational tasks completed at the current time and the effective throughput of the entire system at the current time. To better measure completed tasks, we first obtain the task score of a batch of tasks by multiplying the scale (data size) by the difficulty (required number of CPU cycles). To accurately measure throughput, we take the smaller of the number of computing tasks completed by the MEC/cloud servers and the throughput of the blockchain. We set the reward as the product of the average score of the finished tasks and the throughput of the entire system.
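The reward described in this paragraph can be written as a short sketch. The function names and the `(data_size, cpu_cycles)` task representation are illustrative assumptions, as is expressing the blockchain throughput as a number of tasks per time step; the penalty terms are deliberately left out.

```python
from typing import Sequence, Tuple

# Hypothetical task representation: (data size, required CPU cycles).
Task = Tuple[float, float]


def task_score(data_size: float, cpu_cycles: float) -> float:
    """Score of one task: scale multiplied by difficulty."""
    return data_size * cpu_cycles


def base_reward(finished: Sequence[Task], blockchain_throughput: float) -> float:
    """Average task score times the effective system throughput.

    Assumption: blockchain_throughput is the number of tasks the chain can
    record in the current time step, so it is comparable to len(finished).
    """
    if not finished:
        return 0.0
    average_score = sum(task_score(s, c) for s, c in finished) / len(finished)
    # Effective throughput is capped by whichever side is the bottleneck.
    effective_throughput = min(len(finished), blockchain_throughput)
    return average_score * effective_throughput
```

For two finished tasks with scores 6 and 20, the average score is 13. If the chain can record only one task in the step, the reward is 13; if it can record five, the two completed tasks become the limit and the reward is 26.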
Moreover, a reward function should provide both positive and negative rewards. Here, we set two constraints, and the model is punished whenever the system state violates them. First, each server has a buffer limit, and each task has a maximum tolerance time. When the number of remaining tasks in a server's buffer exceeds the limit, or the overall time to finish a task (including recording the result) exceeds the maximum tolerance time, we penalize the unreasonable assignment by subtracting the task scores from the original reward. Second, when the overall time cost of offloaded tasks does not meet
Because the system involves high-dimensional features and a large action space, we resort to the DQN approach to deal with the MDP problem. While running the DQN algorithm, the agent frequently interacts with the environment and eventually learns an optimization strategy based on action values. DQN, which uses neural networks to evaluate the value of discrete actions from continuous, complex observations, is one of the best-known value-based methods in DRL. The action values and their update are defined as follows:
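For reference, the standard Q-learning update that DQN approximates with a neural network is given here in common textbook notation. The symbols are assumptions and may differ from the paper's own: $s_t$ and $a_t$ are the state and action at time step $t$, $r_{t+1}$ is the resulting reward, $\alpha$ is the learning rate, and $\gamma \in [0, 1]$ is the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the bootstrapped target $r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a')$ and the current estimate.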
To prevent temporal correlations caused by consecutive states in the system from affecting optimization, we introduce experience replay into our approach. In the system, we focus on which agent action may acquire the maximum long-term reward in the current state, which is unrelated to previous states. Therefore, it is necessary to reduce the correlation among the states learned by the model. To achieve this, the agent stores every pair
Thus, the loss function is denoted as:
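As a hedged illustration of how replayed transitions feed the loss, the sketch below assumes the usual DQN objective: the mean squared temporal-difference error between the online network's estimate and a target computed with a separate target network. Whether the original scheme uses a target network is an assumption here, and `ReplayBuffer`, `td_loss`, `q_online`, and `q_target` are hypothetical names. The Q-functions are plain callables that return one value per action, so the example runs without a deep learning framework.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]
QFunction = Callable[[Sequence[float]], Sequence[float]]


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest transition when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int,
               rng: Optional[random.Random] = None) -> List[Transition]:
        rng = rng if rng is not None else random.Random()
        # Uniform sampling breaks the ordering of consecutive states.
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


def td_loss(batch: Sequence[Transition], q_online: QFunction,
            q_target: QFunction, gamma: float) -> float:
    """Mean squared TD error over a mini-batch of transitions."""
    total = 0.0
    for state, action, reward, next_state, done in batch:
        # Terminal transitions have no bootstrapped future value.
        target = reward if done else reward + gamma * max(q_target(next_state))
        total += (target - q_online(state)[action]) ** 2
    return total / len(batch)
```

In a full training loop, the agent would push one transition per time step, sample a mini-batch once the buffer is large enough, and minimize this loss with respect to the online network's parameters.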
In this section, we first introduce our simulation environment and significant system parameters, as well as details on the DQN algorithm. Then, we present and analyze the simulation results with different parameters and conditions.
In the experiments, the DQN algorithm is implemented with TensorFlow 2.2.4, a widely used framework for deploying deep learning models. Our software environment is Python 3.6.9 running on the Ubuntu operating system. The DQN algorithm in our system is shown as
In this simulation, there are five cells and five computational servers in the blockchain-empowered IIoT scenario, including four MEC servers and one cloud server. Among all MEC servers, we set three validation nodes. Meanwhile, we allocate fixed computational resources and buffer to every server and set communication rates between different servers and controllers. Some of the parameters are given in
Parameter | Value |
---|---|
Number of MEC servers | 4 |
Number of cloud servers | 1 |
Number of cells and controllers | 5 |
Number of validation nodes | 3 |
Computational power of servers | 60–80 GHz |
Communication rate between servers | 10–20 Mbps |
Communication rate between server and controller | 1–10 Mbps |
Computing cost for verifying signatures and operating MACs | 2 MHz/3 MHz |
Servers’ buffer limit | 4–6 |
Average transaction size | 30 KB |
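For readers who want to reproduce the setup, the listed parameters can be collected into one configuration object. This is a sketch with hypothetical field names. It assumes that the "2 MHz/3 MHz" entry maps to signature verification and MAC operation respectively, and that ranged values are drawn uniformly per server; the table itself does not state the sampling distribution.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SimulationConfig:
    """Simulation parameters from the table; ranges are (low, high)."""
    num_mec_servers: int = 4
    num_cloud_servers: int = 1
    num_cells: int = 5                      # one controller per cell
    num_validation_nodes: int = 3
    server_power_ghz: Tuple[float, float] = (60.0, 80.0)
    server_link_mbps: Tuple[float, float] = (10.0, 20.0)
    controller_link_mbps: Tuple[float, float] = (1.0, 10.0)
    signature_cost_mhz: float = 2.0         # assumed reading of "2 MHz/3 MHz"
    mac_cost_mhz: float = 3.0               # assumed reading of "2 MHz/3 MHz"
    buffer_limit: Tuple[int, int] = (4, 6)
    avg_transaction_kb: float = 30.0

    @property
    def num_servers(self) -> int:
        return self.num_mec_servers + self.num_cloud_servers


def sample_server_power(cfg: SimulationConfig,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Draw one computational power per server (uniform draw is an assumption)."""
    rng = rng if rng is not None else random.Random()
    low, high = cfg.server_power_ghz
    return [rng.uniform(low, high) for _ in range(cfg.num_servers)]
```

Passing a seeded `random.Random` instance makes the sampled server capacities repeatable across runs.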
To better measure the performance of our framework, we design ablation studies that compare the proposed framework with the following schemes:
- Proposed framework without block size adjustment
- Proposed framework without block interval adjustment
- Proposed framework without validation node selection
- Proposed framework with random selection of the task offloading node
The real IIoT environment is complex, and applications have various requirements for data size and recording time. To simulate a real IIoT scenario and verify the effectiveness of the scheme proposed in this paper, we design experiments with different factors ω and transaction sizes. We then observe whether the scheme can make use of resources according to the system state under as many scenarios as possible.
Effective throughput is an important metric for the scheme performance in applications, which can partially reflect the ability of different methods to process data with different scales.
The aim of our method is to solve computing tasks quickly. To better verify the performance of the optimization scheme, we generate several batches of tasks and set a target number of computing cycles that must be executed for them. Then, we record the time each scheme takes to reach that target. Note that the scores generated by actions that break the system constraints are included in the results.
In this paper, we propose a trusted task offloading optimization scheme, based on a DQN algorithm, to ensure data security and efficiency in a dynamic IIoT scenario. The system performance considers overall latency and the throughput of the blockchain. We formulate the task allocation decision and blockchain performance optimization problem as an MDP. To solve this high-dimensional and dynamic problem, a DQN-based approach is proposed to dynamically select actions and adjust parameters so as to optimize the overall performance of the MEC and blockchain systems while guaranteeing the security and integrity of data. The simulation results show that the proposed scheme can significantly improve the system’s performance. In future work, we will study better verification node selection strategies and optimize the DQN algorithm to solve the multi-objective optimization problem with more action combinations. Meanwhile, the states of a real scenario are more complex, so we will improve our scheme in a real IIoT scenario.
I would like to thank Professor Mei Song for her important comments, Professor Yinglei Teng for her meaningful guidance on the architecture and my friend Ximing Xing for his contribution to the paper.