Self-Adaptive Fault Recovery Mechanism Based on Task Migration Negotiation

Ruijun Chai; Sujie Shao; Shaoyong Guo; Yuqi Wang; Xuesong Qiu; Linna Ruan

doi:10.32604/iasc.2021.013373

Intelligent Automation & Soft Computing DOI:10.32604/iasc.2021.013373
Article

Ruijun Chai1, Sujie Shao1,*, Shaoyong Guo1, Yuqi Wang1, Xuesong Qiu1 and Linna Ruan2

1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
2The Cloud Computing and Distributed Systems Laboratory, School of Computing and Information Systems, The University of Melbourne, VIC3010, Australia
*Corresponding Author: Sujie Shao. Email: buptssj@bupt.edu.cn
Received: 30 August 2020; Accepted: 19 November 2020

Abstract: Long Range Radio (LoRa) has become one of the widely adopted Low-Power Wide Area Network (LPWAN) technologies in the power Internet of Things (PIoT). Its major advantages include long range, a large number of links and low power consumption. However, in LoRa-based PIoT, terminals are often deployed outdoors in the wild and are easily affected by bad weather or disasters, which can lead to large-scale operation faults and seriously affect the normal operation of the network. At the same time, the wide coverage and large number of links that characterize outdoor terminal deployments sharply increase the difficulty and cost of fault recovery. Given this background, this paper proposes a self-adaptive fault recovery mechanism for PIoT terminals based on task migration negotiation. Firstly, based on an assessment of the terminal fault type and service category, a strategy for selecting a candidate neighbor terminal or terminal set is studied. It addresses fault recovery in two scenarios, the same data rate and the boundary where the data rate changes, while considering the adaptive characteristics of the LoRa data rate. Secondly, the adaptive terminal task migration negotiation mechanism is discussed. Then, a novel Terminal Fault Self-Adaptive Recovery (TFSR) algorithm is proposed. Simulation results show that, compared with the Genetic Algorithm (GA) and the Discrete Particle Swarm Optimization (DPSO) algorithm, the proposed algorithm maintains a higher fault recovery rate and a lower task recovery cost when faults are frequent.

Keywords: Fault recovery; task migration negotiation; adaptive; LoRa; power Internet of Things

1 Introduction

With the continuous development of smart grids and the Internet of Things (IoT) [1,2], the power Internet of Things (PIoT) has become a promising network technology in the smart grid. Massive numbers of sensing terminals are deployed in PIoT to comprehensively collect environmental, network and equipment data. Low-Power Wide Area Network (LPWAN) technology is well suited to deploying large-scale IoT terminals due to its low power consumption, long range and large number of links [3–5]. As a popular LPWAN communication technology, Long Range Radio (LoRa) has a wider propagation range than others under the same power consumption, so it is preferentially adopted in PIoT sensing networks.

The PIoT sensing network based on LoRa can deploy a large number of terminals over a wide area [6,7]. However, in complex and changing outdoor environments, sensing terminal faults occur frequently, which cannot be tolerated if the PIoT sensing network is to operate continuously and effectively. Since manual recovery is labor-intensive, a self-adaptive fault recovery mechanism for terminals, which saves both time and resources, is regarded as a promising solution. LoRa suits this solution because the terminals are directly connected to the LoRa gateway, which can quickly detect faults and negotiate with other terminals to ensure efficient recovery of tasks and data.

The fault recovery mechanism of the LoRa Wide Area Network (LoRaWAN) is different from that of the Wireless Sensor Network (WSN). Considering the adaptive data rate mechanism of LoRa, the self-adaptive terminal fault recovery mechanism can be divided into two different cases: same-rate recovery and variable-rate recovery. If terminals have the same data transmission rate, the scheme priority is calculated from the migration energy, communication load and the number of sensor types. Then, the optimal neighbor terminal or terminal set can be selected. At the boundary where the terminal data transmission rate changes, the recovery of the fault terminal can either choose the higher rate terminal or speed up the low-rate terminal. In this case, the rate is further considered in the priority calculation to balance the low latency and increased energy consumption.

Based on the above analysis, this paper proposes a self-adaptive terminal fault recovery mechanism based on task migration negotiation. The aim is to recover from frequent faults of PIoT sensing networks in complex and variable environments, improve the network's self-adaptive fault recovery capability and ensure business continuity. Specifically, the main contributions of this paper are summarized as follows:

A mechanism for judging terminal fault types and service categories based on the LoRa gateway is proposed. Gateways and servers periodically check data to determine if data is missing or abnormal.

A self-adaptive terminal fault recovery algorithm is proposed to find the optimal neighboring terminal or terminal set to recover the IoT sensing task, with the energy consumption, communication resources, rate and number of sensor types taken into account.

A task migration negotiation mechanism is formulated. LoRa gateway performs task assignment and data migration by judging the candidates’ actual status to realize the self-adaptive recovery of terminal faults.

The rest of this paper is organized as follows: Section 2 introduces related work. The self-adaptive fault recovery mechanism based on task migration negotiation is established in Section 3. Section 4 proposes the TFSR algorithm. Simulation results and analysis can be found in Section 5. Section 6 concludes this paper.

2 Related Work

In terms of LPWAN fault handling, current research mainly focuses on traditional sensor fault detection and recovery methods, with less research on the fault recovery mechanism that incorporates LPWAN characteristics. Reference [8] uses LoRa wireless mesh topology for forest fire monitoring to solve the transmission delay of previous forest fire information. Reference [9] proposes a LoRa mesh network system for wide-area monitoring in IoT applications, which improves the communication range and data transmission rate of the gateway. Reference [10] describes the composition and role of a typical LoRaWAN system, discusses the characteristics of LPWAN construction and demonstrates some of the advantages of LoRaWAN technology through a large number of network tests and application cases in different environments. Based on LoRa technology, Reference [11] uses star and chain networks for self-organizing network design and builds an intelligent meter reading system with long communication distance and resistance to multiple interference sources in a complex network environment.

At present, there is a wide variety of fault detection and recovery mechanisms for wireless sensor networks. The network in [12] is divided into virtual cell grids using a cellular architecture, and fault detection and recovery are performed within the grid with minimum energy consumption. Reference [13] assigns nodes corresponding credit ratings by calculating the difference between the predicted and measured values and proposes a fault recovery algorithm for opportunity credibility. There is also research on fault recovery through the gradient diffusion algorithm and the genetic algorithm [14,15], which aims to reduce the energy consumed for fault recovery. Reference [16] analyzes and compares similar WSN recovery algorithms mainly in terms of energy efficiency, scalability and network type. However, the above references all address terminal recovery strategies in the WSN scenario, while techniques for recovering terminal faults in LoRa networks remain scarce.

In terms of task negotiation, Reference [17] introduces a node sleep scheduling mechanism based on network coverage, which reduces the energy consumption of the network and ensures the monitoring range of the network. Reference [18] introduces the obvious advantages of LoRa over other LPWANs and elaborates on the concept and process of the LoRa adaptive rate mechanism. Reference [19], in its study of collaborative methods for wireless sensor networks, proposes network task allocation based on dynamic negotiation and combinatorial auction.

3 Problem Description

3.1 Terminal Fault Type and Service Category Judgment

The fault types of PIoT sensing terminals are mainly divided into two categories: sensor module faults, and faults of the communication module or other components. In the case of a sensor module fault, the terminal's communication module and other components are considered normal, which means that the missing or abnormal data is caused by the sensor module. Hence, the corresponding type of data that needs to be recovered can be identified. If other components such as the communication module fail, the terminal is considered unable to perform data sensing, and the data services supported by all sensor modules of the terminal need to be recovered.

The sensor module fault can be judged by missing or abnormal data from the terminal [20]. Missing data can be discovered by statistical methods over a fixed period and abnormal data can be found by a comparative analysis of historical data. Faults of other components, such as communication modules, can be detected by the network server via determining whether data is reported from that terminal within the same period.
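The two judgments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `classify_fault`, the expected number of reports per period and the deviation limit `z_limit` are all hypothetical, and a simple mean and standard deviation comparison stands in for the "comparative analysis of historical data".

```python
from statistics import mean, stdev


def classify_fault(readings, expected_count, history, z_limit=3.0):
    """Classify one terminal's reporting period (hypothetical helper).

    readings:       dict sensor_type -> values received in this period
    expected_count: reports expected per sensor per period (assumed known)
    history:        dict sensor_type -> past values regarded as normal
    Returns ("terminal", all types), ("sensor", faulty types) or ("ok", []).
    """
    # No data from any sensor: treat as a communication module or
    # other component fault, so every data service must be recovered.
    if all(len(values) == 0 for values in readings.values()):
        return "terminal", sorted(readings)

    faulty = []
    for sensor, values in readings.items():
        if len(values) < expected_count:
            faulty.append(sensor)  # missing data within the fixed period
            continue
        mu, sigma = mean(history[sensor]), stdev(history[sensor])
        if any(abs(v - mu) > z_limit * sigma for v in values):
            faulty.append(sensor)  # abnormal compared with history
    return ("sensor", sorted(faulty)) if faulty else ("ok", [])
```

The return value tells the server which data types need to be recovered: only the listed sensor types for a sensor module fault, or all of them when the whole terminal is silent.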

3.2 Candidate Neighbor Terminal or Terminal Set Selection

After determining the type of data services supported by the fault sensing terminals that need to be recovered, this paper classifies the recovery scheme into single neighbor terminal recovery or neighbor terminal set recovery from the perspective of the number of candidate terminals. The optimal recovery method is selected by comparing different candidate solutions. The first task in fault recovery is to select a candidate set of recovery terminals. Under the condition that the terminal can be recovered, the credibility images of the candidate terminal images is used to filter the set of candidate terminals, with images being the lower bound of the credibility of the candidate terminal.

The credibility of terminal images is calculated as follows:

images

images and images are the mean and variance of the nearest images data at the current moment, images and images are the mean and variance of the nearest images data at the previous moment. images and images are the thresholds for the change in variance and mean to determine the effect of the change in variance and mean at the beginning and end of the moments on the increase or decrease of the credibility value. images is the magnitude of the increase or decrease. Set the initial credibility of all terminals as 1 and iteratively calculate the credibility images of the terminal images several times according to Eq. (1).
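The iterative update described above can be illustrated as follows. Because Eq. (1) is not reproduced in this text, the exact rule is an assumption: the sketch lowers the credibility by a fixed step when the change in mean or variance between two consecutive windows exceeds its threshold, raises it otherwise, and clamps the result to [0, 1]. The names `update_credibility`, `mean_thr`, `var_thr` and `step` are hypothetical.

```python
from statistics import mean, pvariance


def update_credibility(cred, prev_window, curr_window, mean_thr, var_thr, step):
    """One credibility update from two consecutive data windows (assumed rule)."""
    d_mean = abs(mean(curr_window) - mean(prev_window))
    d_var = abs(pvariance(curr_window) - pvariance(prev_window))
    if d_mean > mean_thr or d_var > var_thr:
        cred -= step  # data changed abruptly: trust the terminal less
    else:
        cred += step  # data is stable: trust the terminal more
    return min(1.0, max(0.0, cred))  # clamping is an assumption


def filter_candidates(credibility, lower_bound):
    """Keep terminals whose credibility reaches the lower bound."""
    return sorted(t for t, c in credibility.items() if c >= lower_bound)
```

Starting every terminal at a credibility of 1 and calling `update_credibility` once per reporting period reproduces the iterative calculation described in the text, and `filter_candidates` applies the lower bound used to screen the candidate set.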

When a single terminal can recover multiple types of tasks, the same selection method is adopted as for the recovery of a single type of sensor data task. When more than one terminal is needed, the terminal set should contain as few terminals as possible. The credibility images calculation method of the terminal set images is as follows:

images

3.2.1 Same Spreading Factor

The model with the same spreading factor (SF) is shown in Fig. 1. Each self-organized network contains a monitored point images , several sensing terminals images . The sensing terminals include normal working terminals, fault terminals and dormant terminals. The dotted line in the network indicates the neighbor relationship between terminals in the network. Since all the terminals have the same data transmission rate, only energy consumption, the number of sensor types and communication load will be taken into consideration when calculating the priority of the candidate terminals.

Figure 1: Network model with the same SF

3.2.2 Boundary of Spreading Factor

Considering the adaptive rate mechanism of LoRa, sensing terminals at the boundary can reduce the communication delay by speeding up the data rate, but it also brings additional energy consumption to the terminal. Therefore, how to balance the pros and cons is also a problem in the optimization model. The model is shown in Fig. 2.

Figure 2: Network model at the boundary of the SF

When recovering a fault terminal, the information of neighbor terminal images needs to be considered: energy images required to recover the terminal fault, the proportion of sensor types images of images , the communication load images of images , the percentage increase in data transmission rate images . Based on this information, the recovery priority images of the candidate neighbor terminal is calculated. The selection optimization model is constructed as follows:

images

images represents the coefficient used to calculate energy consumption through distance. If images equals images , it means there is a terminal that needs to be woken up, while images means not. Similarly, if images is images , it means there is a terminal at the boundary and a rate change is required, while images means no change is needed.
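To make the trade-off concrete, the sketch below scores candidates with a weighted sum of the four quantities named above. The selection model itself is not reproduced in this text, so everything here is an assumption for illustration: the linear form, the sign of each term, the `wake_penalty` applied to dormant terminals and all names (`Candidate`, `priority`, `best_candidate`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    energy: float         # energy needed for recovery, normalized to [0, 1]
    sensor_ratio: float   # share of the required sensor types it carries
    load: float           # current communication load, normalized to [0, 1]
    rate_gain: float      # fractional data-rate increase, 0 if no SF change
    asleep: bool = False  # dormant terminal that must be woken up first


def priority(c, w_energy, w_sensor, w_load, w_rate, wake_penalty=0.1):
    """Higher is better: reward coverage and speed, penalize energy and load."""
    score = (w_sensor * c.sensor_ratio + w_rate * c.rate_gain
             - w_energy * c.energy - w_load * c.load)
    if c.asleep:
        score -= wake_penalty  # assumed extra cost of waking a terminal
    return score


def best_candidate(candidates, **weights):
    """Return the candidate with the highest priority."""
    return max(candidates, key=lambda c: priority(c, **weights))
```

In the same-SF case `rate_gain` is zero for every candidate, so the rate term drops out and only energy, sensor types and load influence the ranking, which matches the distinction drawn between the two scenarios.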

LoRaWAN mainly uses the 125 kHz signal bandwidth. The function representing the relationship between data rate and SF is obtained by Eq. (5). Among them, images represents the signal bandwidth, images represents the coding rate.

images
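For orientation, the widely used LoRa bit-rate relation is R_b = SF × (BW / 2^SF) × CR, where BW is the signal bandwidth and CR the coding rate. Whether Eq. (5) takes exactly this form is an assumption. The short Python sketch below evaluates it at the 125 kHz bandwidth mentioned above; the function name `lora_bit_rate`, the default coding rate of 4/5 and the SF range of 7 to 12 are illustrative choices rather than values taken from the paper.

```python
def lora_bit_rate(sf, bandwidth_hz=125_000, coding_rate=4 / 5):
    """Nominal LoRa bit rate in bit/s: SF * (BW / 2**SF) * CR."""
    if not 7 <= sf <= 12:
        raise ValueError("this sketch assumes spreading factors 7 to 12")
    return sf * (bandwidth_hz / 2 ** sf) * coding_rate


if __name__ == "__main__":
    for sf in range(7, 13):
        print(f"SF{sf}: {lora_bit_rate(sf):8.1f} bit/s")
```

Each step up in SF roughly halves the bit rate, which is why lowering the SF of a boundary terminal shortens the transmission delay at the price of a shorter range or a higher transmit power.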

3.3 Task Migration Negotiation

The task migration negotiation mechanism aims to adaptively assign tasks to the neighboring terminals of the fault terminal. This process is implemented through the LoRa gateway in the self-organizing network, as shown in Fig. 3.

Figure 3: The process of task migration negotiation

When the sensor module or communication module of the terminal fails, the LoRa gateway will analyze the data uploaded by the terminal, find abnormality and report the terminal fault information to the server. The server will analyze the candidate neighbor terminals, select the optimal result by the algorithm and send the recovery solution to the LoRa gateway. Then, the LoRa gateway sends the task migration command to the candidate terminal or the terminal set.

LoRa usually uses Over-The-Air Activation (OTAA) as the terminal access method when a dormant terminal must be activated to recover a task. When the terminal intends to apply for access to the network, it sends an access request and waits for approval from the server. In this way, the dormant terminal is activated and joins the network. When the terminal needs to speed up, the LoRa gateway sends a speed change command to the terminal, which changes its data rate by modifying the SF.

The LoRa gateway negotiates according to the real-time status of the terminal. When the selected terminal is insufficient to support the recovery task due to sudden fault or insufficient energy, etc., the terminal reports back to the LoRa gateway. The LoRa gateway will notify the server that the task assignment has failed. At this time, the server needs to reassign normal terminals for fault recovery or directly migrate the task using the suboptimal solution.
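The fallback behaviour described above can be sketched as a loop over the ranked recovery solutions. This is a simplified illustration, not the protocol itself: the function name `negotiate`, the status dictionary with `alive` and `energy` fields and the single `energy_needed` threshold are hypothetical stand-ins for the real-time status a terminal reports to the gateway.

```python
def negotiate(ranked, status, energy_needed):
    """Offer the task to candidates in priority order (hypothetical helper).

    ranked:        terminal ids, best solution first
    status:        id -> {"alive": bool, "energy": float} as last reported
    energy_needed: energy required to take over the migrated task
    Returns (accepted id or None, ids that could not take the task).
    """
    failed = []
    for terminal_id in ranked:
        state = status.get(terminal_id)
        if state and state["alive"] and state["energy"] >= energy_needed:
            return terminal_id, failed  # task migration command accepted
        failed.append(terminal_id)      # reported back to the server
    return None, failed                 # server must reassign terminals
```

A return value of `None` corresponds to the case where no candidate can support the task and the server has to compute a new assignment.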

4 Our Proposed TFSR Algorithm

To find the optimal recovery solution for the data tasks of the fault terminal, the recovery process is divided into two stages: dynamic coefficient adjustment and network fault recovery. The information from the fault terminal images and the neighboring terminal images is used to find the candidate set whose credibility is higher than the lower limit. The normalization coefficients vary dynamically according to the cost, in terms of the various resources, of recovering from terminal faults. The dynamic normalization coefficients are used to calculate the current fault recovery rate until the new fault recovery rate images is less than the old images or the maximum number of iterations is reached. Then, the optimal recovery scheme for the terminal fault is calculated using the optimal coefficients.

images in Eq. (7) is the number of sensors in the terminal. The variables of the terminal set are calculated as follows:

images

After the values of the variables in the current candidate set have been calculated, it can be determined which variables account for the larger share of the fault recovery cost. The average cost images can be calculated as follows:

images

For the optimal selection strategy described above, the fixed normalization coefficient will cause insufficient generalization of the model. The normalization coefficients are dynamically adjusted according to the actual proportion of energy consumption, the number of sensor types, the communication load and the percentage increase in data transmission rate in the network. The coefficient images can be iterated as follows:

images

Algorithm 1: Terminal fault self-adaptive recovery algorithm

images
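The stopping rule of the coefficient adjustment stage, as described in the text of this section, can be sketched as a small loop. The update and cost equations are not reproduced here, so they are passed in as callables: `evaluate` (coefficients to fault recovery rate) and `adjust` (coefficients to new coefficients) are hypothetical placeholders, as is the name `tune_coefficients`.

```python
def tune_coefficients(initial, evaluate, adjust, max_iter=50):
    """Iterate coefficients until the recovery rate drops (sketch only).

    initial:  starting normalization coefficients (any tuple)
    evaluate: callable, coefficients -> fault recovery rate
    adjust:   callable, coefficients -> next coefficients
    Returns the best coefficients found and their recovery rate.
    """
    best, best_rate = initial, evaluate(initial)
    for _ in range(max_iter):
        candidate = adjust(best)
        rate = evaluate(candidate)
        if rate < best_rate:
            break  # new recovery rate is lower than the old one: stop
        best, best_rate = candidate, rate
    return best, best_rate
```

The coefficients returned by this loop would then be used once more to rank the candidate terminals and compute the final recovery scheme.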

5 Simulation Results

In this paper, MATLAB is used for simulation. Within the range of images , images terminals are uniformly deployed to simulate the LoRa network at the SF boundary. The left half of the terminals have images , the right half of the terminals have images . Within the range of images , images terminals will be deployed uniformly and randomly to simulate the LoRa network with the same transmission rate. The dormant terminals account for about images . A first-order energy consumption model is adopted and the initial energy is between images . The communication load is between images . Each terminal contains images kinds of sensors to monitor the environment. Terminals within images from the fault terminal are identified as neighbors. Terminals within images from the boundary of SF are set as the terminal with adjustable-rate. The coefficient of variation images . The maximum number of iterations is set to images .

Figs. 4 and 6 show the fault recovery rate with different initialization coefficients at the SF boundary and the same SF case, respectively. With the increase in the number of fault terminals, the fault recovery rate gradually decreases. It can be seen that different initialization coefficients have an impact on the final result. Reassigning tasks after an assignment fails will result in additional recovery costs. It can be seen from Figs. 5 and 7 that with the increase of the number of fault terminals, the cost of fault recovery gradually increases. Since some terminals are assigned multiple recovery tasks at the same time, it is difficult for the terminals to complete these tasks. The total cost in Fig. 5 is relatively high because the addition of the rate coefficient leads to an increasing number of alternatives.

Figure 4: Fault recovery rate at SF boundary

Figure 5: Extra recovery cost at SF boundary

Figure 6: Fault recovery rate with the same SF

Figure 7: Extra recovery cost with the same SF

Figs. 8 and 9 compare the TFSR algorithm with the GA and DPSO algorithms in terms of fault recovery rate and extra recovery cost. It can be seen that the TFSR algorithm maintains a higher fault recovery rate and a lower extra recovery cost. The TFSR algorithm reduces the consumption of network resources while maintaining the normal operation of the network. The GA and DPSO algorithms do not perform as well as the TFSR algorithm.

Figure 8: Fault recovery rate of different algorithms

Figure 9: Extra recovery cost of different algorithms

6 Conclusion

The large number of terminals in the PIoT sensing network face frequent faults and a high manual recovery cost. Therefore, it is necessary to recover the data tasks of fault terminals through their neighbors. This paper proposes a self-adaptive terminal fault recovery mechanism for PIoT based on task migration negotiation. After the LoRa gateway detects the fault terminal, the proposed TFSR algorithm solves the problem of self-adaptive fault recovery. Finally, the fault data tasks are allocated through task negotiation. Simulation results show that the TFSR algorithm is superior to the GA and DPSO algorithms in fault recovery rate and maintains a lower fault recovery cost.

Funding Statement: This work was supported in part by the Beijing Natural Science Foundation through the Research on Adaptive Fault Recovery Mechanism for Electric PIoT under Grant 4194085 and in part by the Fundamental Research Funds for the Central Universities under Grant 2019RC08.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. F. L. He, J. Q. Chen, Q. H. Li, Y. Q. Yi and Y. J. Zhang. (2020). “Application and development of internet of things technology in smart grid,” Power System Protection and Control, vol. 48, no. 3, pp. 58–69.

2. W. Rao, J. Y. Ding and R. Li. (2011). “Application of internet of things technology in smart grid,” Central China Electric Power, vol. 24, no. 2, pp. 1–5.

3. N. Zheng, X. Yang and S. L. Wu. (2017). “A survey of low-power wide-area network technology,” Information and Communications Technologies, vol. 11, no. 1, pp. 47–54.

4. Y. B. Chen, Y. Tang and X. W. Ai. (2017). “Electricity internet of things based on LPWAN technology,” Telecommunications Science, vol. 33, no. 5, pp. 143–152.

5. J. R. Tang, J. Li, A. Zhong, B. Y. Xiong, X. H. Bian et al. (2019). “Application of LoRa and NB-IoT in ubiquitous power internet of things: A case study of fault indicator in electricity distribution network,” in 2019 4th International Conference on Intelligent Green Building and Smart Grid (IGBSG), Yichang, China, pp. 380–383.

6. H. Y. Chen, Y. B. Chen, X. J. Wang, W. T. Chen and Z. H. Li. (2019). “Ubiquitous power internet of things based on LPWAN,” Power System Protection and Control, vol. 8, no. 8, pp. 1–8.

7. Y. Wang, X. M. Wen, Z. M. Lu, G. Cheng and Q. Pan. (2017). “Emerging technology for the internet of things–LoRa,” Information and Communications Technologies, vol. 11, no. 1, pp. 55–72.

8. A. Adana, E. U. Salam, A. Arifin and M. Rizal. (2018). “Forest fire detection using LoRa wireless mesh topology,” in 2018 2nd East Indonesia Conference on Computer and Information Technology (EIConCIT), Makassar, Indonesia, pp. 184–187.

9. H. Lee and K. Ke. (2018). “Monitoring of large-area IoT sensors using a LoRa wireless mesh network system: design and evaluation,” IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 9, pp. 2177–2187.

10. H. Zheng. (2017). “The implementation and application of LoRa modulation for LPWAN,” Information and Communications Technologies, vol. 11, no. 1, pp. 19–26.

11. T. F. Zhao, L. B. Chen, L. Yuan and X. Q. Hu. (2016). “Design and implementation of smart meter reading system based on LoRa,” Computer Measurement & Control, vol. 24, no. 9, pp. 298–301.

12. M. Asim, H. Mokhtar and M. Merabti. (2009). “A cellular approach to fault detection and recovery in wireless sensor networks,” in 2009 Third Int. Conf. on Sensor Technologies and Applications, Athens, Glyfada, Greece, pp. 352–357.

13. J. H. Zhu, Y. Yang, X. S. Qiu and Z. P. Gao. (2014). “Sensor failure detection and recovery mechanism based on support vector and genetic algorithm,” in 16th Asia-Pacific Network Operations and Management Symposium, Hsinchu, Taiwan, pp. 1–4.

14. H. Shih, J. Ho, B. Liao and J. Pan. (2013). “Fault node recovery algorithm for a wireless sensor network,” IEEE Sensors Journal, vol. 13, no. 7, pp. 2683–2689.

15. S. Abuelenin, S. Dawood and A. Atwan. (2016). “Enhancing failure recovery in wireless sensor network based on grade diffusion,” in 2016 11th Int. Conf. on Computer Engineering & Systems (ICCES), Cairo, Egypt, pp. 334–339.

16. S. Mitra, A. Das and S. Mazumder. (2016). “Comparative study of fault recovery techniques in wireless sensor network,” in 2016 IEEE Int. WIE Conf. on Electrical and Computer Engineering (WIECON-ECE), Pune, India, pp. 130–133.

17. Z. H. Sun, E. R. Pei and H. Z. Han. (2016). “Node sleep scheduling mechanism based on network coverage for wireless sensor networks,” Application Research of Computers, vol. 33, no. 09, pp. 2731–2742.

18. J. Zhao and G. T. Su. (2016). “Analysis on LoRa wireless network technology,” Mobile Communications, vol. 40, no. 21, pp. 50–57.

19. X. G. Zhu. (2007). “Research on task cooperation technology of wireless sensor network,” M.S. thesis. Northwestern Polytechnical University, Xi’an, China.

20. R. Sathiyavathi and B. Bharathi. (2017). “A review on fault detection in wireless sensor networks,” in 2017 Int. Conf. on Communication and Signal Processing (ICCSP), Chennai, India, pp. 1487–1490.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.