Open Access

ARTICLE

Enhanced Coverage Path Planning Strategies for UAV Swarms Based on SADQN Algorithm

Zhuoyan Xie1, Qi Wang1,*, Bin Kong2,*, Shang Gao1

1 School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
2 Experimental Centre of Forestry in North China, Chinese Academy of Forestry, Beijing, 102300, China

* Corresponding Authors: Qi Wang. Email: email; Bin Kong. Email: email

Computers, Materials & Continua 2025, 84(2), 3013-3027. https://doi.org/10.32604/cmc.2025.064147

Abstract

In the current era of intelligent technologies, comprehensive and precise regional coverage path planning is critical for tasks such as environmental monitoring, emergency rescue, and agricultural plant protection. Owing to their exceptional flexibility and rapid deployment capabilities, unmanned aerial vehicles (UAVs) have emerged as ideal platforms for accomplishing these tasks. This study proposes a swarm A*-guided Deep Q-Network (SADQN) algorithm to address the coverage path planning (CPP) problem for UAV swarms in complex environments. Firstly, to overcome the dependency of traditional modeling methods on regular terrain, this study proposes an improved cellular decomposition method for map discretization. Simultaneously, a distributed UAV swarm system architecture is adopted, which, through the integration of multi-scale maps, addresses the issues of redundant operations and flight conflicts in multi-UAV cooperative coverage. Secondly, the heuristic mechanism of the A* algorithm is combined with full-coverage path planning and incorporated at the initial stage of Deep Q-Network (DQN) training to provide effective guidance in action selection, thereby accelerating convergence. Additionally, a prioritized experience replay mechanism is introduced to further enhance coverage performance. To evaluate the efficacy of the proposed algorithm, simulation experiments were conducted in multiple irregular environments and compared with several popular algorithms. Simulation results show that the SADQN algorithm outperforms the other methods and achieves performance comparable to that of the prior-knowledge baseline, with an average coverage efficiency exceeding 2.6 and fewer turning maneuvers. In addition, the algorithm demonstrates excellent generalization ability, enabling it to adapt to different environments.

Keywords

Coverage path planning; unmanned aerial vehicles; swarm intelligence; Deep Q-Network; A* algorithm; prioritized experience replay

1  Introduction

In recent years, due to their efficiency, flexibility, safety, and other advantages, unmanned aerial vehicles (UAVs) have been widely applied in fields such as surveillance, agriculture, disaster management, and power line inspection [1–5]. These applications require UAVs to quickly cover the target area while acquiring environmental information and avoiding obstacles [6]. Therefore, efficient coverage path planning (CPP) is critical to the operational efficiency and quality of the UAV. However, in practical applications, a single UAV is limited by its endurance and is suitable only for small-scale tasks [7]. For large-scale missions, UAV swarms are typically used. This paper focuses on the coverage task of a UAV swarm in order to further improve efficiency and reduce time and energy consumption.

Researchers have proposed various methods to address the UAV swarm coverage path planning problem, which can be broadly classified into centralized and distributed approaches. In centralized methods, a central planner allocates tasks based on prior environmental knowledge and mathematical optimization to find the optimal path [8]. However, these methods assume that environmental information is available, which underestimates the complexity of the CPP problem. Distributed methods, on the other hand, enable collaborative decision-making through communication between UAVs and a ground station, avoiding collisions [9], but still face challenges in complex and irregular environments. With the development of artificial intelligence, more and more researchers are applying machine learning to path planning problems, with reinforcement learning (RL) being the most representative approach [10]. However, its research in coverage path planning is still in the early stages [11].

This paper proposes a UAV swarm-based Coverage Path Planning methodology, aiming to ensure complete coverage of the target area while minimizing time and energy consumption, taking into account factors such as no-fly zones, boundary constraints, and flight conflicts. The main contributions of this paper include:

(1)   We proffer an enhanced cell decomposition method, discretizing the irregular target area into grids whilst concurrently minimizing the grid-based target area to curtail the task execution time to the greatest extent possible.

(2)   We put forward a multi-scale map fusion method, which amalgamates and updates local environment maps perceived by individual UAVs through communication among UAVs, augmenting the observation range of UAVs and further augmenting coverage efficiency.

(3)   We present a distributed SADQN algorithm, leveraging the A* algorithm to assist in initial action selection for UAV swarm DQN training, expediting the training process, and employing prioritized experience replay to train neural networks, hastening the convergence of the algorithm.

The remainder of this work is structured as follows. Section 2 furnishes an overview of germane work in the field of multi-UAV coverage path planning. Section 3 introduces the model of UAV swarm coverage tasks and delineates objectives and constraints. Section 4 presents the SADQN algorithm for UAV swarm coverage along with its minutiae. Section 5 showcases the simulation results of the algorithm. Finally, Section 6 concludes the paper and deliberates on future research directions.

2  Related Works

At present, a copious amount of research efforts have been dedicated to the coverage path planning problem concerning UAV swarms. This part undertakes a comprehensive survey of pertinent work from both centralized and distributed perspectives.

In [12], a grid decomposition-based coverage method is proposed, which optimizes the allocation of coverage areas for multiple UAVs by constructing a linear programming model. Lu et al. introduced a turning-minimization multi-robot spanning tree coverage algorithm that transforms the problem into one of finding the maximum independent set in a bipartite graph and then employs a greedy strategy to minimize the number of turns in the spanning tree’s circumnavigation coverage path [13]. Maza and Ollero presented a cooperative strategy in which the ground control station divides the target area into multiple non-overlapping, obstacle-free subregions and assigns each subregion to a UAV based on its relative capabilities and initial position [14]. However, since this method initiates task assignments from the center of each subregion, it may result in considerable redundant coverage. Building upon this, Chen et al. proposed a novel path planning method based on a success-history-adaptive differential evolution variant combined with linear population reduction. This approach establishes the relationship between all possible starting points of subregion paths and the overall multi-region path to enhance the efficiency of multi-region coverage tasks [15].

These methods rely on the availability of environmental information, underestimating the complexity of the CPP problem and being applicable only to deterministic scenarios. To augment UAVs’ robustness in complex environments, distributed algorithms can be enlisted. In [16], a coordinated search scheme based on model predictive control and communication constraints was designed to effectively enable UAV swarms to search for both static and dynamic targets in uncertain scenarios. Qiu et al. proposed a novel complete coverage path planning method for mobile robot obstacle avoidance based on bio-inspired neural networks, rolling path planning, and heuristic search approaches [17]. Bine et al. introduced an energy-aware ant colony optimization algorithm that leverages multi-UAV Internet technologies to coordinate and organize UAVs in order to avoid airspace collisions and congestion [18]. Li et al. proposed a DQN-based coverage path planning method that approximates the optimal action values via DQN, while combining a sliding window approach with probabilistic statistics to handle unknown environments. This method optimizes coverage decisions and enhances the adaptability and performance of multi-UAV missions in unknown scenarios [19]. Moreover, in [20], a comprehensive coverage path planning framework was constructed using deep reinforcement learning techniques, which integrates convolutional neural networks with long short-term memory networks. The framework simultaneously maximizes cumulative rewards and optimizes an overall cost weight based on kinetic energy.

Although the above research results have obvious application effects in their respective research fields, there is still significant room for improvement in solving the problem of collaborative coverage path planning for UAV swarms in discrete environments, while simultaneously reducing path repetition rate and energy consumption.

3  Framework of the Model

3.1 Enhanced Approximate Cell Decomposition

The coverage task in this study is conducted on a discretized map. However, real-world environments are often irregular, which results in many redundant grids during the discretization process. In irregular terrain environments, to improve task execution efficiency, we have adopted an innovative strategy: rotating the target area to minimize the decomposed grid map, thereby minimizing task execution time.

Fig. 1 illustrates this enhanced approximate cell decomposition method, where black lines represent the boundaries of the irregular target area, red grids represent the area that UAVs need to cover, black blocks represent obstacles in the environment, and gray grids represent no-fly zones. Fig. 1a illustrates the initial situation of cell decomposition in an irregular area. To reduce terrain redundancy, we first identify all vertices $(x_i, y_i)$ of the target area and select the longest edge $AB$. Point $A$ is chosen as the reference point, and the entire target area is translated using Eq. (1) to obtain coordinates $(x_i', y_i')$, as shown in Fig. 1b. Then, using edge $AB$ as the rotation axis, the entire target area is rotated using Eq. (2) to obtain coordinates $(x_i'', y_i'')$, where $l_i$ is the length of edge $AB$ and $\theta_i$ is the rotation angle. Fig. 1c shows the final grid target area.

$(x_i', y_i') = (x_i - x_a,\ y_i - y_a)$  (1)

$(x_i'', y_i'') = (l_i \cos\theta_i,\ l_i \sin\theta_i)$  (2)


Figure 1: Illustration of enhanced approximate cell decomposition in irregular terrain
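To make the transformation concrete, the following Python sketch (a minimal illustration; the unit cell size, helper name, and example polygon are chosen here for demonstration rather than taken from the paper) applies Eqs. (1) and (2) to a vertex list and reports the resulting grid dimensions:

```python
import numpy as np

def align_to_longest_edge(vertices, cell=1.0):
    """Translate the polygon so that endpoint A of its longest edge AB sits at
    the origin (Eq. (1)) and rotate it so that AB lies along the x-axis
    (Eq. (2)), which tends to minimize the number of grid cells needed to
    enclose the area. Returns the transformed vertices and the grid shape."""
    v = np.asarray(vertices, dtype=float)               # (n, 2) polygon vertices
    edges = np.roll(v, -1, axis=0) - v                  # edge vectors v[i] -> v[i+1]
    k = int(np.argmax(np.hypot(edges[:, 0], edges[:, 1])))
    a, ab = v[k], edges[k]                              # reference point A and edge AB

    shifted = v - a                                     # Eq. (1): (x'_i, y'_i) = (x_i - x_a, y_i - y_a)
    theta = np.arctan2(ab[1], ab[0])                    # angle between AB and the x-axis
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])              # rotation by -theta (for row vectors)
    aligned = shifted @ rotation                        # Eq. (2): rotate the whole area about A

    span = aligned.max(axis=0) - aligned.min(axis=0)    # bounding box of the rotated area
    grid_shape = tuple(np.ceil(span / cell).astype(int))
    return aligned, grid_shape

# Hypothetical usage on an irregular quadrilateral:
poly = [(0, 0), (6, 1), (7, 5), (1, 4)]
aligned, (l, w) = align_to_longest_edge(poly, cell=1.0)
```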

3.2 Multi-Scale Maps

UAV swarms often face issues such as trajectory overlap, coverage gaps, and collisions when performing coverage path planning in complex environments with obstacles. To address these challenges, we propose a method for integrating and updating multi-scale maps to achieve close cooperation among UAVs.

Specifically, a UAV swarm system consisting of $N$ UAVs is established, where each UAV is denoted as $\text{UAV}_i$, $i \in \{1, 2, \dots, N\}$. These UAVs are tasked with coverage missions in an $l \times w$ grid area, with each grid's geometric center denoted as $P = (x, y)$, where $x \in \{1, 2, \dots, l\}$ and $y \in \{1, 2, \dots, w\}$. Thus, the position of the $i$-th UAV can be represented as $P_i = (x, y)$. Let $G$ denote the target grid area in the irregular environment where CPP needs to be performed, requiring $P_i \in G$. When a UAV reaches the position above point $P$, the current grid is considered fully covered. Let $Covr_{x,y}$ represent the environmental information status value of the current grid: $Covr_{x,y} = 0$ indicates that the grid has not been covered by any UAV, $Covr_{x,y} = 1$ indicates that the grid has been covered, and $Covr_{x,y} = -1$ indicates that the UAV has detected an obstacle in the grid. Therefore, the coverage status matrix $map_i$ of the $i$-th UAV can be represented as follows:

$map_i = \begin{pmatrix} Covr_{1,1}^{i} & \cdots & Covr_{1,w}^{i} \\ \vdots & \ddots & \vdots \\ Covr_{l,1}^{i} & \cdots & Covr_{l,w}^{i} \end{pmatrix}$  (3)

In this equation, $Covr_{x,y}^{i}$ represents the coverage status of $\text{UAV}_i$ at grid $(x, y)$. The process of multi-scale map fusion and update for the UAV swarm can be represented using the coverage status matrix:

$map_i \xrightarrow{\ \text{update}\ } map\left(\max_{i \in N} Covr_{x,y}^{i}\right)$  (4)
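As a rough illustration of Eq. (4), the sketch below (the function name and example maps are assumptions, not the authors' code) fuses the local coverage matrices by taking the element-wise maximum across UAVs, exactly as the equation is written:

```python
import numpy as np

# Status codes follow Section 3.2: 0 = uncovered, 1 = covered, -1 = obstacle / no-fly.
def fuse_local_maps(local_maps):
    """Fuse the local l x w coverage matrices map_i of all UAVs into one shared
    global map via the element-wise maximum of Eq. (4)."""
    stacked = np.stack(local_maps, axis=0)   # shape (N, l, w)
    return stacked.max(axis=0)               # global view broadcast back to every UAV

# Hypothetical usage with three 4 x 4 local maps:
maps = [np.zeros((4, 4), dtype=int) for _ in range(3)]
maps[0][0, 0] = 1      # UAV 1 has covered grid (1, 1)
maps[1][2, 3] = -1     # UAV 2 has detected an obstacle at grid (3, 4)
global_map = fuse_local_maps(maps)
```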

Fig. 2 shows the system framework for three UAVs working together on the CPP task. UAVs interact with the environment, choose actions, and receive rewards. Specifically, we use local environmental maps to record the areas covered by each UAV and no-fly zones. At each time step, UAVs exchange local environmental map information with the ground station and update the multi-scale maps, enabling action decisions based on global observations.


Figure 2: Illustration of the CPP system framework

3.3 Performance Metrics

In summary, this paper adopts a distributed approach, allowing each UAV to take off simultaneously from a different initial position. At each step, UAVs collaborate using multi-scale map integration while independently making action decisions. During this process, each UAV only needs to cover a portion of the target area, completing its sub-task. Therefore, the completion time of the last UAV's sub-task determines the total completion time of the CPP task. This paper assumes that each UAV maintains a constant speed during flight and uses the number of task completion steps $Step$ as an important metric to evaluate the CPP problem, which can be expressed as follows:

$Step = \max_{i \in N} step_i$  (5)

where $step_i$ represents the number of steps required for $\text{UAV}_i$ to complete its subtask. We need to set multiple metrics for precise measurement to better assess coverage task performance. First, we define $P_{lapped}$ as the set of all duplicate grids within the target area, and $P_i^{cover}$ as the set of all grids covered by the path of $\text{UAV}_i$. Thus, the difference between the two represents the effective coverage grid set. Based on this, we define the coverage efficiency of the UAV swarm CPP task as follows:

$\eta = \dfrac{\sum_{i=1}^{N} f\left(P_i^{cover} \setminus P_{lapped}\right)}{Step}$  (6)

where the function $f(\cdot)$ returns the number of elements in a set. The CPP problem for the UAV swarm requires maximizing coverage efficiency within the target area, which means minimizing the number of task completion steps. Additionally, the number of turns in the path, $f_{turn}$, is also counted.
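Under the assumption that each UAV's path is recorded as an ordered list of visited grid cells, a minimal sketch of Eqs. (5) and (6) plus the turn count might look as follows (the helper name and example paths are illustrative):

```python
from collections import Counter

def coverage_metrics(paths):
    """Sketch of Eqs. (5)-(6): each path is assumed to be the ordered list of
    grid cells (x, y) one UAV visits, one cell per time step."""
    step = max(len(p) for p in paths)                     # Eq. (5): Step = max_i step_i
    counts = Counter(cell for p in paths for cell in p)   # visits per grid cell
    lapped = {c for c, n in counts.items() if n > 1}      # duplicated cells, P_lapped
    effective = sum(len(set(p) - lapped) for p in paths)  # sum_i f(P_i^cover \ P_lapped)
    eta = effective / step                                # Eq. (6): coverage efficiency
    turns = sum(                                          # f_turn: count heading changes
        1
        for p in paths
        for a, b, c in zip(p, p[1:], p[2:])
        if (b[0] - a[0], b[1] - a[1]) != (c[0] - b[0], c[1] - b[1])
    )
    return step, eta, turns

# Hypothetical usage with two short paths:
step, eta, turns = coverage_metrics([[(1, 1), (1, 2), (2, 2)], [(3, 1), (3, 2)]])
```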

4  Distributed SADQN Algorithm

4.1 DQN Algorithm

The Deep Q-Network (DQN) algorithm aims to learn the state-action value (Q-value) function through a deep neural network to find the optimal policy, enabling the agent to take appropriate actions in each state to maximize future cumulative rewards. The update process of its Q-value function is as follows:

$Q(s, a; \theta) = r + \gamma \max_{a'} Q(s', a'; \theta^{-})$  (7)

The parameters θ in Eq. (7) are the weights of each layer in the neural network, γ is the discount factor, and r is the reward value for the current action. The core idea of the DQN algorithm is to minimize the loss function and continuously adjust the weights θ of the neural network using gradient descent, so that the Q-values output by the network are as close as possible to the actual Q-values. The loss function can be defined as follows:

$L(\theta) = \mathbb{E}_{s,a,r,s'}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]$  (8)

Specifically, DQN uses two neural networks with the same structure but different parameters: the online network and the target network. In Eq. (8), $Q(s, a; \theta)$ is the output of the online network, which evaluates the Q-value of the current state-action pair, while $Q(s', a'; \theta^{-})$ is the output of the target network, from which the target Q-value of Eq. (7) is computed. The parameters are then updated based on the loss function: the online network updates its parameters at each iteration, while the target network updates its parameters only periodically.
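A compact PyTorch sketch of this online/target arrangement and of the loss in Eqs. (7)–(8) is given below; the network sizes, the flattened state input, and the five-action output are assumptions for illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Illustrative Q-network with a flattened state vector as input."""
    def __init__(self, state_dim, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def dqn_loss(online, target, batch, gamma=0.85):
    """Eqs. (7)-(8): TD target from the frozen target network, squared error
    against the online network's estimate."""
    s, a, r, s_next, done = batch                              # tensors from the replay buffer
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)         # Q(s, a; theta)
    with torch.no_grad():
        q_next = target(s_next).max(dim=1).values              # max_a' Q(s', a'; theta^-)
        y = r + gamma * (1.0 - done) * q_next                  # Eq. (7), zeroed at terminal states
    return nn.functional.mse_loss(q, y)                        # Eq. (8)
```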

In a given state, the agent in a reinforcement learning task can only choose one action, either exploiting its current knowledge or exploring; a balance between the two must therefore be struck to obtain the optimal task outcome. To further improve learning efficiency, this paper improves the ε-greedy strategy: in the proposed swarm A*-guided Deep Q-Network (SADQN) algorithm, a heuristic action guidance mechanism based on the A* algorithm is introduced during DQN training to replace the traditional random action selection strategy. This improved action selection strategy not only preserves the adaptive exploration ability of reinforcement learning but also accelerates the accumulation of effective experiences through heuristic guidance. The ε-greedy strategy designed in this paper is as follows:

$a_t = \begin{cases} \arg\max_{a} Q(s, a; \theta), & \rho < \varepsilon \\ \text{action given by A*}, & \rho \ge \varepsilon \end{cases}$  (9)

where $\rho$ is a randomly generated number between 0 and 1, and $\varepsilon$ is set to 0.9 in this paper. When $\rho < \varepsilon$, the UAV selects the action with the maximum Q-value from the Q-network; otherwise, it makes its action decision with the A* algorithm. We now describe how the A* algorithm selects actions for coverage path planning. Each UAV plans its path on the globally observed map fused from the multi-scale maps. Starting from the current position $P_i$, the objective is to minimize the following:

$f(x) = g(x) + h(x)$  (10)

where $g(x)$ represents the cost incurred from the starting point $P_i$ to the current position $x$, i.e., the sum of the cost $g_i$ of each action taken from the starting point to the current position, with $g_i \in \mathcal{G}$ and $\mathcal{G} = \{0, 0.1, 0.4, 0.2, 0.2\}$ listing one cost per action. $h(x)$ represents the estimated cost at point $x$, which is set as the heuristic cost of the given position, i.e., the Manhattan distance between this position and the starting point.
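The following sketch illustrates how such A*-guided action selection could be wired into the ε-greedy rule of Eq. (9); the action indexing, the unit step cost (the paper assigns per-action costs from $\mathcal{G}$), and the choice of goal cell (e.g., the nearest uncovered grid) are simplifying assumptions:

```python
import random
from heapq import heappush, heappop
from itertools import count

# Assumed action indexing (Section 4.2): 0 stay, 1 up, 2 down, 3 left, 4 right.
MOVES = {1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def astar_next_action(start, goal, grid):
    """Plain grid A* with f = g + h (Eq. (10)) and a Manhattan heuristic;
    returns the first action of the cheapest path, or 0 (stay) if none exists.
    Cells equal to -1 in `grid` are treated as obstacles / no-fly zones."""
    tie = count()
    frontier = [(manhattan(start, goal), next(tie), 0, start, 0)]
    best_g = {start: 0}
    while frontier:
        _, _, g, cell, first = heappop(frontier)
        if cell == goal:
            return first
        for action, (dx, dy) in MOVES.items():
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == -1:
                continue
            ng = g + 1                                  # unit step cost in this sketch
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heappush(frontier, (ng + manhattan(nxt, goal), next(tie),
                                    ng, nxt, action if first == 0 else first))
    return 0

def select_action(q_values, pos, goal, grid, eps=0.9):
    """Eq. (9) as stated in the text: exploit the Q-network with probability
    eps, otherwise fall back to the A*-guided move."""
    if random.random() < eps:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return astar_next_action(pos, goal, grid)
```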

4.2 State Space, Action Space, Reward

The state space, action space, and instantaneous reward of $\text{UAV}_i$ at each time step $t$ are denoted $s_i^t$, $a_i^t$, and $r_i^t$, respectively.

a.   State space and action space

In a discrete grid map, the position of $\text{UAV}_i$ at time step $t$ is denoted $P_i^t = (x, y)$. In the distributed approach, to enhance cooperation, the state space of each UAV is relatively complex, including the current position, the action decision, and the local environmental map. Therefore, the state vector of $\text{UAV}_i$ can be represented as $s_i^t = \{P_i^t, a_i^t, map_i^t\}$.

The algorithm discretizes the UAV's action space; the UAV moves between grid centers through horizontal and vertical actions, so a diagonal displacement can only be realized as a combination of them. Thus, the action set of the UAVs is represented as $A = \{a_0, a_1, a_2, a_3, a_4\}$, where $a_0$ denotes the stationary action, and $a_1, a_2, a_3, a_4$ denote moving upwards, downwards, leftwards, and rightwards, respectively. Therefore, the action of $\text{UAV}_i$ at time $t$ can be represented as $a_i^t \in A$.

b.   Reward

The core objective of the CPP for the UAV swarm is to maximize coverage efficiency while ensuring safe flight operations. To achieve this goal, we have designed multiple task-oriented reward functions to guide the UAV swarm in achieving optimal collision-free coverage. Specifically, to avoid path repetition and collisions, we have designed a reward based on effective coverage, denoted as $r_{area}$:

$r_{area} = \begin{cases} 1, & map(x, y) = 0 \\ -1, & map(x, y) = 1 \\ -10, & map(x, y) = -1 \end{cases}$  (11)

In Eq. (11), $map(x, y)$ represents the environmental state information value of the current grid, and $\text{UAV}_i$ is assigned a different coverage reward depending on that state. In addition, frequent turns by $\text{UAV}_i$ cause additional energy consumption. To reduce mission energy consumption, we have designed a turn-based reward $r_{turn}$:

$r_{turn} = \begin{cases} 0.5, & a = a_1 \\ -0.5, & a \in \{a_3, a_4\} \\ -1, & a \in \{a_0, a_2\} \end{cases}$  (12)

In Eq. (12), $\text{UAV}_i$ is assigned a different turning reward according to its selected flight action. To incentivize the UAVs to follow the A*-planned path in the early stages of training and accelerate the training process, a positive reward $r_{A} = 0.1$ is given whenever a UAV employs the A* algorithm for its action decision. When the UAV swarm completes the CPP task, each UAV receives a reward $r_{end} = 1$ for reaching its final position state.

The total reward $r_i^t$ obtained by each UAV at a time step is the sum of the four components above, namely:

$r_i^t = r_{area} + r_{turn} + r_{A} + r_{end}$  (13)
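Putting the four terms together, a minimal sketch of Eq. (13) reads as follows; the function name is illustrative and the signs of Eqs. (11)–(12) follow the reconstruction given above:

```python
def instant_reward(cell_status, action, used_astar, task_done):
    """Sketch of Eq. (13): r = r_area + r_turn + r_A + r_end, with cell_status
    coded as in Section 3.2 (0 uncovered, 1 already covered, -1 obstacle) and
    action indices 0 stay, 1 up, 2 down, 3 left, 4 right."""
    r_area = {0: 1.0, 1: -1.0, -1: -10.0}[cell_status]    # Eq. (11)
    if action == 1:                                        # Eq. (12): keep the sweep direction
        r_turn = 0.5
    elif action in (3, 4):                                 # lateral turn
        r_turn = -0.5
    else:                                                  # hovering or reversing
        r_turn = -1.0
    r_astar = 0.1 if used_astar else 0.0                   # bonus for an A*-guided decision
    r_end = 1.0 if task_done else 0.0                      # terminal reward for finishing the CPP task
    return r_area + r_turn + r_astar + r_end
```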

4.3 Prioritized Experience Replay

In the SADQN algorithm deployed on $\text{UAV}_i$, the Q-network takes the current state $s_i$ as input and outputs the estimated Q-value $Q(s_i, a_i; \theta_i)$, while the target Q-network takes the next state $s_i'$ as input and outputs the maximum Q-value $\max_{a_i'} Q(s_i', a_i'; \theta_i^{-})$. Therefore, the target Q-value can be represented as Eq. (14).

$y_i = \begin{cases} r_i, & \text{if } s_i' \text{ is terminal} \\ r_i + \gamma \max_{a_i'} Q(s_i', a_i'; \theta_i^{-}), & \text{otherwise} \end{cases}$  (14)

To utilize learning experiences effectively, we adopt a prioritized experience replay strategy to improve learning efficiency and stability during training, accelerating the algorithm's convergence. In the proposed SADQN algorithm, $\text{UAV}_i$ takes action $a_i$ in the current state $s_i$, transitions to the next state $s_i'$, and receives the corresponding reward $r_i$. The experience sample $(s_i, a_i, r_i, s_i')$ is then stored in the experience replay buffer $D_i$ of $\text{UAV}_i$. $\text{UAV}_i$ selects experience samples from its replay buffer with a sampling probability determined by the TD-error of each sample. The TD-error $\delta_i(j)$ of the $j$-th sample represents the difference between the current Q-value and the target Q-value, and is defined in Eq. (15):

$\delta_i(j) = \left(y_i - Q(s_i, a_i; \theta_i)\right)^{(j)}$  (15)

$P_i(j) = \dfrac{\left(p_i(j)\right)^{\alpha}}{\sum_{k} \left(p_i(k)\right)^{\alpha}}$  (16)

$p_i(j) = \left|\delta_i(j)\right| + \epsilon$  (17)

$\omega_i(j) = \left(n \cdot P_i(j)\right)^{-\beta}$  (18)

Eq. (16) defines the sampling probability of each sample, where $p_i(j)$ is the priority of the $j$-th sample in the experience pool of $\text{UAV}_i$, as defined in Eq. (17); the summation index $k$ runs over all $n$ samples in the experience pool, and the parameter $\alpha$ controls the strength of prioritized replay. Eq. (18) defines the importance weight $\omega_i(j)$, which is used to eliminate the bias introduced by prioritized experience replay; the parameter $\beta$ determines the intensity of the bias correction. Therefore, the loss function considering the priority of experience samples can be defined as:

$L_i(\theta_i) = \mathbb{E}\left[\omega_i(j) \left(\delta_i(j)\right)^{2}\right]$  (19)

$\theta_i \leftarrow \theta_i + \alpha \, \delta_i(j) \, \nabla_{\theta_i} Q(s_i, a_i; \theta_i)^{(j)}$  (20)

Eq. (20) is used to update the parameters $\theta_i$ of the Q-network. Every $C$ steps, the parameters of the Q-network are copied to the target network (i.e., $\theta_i^{-} = \theta_i$), and the target Q-values $y_i$ are generated using the target Q-network for the subsequent $C$ steps. The detailed process of the SADQN algorithm is as follows (Algorithm 1):

Algorithm 1: The distributed SADQN algorithm
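As a supplement to Algorithm 1, the sketch below illustrates the prioritized experience replay of Eqs. (15)–(18) with a flat proportional buffer; the class name and the use of a plain array instead of a sum-tree are simplifications, not the authors' implementation:

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional prioritized replay (Eqs. (15)-(18))."""

    def __init__(self, capacity, alpha=0.6, beta=0.7, eps=1e-5):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.prios, self.pos = [], [], 0

    def push(self, transition):
        # New samples get the current maximum priority so they are replayed at least once.
        p = max(self.prios, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prios.append(p)
        else:
            self.data[self.pos], self.prios[self.pos] = transition, p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = np.asarray(self.prios) ** self.alpha
        probs = scaled / scaled.sum()                               # Eq. (16)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-self.beta)     # Eq. (18)
        weights /= weights.max()                                    # normalize for stable updates
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, d in zip(idx, td_errors):
            self.prios[i] = abs(d) + self.eps                       # Eq. (17), with delta from Eq. (15)
```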

5  Simulation Results

This section validates the correctness and effectiveness of the proposed SADQN method by comparing its results with those of six other solutions: GBNN, A*, TMSTC*, SADQN-nA, SADQN-nP, and DDPG. All algorithms were run in Python 3.9 on a Windows 11 system. The specific parameter settings are as follows: the learning rate $lr = 0.0001$, the discount factor $\gamma = 0.85$, the batch size for stochastic gradient descent $B = 64$, and the experience replay buffer capacity $M = 1,000,000$; the neural network parameters $\theta_i$ of each UAV are randomly initialized, the target network update frequency is $C = 4$, the prioritized experience replay parameter $\alpha$ is set to 0.6, and the parameter $\beta$ is set to 0.7.
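For reference, these settings can be collected in a small configuration object; the field names below are illustrative, while the values are those reported above:

```python
from dataclasses import dataclass

@dataclass
class SADQNConfig:
    """Hyperparameters as reported in Section 5 (field names are illustrative)."""
    lr: float = 1e-4                    # learning rate
    gamma: float = 0.85                 # discount factor
    batch_size: int = 64                # stochastic-gradient batch size B
    buffer_capacity: int = 1_000_000    # replay buffer capacity M
    target_update_every: int = 4        # target-network update frequency C
    per_alpha: float = 0.6              # prioritization strength alpha
    per_beta: float = 0.7               # importance-sampling correction beta
    epsilon: float = 0.9                # greedy parameter of Eq. (9)

cfg = SADQNConfig()
```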

5.1 Performance Analysis of SADQN

We map the UAV flight trajectories onto a two-dimensional plane. Fig. 3a shows the irregular environment, the initial discretized map is shown in Fig. 3b, and the final optimized map is presented in Fig. 3c. In the irregular environment depicted in Fig. 3c, three UAVs are deployed to perform the coverage path planning task. The green dashed lines represent the starting positions of the UAVs.


Figure 3: Schematic diagram of the discretized target area

As shown in Fig. 4, the final coverage results of different algorithms are presented. We can observe that all the algorithms have completed the coverage task. Moreover, the paths generated by the proposed SADQN method and the A* algorithm show no overlapping segments, while significant repeated coverage is observed in the path maps of the other algorithms.


Figure 4: Comparison of coverage results for different algorithms. (a) GBNN; (b) A*; (c) TMSTC*; (d) SADQN; (e) SADQN-nP; (f) SADQN-nA; (g) DDPG

Table 1 presents the average results of the CPP task under the different algorithms. In the irregular environment, the average coverage efficiency of the proposed SADQN method reaches 2.798, which indicates that it effectively balances the workload distribution among the UAVs while minimizing repeated coverage. Notably, the task completion efficiency of our method is superior to that of the GBNN, A*, and TMSTC* algorithms. The GBNN algorithm exhibits a significant amount of redundant path planning. While the A* and TMSTC* algorithms demonstrate high coverage efficiency, they fall behind the SADQN algorithm in the number of task completion steps, primarily because they lack a collaborative mechanism. Our SADQN algorithm achieves cooperative coverage among UAVs by sharing environmental information through multi-scale maps, making it better suited to complex environments and resulting in superior path planning. Additionally, the SADQN algorithm performs better in terms of step count and number of turns, benefiting from our turn reward mechanism, which effectively reduces the energy consumption caused by frequent turns. DDPG adopts a deterministic policy, making it prone to getting stuck in local optima during exploration. Although it enhances exploration by adding noise, this mechanism is insufficient in large state spaces and irregular environments, resulting in slow learning and suboptimal performance.

Table 1: Average results of the CPP task for the different algorithms

Fig. 5 illustrates the reward curves of four algorithms during training, with the shaded areas representing the standard deviation of the reward values, reflecting the range of reward fluctuations. From Fig. 5, we can observe that the SADQN algorithm converges the fastest, stabilizing after 200 episodes. The reward values of the SADQN-nA algorithm are relatively low, indicating that the A*-guided mechanism provides effective prior knowledge that accelerates the training process. The reward of the SADQN-nP algorithm rises more slowly, suggesting that prioritized experience replay effectively exploits important experiences and enhances training efficiency. The combination of A* guidance and prioritized experience replay makes the SADQN algorithm perform the best. The DDPG algorithm converges slowly with lower reward values, primarily because its exploration mechanism and action selection method are unsuitable for complex environments, resulting in significantly inferior performance.


Figure 5: Reward curve comparison graph

The UAV swarm may encounter unexpected situations such as malfunctions, energy depletion, or communication interruptions while performing complex tasks, causing some UAVs to stop working. In such cases, ensuring full coverage becomes crucial. As shown in Fig. 6, in a coverage task coordinated by three UAVs, one UAV stops working due to an unexpected issue. The remaining two UAVs can quickly adjust their strategies and take over the coverage task for the entire area to ensure the successful completion of full coverage.


Figure 6: Response of the UAV swarm to the failure situation

5.2 Generalization Ability

To validate the algorithm's generalization ability, this section compares the results of the CPP problem in two irregular environments, Env1 and Env2, shown in Fig. 7a,b. After processing with the improved cell decomposition method, the number of coverage grids in the target area was reduced by 10.1% and 9.1%, respectively. Three UAVs were deployed to perform the coverage path planning tasks in the two experimental scenarios shown in Fig. 7a,b. The green lines mark the starting positions of each UAV, and the UAVs collaborate to achieve full coverage of the target area.


Figure 7: Discretization of irregular environments

Table 2 presents the completion steps, coverage efficiency, and turning counts. From Table 2, it can be observed that all algorithms achieved complete coverage. Our proposed SADQN method performed well in both environments, with minimal duplicate coverage. Compared with the other algorithms, SADQN required fewer completion steps on average, which raises its average coverage efficiency above 2.6. Moreover, while our method ensures high coverage efficiency through reasonable path planning, it also involves fewer turns, enabling more efficient coverage path planning.

Table 2: Completion steps, coverage efficiency, and turning counts in Env1 and Env2

6  Conclusion

This research initiates with the discretization of irregular environments through an enhanced cell decomposition technique, thereby effectively partitioning the complex terrains into manageable grids. Simultaneously, the observation scope of UAVs is expanded by leveraging multi-scale maps, endowing them with a broader field of view to better perceive and adapt to the surroundings. An inventive SADQN algorithm is put forth, which ingeniously incorporates the A* algorithm to bolster the DQN process, specifically devised to tackle the intricate coverage path planning challenges that UAV swarms encounter in complex and irregular settings. The proposed algorithm has been rigorously validated within obstacle-ridden irregular environments. The experimental outcomes unequivocally demonstrate its superiority over existing benchmark algorithms. In terms of convergence speed, it exhibits a remarkable acceleration, swiftly arriving at optimal solutions and reducing the computational burden. Regarding completion steps, it streamlines the overall process, minimizing unnecessary maneuvers and enhancing operational efficiency. The coverage efficiency soars to an impressive 2.6, signifying a significant leap in the thoroughness of area coverage. Moreover, the algorithm also considers the turning counts, optimizing flight paths to curtail energy consumption associated with frequent directional changes. Notably, in the face of sudden failures, the resilience of the system shines through. The remaining UAVs are still capable of tenaciously adhering to the original CPP task objectives, ensuring the continuity and integrity of the mission. This robustness adds an extra layer of reliability to UAV swarm operations, especially in critical or unpredictable scenarios.

Acknowledgement: The authors are thankful to researchers in Jiangsu University of Science and Technology for the helpful discussion.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Zhuoyan Xie, Qi Wang; data collection: Zhuoyan Xie; analysis and interpretation of results: Zhuoyan Xie, Bin Kong; draft manuscript preparation: Zhuoyan Xie, Shang Gao. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The data that support the findings of this study are available from the corresponding authors, upon reasonable request.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.

Nomenclature

$l, w$ Length and width of the target area
$x, y$ Target area coordinates
$G$ Target grid area
$N$ Total number of unmanned aerial vehicles
$\text{UAV}_i$ The $i$-th unmanned aerial vehicle
$P_i$ The $i$-th UAV's grid center coordinates
$Covr_{x,y}$ Environmental information state value
$map_i$ The $i$-th UAV's coverage state matrix
$step_i$ The $i$-th UAV's number of steps
$\eta$ Coverage efficiency
$A$ Set of the UAV actions
$s, a, r$ State, action, and reward of the DQN algorithm
$s_i^t, a_i^t, r_i^t$ State, action, and reward of $\text{UAV}_i$ at step $t$
$\theta$ DQN neural network parameters
$Q(s, a; \theta)$ Value function of the DQN algorithm
$y_i$ Target Q-value
$L(\theta)$ Loss function
$\varepsilon$ Greedy strategy parameter

References

1. Cabreira TM, Brisolara LB, Paulo RFJ. Survey on coverage path planning with unmanned aerial vehicles. Drones. 2019;3(1):4. doi:10.3390/drones3010004.

2. Mukhamediev RI, Yakunin K, Aubakirov M, Assanov I, Kuchin Y, Symagulov A, et al. Coverage path planning optimization of heterogeneous UAVs group for precision agriculture. IEEE Access. 2023;11:5789–803. doi:10.1109/access.2023.3235207.

3. Pérez-González A, Benítez-Montoya N, Jaramillo-Duque Á, Cano-Quintero JB. Coverage path planning with semantic segmentation for UAV in PV plants. Appl Sci. 2021;11(24):12093. doi:10.3390/app112412093.

4. Muñoz J, López B, Quevedo F, Monje CA, Garrido S, Moreno LE. Multi UAV coverage path planning in urban environments. Sensors. 2021;21(21):7365. doi:10.3390/s21217365.

5. Cabreira TM, Di Franco C, Ferreira PR, Buttazzo GC. Energy-aware spiral coverage path planning for UAV photogrammetric applications. IEEE Robot Autom Lett. 2018;3(4):3662–8. doi:10.1109/lra.2018.2854967.

6. Nedjati A, Izbirak G, Vizvari B, Arkat J. Complete coverage path planning for a multi-UAV response system in post-earthquake assessment. Robotics. 2016;5(4):26. doi:10.3390/robotics5040026.

7. Collins L, Ghassemi P, Esfahani ET, Doermann D, Dantu K, Chowdhury S. Scalable coverage path planning of multi-robot teams for monitoring non-convex areas. In: Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA); 2021 May 30–Jun 5; Xi'an, China. Piscataway, NJ, USA: IEEE; 2021. p. 7393–9.

8. Bolu A, Korçak Ö. Path planning for multiple mobile robots in smart warehouse. In: Proceedings of the 2019 7th International Conference on Control, Mechatronics and Automation (ICCMA); 2019 Nov 6–8; Piscataway, NJ, USA: IEEE; 2019. p. 144–50.

9. Luna MA, Molina M, Da-Silva-Gomez R, Melero-Deza J, Arias-Perez P, Campoy P. A multi-UAV system for coverage path planning applications with in-flight re-planning capabilities. J Field Robot. 2024;41(5):1480–97. doi:10.1002/rob.22342.

10. Jonnarth A, Zhao J, Felsberg M. End-to-end reinforcement learning for online coverage path planning in unknown environments. arXiv:2306.16978. 2023.

11. Heydari J, Saha O, Ganapathy V. Reinforcement learning-based coverage path planning with implicit cellular decomposition. arXiv:2110.09018. 2021.

12. Cho SW, Park JH, Park HJ, Kim S. Multi-UAV coverage path planning based on hexagonal grid decomposition in maritime search and rescue. Mathematics. 2021;10(1):83. doi:10.3390/math10010083.

13. Lu J, Zeng B, Tang J, Lam TL, Wen J. TMSTC*: a path planning algorithm for minimizing turns in multi-robot coverage. IEEE Robot Autom Lett. 2023;8(8):5275–82. doi:10.1109/lra.2023.3293319.

14. Maza I, Ollero A. Multiple UAV cooperative searching operation using polygon area decomposition and efficient coverage algorithms. In: Distributed autonomous robotic systems. Vol. 6. Tokyo, Japan: Springer; 2007. p. 221–30. doi:10.1007/978-4-431-35873-2_22.

15. Chen G, Shen Y, Zhang Y, Zhang W, Wang D, He B. 2D multi-area coverage path planning using L-SHADE in simulated ocean survey. Appl Soft Comput. 2021;112(7):107754. doi:10.1016/j.asoc.2021.107754.

16. Xu S, Zhou Z, Li J, Wang L, Zhang X, Gao H. Communication-constrained UAVs coverage search method in uncertain scenarios. IEEE Sens J. 2024;24(10):17092–101. doi:10.1109/jsen.2024.3384261.

17. Qiu X, Song J, Zhang X, Liu S. A complete coverage path planning method for mobile robot in uncertain environments. In: Proceedings of the 2006 6th World Congress on Intelligent Control and Automation; 2006 Jun 21–23; Dalian, China. Piscataway, NJ, USA: IEEE; 2006. p. 8892–6.

18. Bine LM, Boukerche A, Ruiz LB, Loureiro AA. A novel ant colony-inspired coverage path planning for internet of drones. Comput Netw. 2023;235:109963. doi:10.1016/j.comnet.2023.109963.

19. Li W, Zhao T, Dian S. Multirobot coverage path planning based on deep Q-network in unknown environment. J Robot. 2022;2022(1):6825902. doi:10.1155/2022/6825902.

20. Le AV, Vo DT, Dat NT, Vu MB, Elara MR. Complete coverage planning using Deep Reinforcement Learning for polyiamonds-based reconfigurable robot. Eng Appl Artif Intell. 2024;138(4):109424. doi:10.1016/j.engappai.2024.109424.




Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.