Open Access
ARTICLE
Dynamic Decoupling-Driven Cooperative Pursuit for Multi-UAV Systems: A Multi-Agent Reinforcement Learning Policy Optimization Approach
1 School of Automation, Northwestern Polytechnical University, Xi’an, 710072, China
2 National Key Laboratory of Unmanned Aerial Vehicle Technology, Northwestern Polytechnical University, Xi’an, 710072, China
* Corresponding Author: Chengfu Wu. Email:
Computers, Materials & Continua 2025, 85(1), 1339-1363. https://doi.org/10.32604/cmc.2025.067117
Received 25 April 2025; Accepted 10 June 2025; Issue published 29 August 2025
Abstract
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address credit assignment, low collaboration efficiency, and weak policy generalization in cooperative pursuit tasks for multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to identify the critical cooperative relationships in such tasks, which leads to low capture efficiency and a significant decline in performance as the number of agents grows. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: a multi-head attention (MHA) mechanism is shared among the critics during centralized training to solve the credit assignment problem. This mechanism enables each pursuer to identify highly correlated interactions with its teammates, filter out irrelevant and weakly relevant interactions, and decompose a large-scale cooperation problem into decoupled sub-problems, thereby improving collaborative efficiency and policy stability among the agents. Furthermore, a reward function that combines a formation reward with a distance reward is designed to encourage the pursuers to encircle the escapee, which incentivizes the UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate that the proposed algorithm achieves multi-UAV cooperative pursuit and induces diverse cooperative pursuit behaviors among the UAVs. Moreover, scalability experiments indicate that the algorithm is suitable for large-scale multi-UAV systems.
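The abstract does not give the reward function itself, so the following is only a minimal illustrative sketch of how a formation term and a distance term might be combined in a 2D setting. The function name `pursuit_reward`, the weights `w_form` and `w_dist`, and the specific choice of terms (largest angular gap around the escapee, mean pursuer-to-escapee distance) are all assumptions made for illustration and are not taken from the paper.

```python
import math

def pursuit_reward(pursuers, evader, w_form=1.0, w_dist=0.1):
    """Hypothetical shaped reward: formation term plus distance term.

    pursuers: list of (x, y) pursuer positions
    evader:   (x, y) position of the escapee
    Both terms and both weights are illustrative assumptions.
    """
    n = len(pursuers)
    ex, ey = evader

    # Distance term: negative mean pursuer-to-evader distance,
    # so closing in on the evader increases the reward.
    dists = [math.hypot(px - ex, py - ey) for px, py in pursuers]
    r_dist = -sum(dists) / n

    # Formation term: bearings of the pursuers as seen from the evader.
    angles = sorted(math.atan2(py - ey, px - ex) for px, py in pursuers)
    gaps = [angles[(i + 1) % n] - angles[i] for i in range(n)]
    gaps[-1] += 2 * math.pi  # wrap-around gap between last and first bearing

    # With perfectly even encirclement the largest gap equals 2*pi/n,
    # so this term is 0 at best and grows more negative as an escape
    # corridor opens up.
    r_form = -(max(gaps) - 2 * math.pi / n)

    return w_form * r_form + w_dist * r_dist


# Three pursuers spaced evenly on a unit circle around the evader.
encircling = [(math.cos(a), math.sin(a))
              for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
# Three pursuers bunched together on one side of the evader.
clustered = [(1.0, 0.0), (1.0, 0.1), (1.0, -0.1)]

r_encircling = pursuit_reward(encircling, (0.0, 0.0))
r_clustered = pursuit_reward(clustered, (0.0, 0.0))
```

Under these assumptions, the evenly spaced configuration scores higher than the clustered one even though the pursuers are at nearly the same distance, which is the kind of encirclement incentive the abstract describes.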
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

