Open Access

ARTICLE

Dynamic Decoupling-Driven Cooperative Pursuit for Multi-UAV Systems: A Multi-Agent Reinforcement Learning Policy Optimization Approach

Lei Lei1, Chengfu Wu2,*, Huaimin Chen2

1 School of Automation, Northwestern Polytechnical University, Xi’an, 710072, China
2 National Key Laboratory of Unmanned Aerial Vehicle Technology, Northwestern Polytechnical University, Xi’an, 710072, China

* Corresponding Author: Chengfu Wu

Computers, Materials & Continua 2025, 85(1), 1339-1363. https://doi.org/10.32604/cmc.2025.067117

Abstract

This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address three problems in cooperative pursuit tasks for multiple unmanned aerial vehicles (UAVs): credit assignment, low collaboration efficiency, and weak policy generalization. Traditional algorithms often fail to identify the critical cooperative relationships in such tasks, which leads to low capture efficiency and a significant decline in performance as the system scale grows. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: the critics share a multi-head attention (MHA) mechanism during centralized training to solve the credit assignment problem. This mechanism enables each pursuer to identify highly correlated interactions with its teammates, filter out irrelevant and weakly relevant interactions, and decompose a large-scale cooperation problem into decoupled sub-problems, thereby improving collaborative efficiency and policy stability among the agents. Furthermore, a reward function that combines a formation reward with a distance reward is designed to encourage the pursuers to encircle the evader, which incentivizes the UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate that the proposed algorithm achieves multi-UAV cooperative pursuit and induces diverse cooperative pursuit behaviors among the UAVs. Scalability experiments further indicate that the algorithm is suitable for large-scale multi-UAV systems.
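The decoupling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a single attention head with hand-picked vectors instead of a learned multi-head critic, and the function names `attention_weights` and `strongly_coupled`, the example vectors, and the cutoff of 0.2 are all assumptions made for illustration. It only shows how softmax attention weights can separate strongly relevant teammates from weakly relevant ones.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one pursuer over its teammates.

    `query` is the pursuer's embedding and `keys` holds one embedding per
    teammate (both are hypothetical stand-ins for learned projections).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def strongly_coupled(weights, threshold=0.2):
    """Indices of teammates whose weight reaches an illustrative cutoff.

    The hard threshold is an assumption used only to visualize decoupling;
    a soft attention critic would simply down-weight the others.
    """
    return [i for i, w in enumerate(weights) if w >= threshold]


# One pursuer attending to three teammates (values chosen by hand).
query = [1.0, 0.0]
keys = [[4.0, 0.0], [3.5, 0.5], [-2.0, 1.0]]

weights = attention_weights(query, keys)
coupled = strongly_coupled(weights)
```

In this toy case the first two teammates receive almost all of the attention mass and the third receives close to none, so the pursuer's value estimate would depend mainly on a two-teammate sub-problem.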

Keywords

Multi-agent reinforcement learning; multi-UAV systems; pursuit-evasion games




Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.