Open Access
ARTICLE
A Power System Preventive Control Method Based on Generative Adversarial Proximal Policy Optimization
1 China Southern Power Grid Co., Ltd., Guangzhou, 510663, China
2 Yunnan Electric Power Co., Ltd., Kunming, 650041, China
3 China Southern Power Grid Digital Grid Research Institute Co., Ltd., Guangzhou, 510555, China
* Corresponding Author: Li Lin. Email:
(This article belongs to the Special Issue: Innovations and Challenges in Smart Grid Technologies)
Energy Engineering 2026, 123(7), 13 https://doi.org/10.32604/ee.2025.073445
Received 18 September 2025; Accepted 17 November 2025; Issue published 18 June 2026
Abstract
Traditional methods for computing transient stability preventive control have low computational efficiency and struggle to meet the real-time decision demands of increasingly large-scale power systems. Reinforcement learning-based preventive control approaches, which adopt an “offline training, online application” framework, are more promising. However, they still face two problems: electromechanical transient simulation is computationally inefficient, and the resulting decisions are not robust enough. This paper therefore proposes a power system preventive control strategy based on Generative Adversarial Proximal Policy Optimization (GA-PPO). First, taking into account multiple constraints on transient stability operation, a power system preventive control model is constructed with the objective of minimizing the total control adjustment, and it is formulated as a Markov Decision Process (MDP). Then, the discriminator of a Generative Adversarial Network (GAN) measures the gap between the expert demonstration distribution and the generated trajectory distribution. It supplies correction parameters for the advantage function of the Proximal Policy Optimization (PPO) algorithm, which improves the agent’s exploration efficiency. Finally, the discriminator’s update mechanism is improved using the Wasserstein distance. This stabilizes training and sustains the adversarial interaction between discriminator and generator, so the agent can reach higher converged rewards. Case studies show that the proposed GA-PPO algorithm substantially reduces training time and achieves higher converged rewards than the PPO and Soft Actor-Critic (SAC) algorithms.
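The abstract names three ingredients but does not give the exact formula by which the discriminator corrects the advantage. The following is a minimal standard-library Python sketch, under stated assumptions, that shows how these pieces can fit together. `ppo_clipped_objective` is the standard PPO clipped surrogate. `wasserstein_critic_loss` is the usual WGAN-style critic loss, which scores expert demonstration pairs against generated (policy) pairs. `corrected_advantages` and its weight `lam` are hypothetical: they use an additive, mean-centred correction chosen purely for illustration, not the authors' GA-PPO rule. The critic network itself and its Lipschitz constraint (weight clipping or gradient penalty) are omitted.

```python
from statistics import fmean


def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate, batch-averaged (to be maximized).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = A_i.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        terms.append(min(r * a, r_clipped * a))
    return fmean(terms)


def wasserstein_critic_loss(expert_scores, policy_scores):
    """WGAN-style critic loss: E_policy[D] - E_expert[D].

    Minimizing it pushes the critic D to score expert pairs above generated
    pairs. A Lipschitz constraint on D is assumed to be enforced elsewhere.
    """
    return fmean(policy_scores) - fmean(expert_scores)


def corrected_advantages(advantages, policy_scores, lam=0.1):
    """HYPOTHETICAL advantage correction (an assumption, not the paper's rule).

    Shifts each advantage by the critic's mean-centred score for that
    (state, action) pair, so more expert-like actions get a larger advantage.
    `lam` is an assumed correction weight.
    """
    baseline = fmean(policy_scores)
    return [a + lam * (s - baseline) for a, s in zip(advantages, policy_scores)]


if __name__ == "__main__":
    adv = corrected_advantages([1.0, -0.5, 0.25], [1.0, -1.0, 0.0], lam=0.5)
    print("corrected advantages:", adv)
    print("PPO objective:", ppo_clipped_objective([1.5, 0.5, 1.0], adv))
    print("critic loss:", wasserstein_critic_loss([1.0, 0.5], [0.0, -0.5]))
```

The additive, mean-centred form leaves the batch-average advantage unchanged and only re-ranks actions by how expert-like the critic judges them. A real implementation could instead fold the critic score into the reward before advantage estimation.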
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.