
Open Access

ARTICLE

A Power System Preventive Control Method Based on Generative Adversarial Proximal Policy Optimization

Yun Yu1, Li Lin2,*, Ximing Zhang1, Yang Yu3, Wei Zhang2, Kai Cheng3
1 China Southern Power Grid Co., Ltd., Guangzhou, 510663, China
2 Yunnan Electric Power Co., Ltd., Kunming, 650041, China
3 China Southern Power Grid Digital Grid Research Institute Co., Ltd., Guangzhou, 510555, China
* Corresponding Author: Li Lin
(This article belongs to the Special Issue: Innovations and Challenges in Smart Grid Technologies)

Energy Engineering https://doi.org/10.32604/ee.2025.073445

Received 18 September 2025; Accepted 17 November 2025; Published online 22 December 2025

Abstract

Traditional transient stability preventive control methods suffer from low computational efficiency and struggle to meet the real-time decision-making demands of increasingly large-scale power systems. Reinforcement learning-based preventive control approaches, which adopt an “offline training, online application” framework, show greater promise; however, they still face challenges such as the low computational efficiency of electromechanical transient simulation and insufficient decision robustness. Therefore, this paper proposes a power system preventive control strategy based on Generative Adversarial Proximal Policy Optimization (GA-PPO). First, considering multiple transient stability operating constraints, a power system preventive control model is constructed with the objective of minimizing the total adjustment amount, together with its Markov Decision Process (MDP) formulation. Then, the discriminator of a Generative Adversarial Network (GAN) measures the gap between the expert demonstration distribution and the generated trajectory distribution, providing correction parameters for the advantage function of the Proximal Policy Optimization (PPO) algorithm and thereby improving the agent’s exploration efficiency. Finally, the discriminator’s update mechanism is strengthened with the Wasserstein distance, which stabilizes training while allowing continuous adversarial interaction between the discriminator and the generator to reach higher convergent rewards. Case studies demonstrate that the proposed GA-PPO algorithm significantly reduces training time and achieves higher convergent rewards than the PPO and Soft Actor-Critic (SAC) algorithms.
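To make the core idea of the abstract concrete, the sketch below illustrates one plausible reading of GA-PPO: a Wasserstein-style discriminator scores how expert-like a (state, action) pair is, and that score is blended into the PPO advantage before the clipped surrogate update. This is not the authors' implementation; the network sizes, the mixing coefficient `lambda_corr`, and the state/action dimensions are hypothetical placeholders.

```python
# Minimal sketch of a GA-PPO-style update, assuming a Wasserstein critic as the
# GAN discriminator and an additive correction to the PPO advantage. Illustrative only.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4  # assumed toy dimensions, not from the paper


class Discriminator(nn.Module):
    """Wasserstein-style critic: scores (state, action) pairs; higher = more expert-like."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)


def discriminator_loss(disc, expert_s, expert_a, gen_s, gen_a):
    # Wasserstein objective: maximize E_expert[D] - E_generated[D]
    # (in practice a gradient penalty or weight clipping enforces the Lipschitz constraint).
    return -(disc(expert_s, expert_a).mean() - disc(gen_s, gen_a).mean())


def corrected_advantage(adv, disc, s, a, lambda_corr=0.1):
    # Blend the PPO advantage with the discriminator score so trajectories that
    # resemble expert demonstrations are reinforced.
    with torch.no_grad():
        return adv + lambda_corr * disc(s, a)


def ppo_clip_loss(new_logp, old_logp, adv, clip_eps=0.2):
    # Standard PPO clipped surrogate objective (returned as a loss to minimize).
    ratio = torch.exp(new_logp - old_logp)
    return -torch.min(ratio * adv, torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()


if __name__ == "__main__":
    disc = Discriminator()
    s, a = torch.randn(8, STATE_DIM), torch.randn(8, ACTION_DIM)
    adv, logp = torch.randn(8), torch.randn(8)
    print(discriminator_loss(disc, s, a, s, a).item())
    print(ppo_clip_loss(logp, logp.detach(), corrected_advantage(adv, disc, s, a)).item())
```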

Keywords

Generative adversarial; proximal policy optimization; transient stability; preventive control